Digests » 5

ai

RUDDER: Return Decomposition for Delayed Rewards

We propose a novel reinforcement learning approach for finite Markov decision processes (MDPs) with delayed rewards. In this work, biases of temporal difference (TD) estimates are proved to be corrected only exponentially slowly in the number of delay steps. Furthermore, variances of Monte Carlo (MC) estimates are proved to increase the variance of other estimates, the number of which can exponentially grow in the number of delay steps.

Turning Fortnite into PUBG with Deep Learning

Understanding CycleGAN for Image Style Transfer and exploring its application to graphics mods for games.

Model Tuning and the Bias-Variance Tradeoff

The goal of modeling is to approximate real-life situations by identifying and encoding patterns in data. Models make mistakes if those patterns are overly simple or overly complex.

A visual introduction to machine learning, Part II

The goal of modeling is to approximate real-life situations by identifying and encoding patterns in data. Models make mistakes if those patterns are overly simple or overly complex.

Introducing improved pricing for Preemptible GPUs

Not everyone needs the extra performance that GPUs bring to a compute workload, but those who do, really do. Earlier this year, we announced that you could attach GPUs to Preemptible VMs on Google Compute Engine and Google Kubernetes Engine, lowering the price of using GPUs by 50%. Today, Preemptible GPUs are generally available (GA) and we’ve lowered preemptible prices on our entire GPU portfolio to be 70% cheaper than GPUs attached to on-demand VMs.