Digests » 99
this week's favorite
Commodity machine learning optimizers, which are available in standard packages such as TensorFlow and PyTorch, work by linearly approximating the loss functions incurred by the training samples using gradients. In this latest installment of the blog series on constructing optimizers that avoid approximation in order to exploit as much information as possible, we discuss regularized losses and show how to construct optimizers that preserve important aspects of the regularizer that are lost when using approximations. We also use our derivation to create a Python implementation of optimizers for L1 and L2 regularization as an example, and demonstrate their advantages over standard optimizers such as Adagrad.
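To make the idea concrete, here is a minimal sketch of one common way to keep an L1 regularizer exact instead of linearizing it: a proximal (soft-thresholding) step. This is an assumption chosen for illustration and may differ from the derivation in the linked post. The function names, step size `lr`, regularization weight `lam`, and sample numbers are all hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Exact minimizer of 0.5 * (x - v)**2 + t * |x|, applied elementwise.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_prox_step(w, grad, lr, lam):
    # Linearize only the data loss (gradient step), then treat the
    # L1 regularizer exactly via its proximal operator.
    return soft_threshold(w - lr * grad, lr * lam)

def l1_subgradient_step(w, grad, lr, lam):
    # Baseline for comparison: linearize the regularizer as well,
    # using sign(w) as a subgradient of |w|.
    return w - lr * (grad + lam * np.sign(w))

# Hypothetical weights and loss gradient, for illustration only.
w = np.array([0.05, -0.8, 0.3])
grad = np.array([0.02, -0.2, 0.0])
lr, lam = 0.5, 0.2

prox_w = l1_prox_step(w, grad, lr, lam)
sub_w = l1_subgradient_step(w, grad, lr, lam)
print("proximal step:   ", prox_w)
print("subgradient step:", sub_w)
```

In this toy setting the proximal step sets the small first weight exactly to zero, while the fully linearized step overshoots past zero and leaves it nonzero. Sparsity is one example of a property of the regularizer that a linear approximation can lose.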
Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if the probability that an incoming mail is spam is related to the presence of the word “Free”, then using Bayes’ theorem the word “Free” can be used to assess the probability of a mail being spam more accurately than is possible without knowledge of the words within the mail.
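The spam example can be worked through numerically. The sketch below applies Bayes’ theorem directly; the function name and all probabilities are made-up values for illustration, not measured data.

```python
def posterior_spam(p_spam, p_word_given_spam, p_word_given_ham):
    # Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word),
    # where P(word) comes from the law of total probability.
    p_ham = 1.0 - p_spam
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

# Hypothetical numbers: 20% of mail is spam; "Free" appears in
# 60% of spam mails and in 5% of legitimate mails.
prior = 0.2
posterior = posterior_spam(prior, p_word_given_spam=0.6, p_word_given_ham=0.05)
print(f"P(spam) before seeing the word: {prior:.2f}")
print(f"P(spam | 'Free' present):       {posterior:.2f}")
```

Under these assumed numbers, observing the word raises the estimated probability of spam well above the prior, which is the kind of update the description refers to.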
An interactive deep learning book with code, math, and discussions, based on the NumPy interface.
It happened again. Last week, as I was explaining my job to someone, they interrupted me and said "So you're building Skynet". I felt like I had to show them this meme, which I thought described my current situation pretty well.
I had high hopes about the potential impact of being a Data Scientist. I felt every company should be a “data company”. My expectations did not meet reality.