Digests » 27
We study the problem of training deep neural networks with Rectified Linear Unit (ReLU) activiation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that for a broad family of loss functions, with proper random weight initialization, both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumption on the training data.
This is a not-particularly-systematic attempt to curate a handful of my favorite resources for learning statistics and machine learning. This isn’t meant to be comprehensive, and in fact is still missing the vast majority of my favorite explainers. Rather, it’s just a smattering of resources I’ve found myself turning to multiple times and thus would like to have in one place.
The technique is named after the great theoretical physicist Richard Feynman. He was nicknamed the ‘The Great Explainer’ for his remarkable skill of explaining even the most complex scientific topics in plain layman language.
Did you ever look at the L² regularization term of a neural network’s cost function and wondered why it is scaled by both 2 and m?
A Google Brain team led by “Godfather of Deep Learning” Geoffrey Hinton has proposed a new way to accurately detect black box and white box FGSM and BIM attacks.