Digests » 16

ai

From Pandas to Scikit-Learn — A new exciting workflow

Scikit-Learn will make one of its biggest upgrades in recent years with its mammoth version 0.20 release. For many data scientists, a typical workflow consists of using Pandas to do exploratory data analysis before moving to scikit-learn for machine learning. This new release will make the process simpler, more feature-rich, robust, and standardized.

Illustrated Guide to Recurrent Neural Networks

Recurrent Neural Networks are an extremely powerful machine learning technique, but they can be a little hard to grasp at first. For those just getting into machine learning and deep learning, this is a guide in plain English with helpful visuals to help you grok RNNs.

The Worst Kind of Data: Missing Data

Most datasets, whether publicly available or from the workplace, are complete. From time to time, however, we encounter datasets where some or many entries are missing. The problem of missing data exists on a spectrum: a few missing entries among millions are virtually negligible, whereas upwards of 10% missing data can be crippling. The problem has multiple layers, so let us proceed to peel it like the onion it is. At its most basic, enough missing data may skew the distribution(s) the data follows.
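
The skew described above can be seen in a small simulation. The following is a minimal sketch, not taken from the linked article: the dataset, the 10% fraction applied in code, and the helper names `drop_at_random` and `drop_high_values` are all assumptions made for illustration. It compares losing entries purely at random with losing only the largest values.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical complete dataset: 10,000 draws from a normal distribution.
complete = [random.gauss(50.0, 10.0) for _ in range(10_000)]


def drop_at_random(values, fraction, rng):
    """Remove roughly `fraction` of the entries, independent of their value."""
    return [v for v in values if rng.random() >= fraction]


def drop_high_values(values, fraction):
    """Remove the largest `fraction` of the entries (missingness depends on the value)."""
    keep = len(values) - int(len(values) * fraction)
    return sorted(values)[:keep]


rng = random.Random(1)
random_gaps = drop_at_random(complete, 0.10, rng)
biased_gaps = drop_high_values(complete, 0.10)

mean_complete = statistics.mean(complete)
mean_random = statistics.mean(random_gaps)
mean_biased = statistics.mean(biased_gaps)

print(f"complete data:        mean = {mean_complete:.2f}")
print(f"10% missing, random:  mean = {mean_random:.2f}")
print(f"10% missing, biased:  mean = {mean_biased:.2f}")
```

In this sketch the randomly thinned sample keeps a mean close to the original, while the sample that lost its largest values is pulled downward. How much real data is distorted depends on why the entries are missing, which this toy example does not attempt to model.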

A brief introduction to statistics

In the modern world of computers and information technology, the importance of statistics is well recognized across all disciplines. Statistics originated as a science of statehood and slowly and steadily found applications in agriculture, economics, commerce, biology, medicine, industry, planning, education and so on. Today there is no walk of human life where statistics cannot be applied.

Reinforcement Learning: a comprehensive introduction

A series of three articles on Markov Decision Processes, a piece of the mathematical framework underlying Reinforcement Learning techniques. A couple more are in the process of being written, but I believe the material could already be useful to anyone interested in taking a look at the "nitty gritty" mathematical formulation.