Digests » 40

this week's favorite

Clustering Pollock

How did Pollock’s use of color evolve over time? To answer this, I decided to run some clustering experiments, applying a few algorithms and plotting some charts.

Yet Another Scalable Apache Airflow With Docker Example Setup

There are plenty of articles describing what Apache Airflow is and when you would want to use it. As it turns out, the problem it solves is common well beyond data science environments.

Analyzing Recurrent Neural Networks (RNNs) Using Polymer Dynamics Theory

As an amateur scientist, I analyze the dynamics of Long Short-Term Memory (LSTM) elements when applied to strings of characters. I show how the terminal padding characters have a relatively small impact on the dynamics of an LSTM element.

A Guide to Actually Understanding the Political Impact of AI

Since their entrance into mainstream political consciousness, Artificial Intelligence (AI) and Big Data have been seen as harbingers of either political doom or revolution.

How A/B Tests Could Go Wrong

In this paper, we share how we mined historical A/B tests and identified the most common causes of invalid tests, ranging from biased design and self-selection bias to attempts to generalize A/B test results beyond the experiment population and time frame. We also developed scalable algorithms to automatically detect invalid A/B tests and diagnose the root cause of invalidity. Surfacing invalidity not only improved decision quality but also served as user education and reduced problematic experiment designs in the long run.