Digests » 140

sponsor

2.2M developers learned Python last year. Are you one of them?

Share your views about the most important programming languages, tools, and technologies for machine learning and data science in 2021. Take the Developer Economics survey and receive free resources to plan your next career move, plus a chance to win a new smartphone, gaming laptop, licenses, Amazon vouchers, and more. The survey is open until February.

this week's favorite

Predicting Hard Drive Failure with Machine Learning

We’ve all had a hard drive fail on us, and often it’s as sudden as booting your machine and realizing you can’t access a bunch of your files. It’s not a fun experience. It’s especially not fun when you have an entire data center full of drives that are all important to keeping your business running. What if we could predict when one of those drives would fail, and get ahead of it by preemptively replacing the hardware before the data is lost? This is where the history of predictive drive failure begins.

How Facebook uses AI to improve photo descriptions for visually impaired people

When Facebook users scroll through their News Feed, they find all kinds of content — articles, friends’ comments, event invitations, and of course, photos. Most people are able to instantly see what’s in these images, whether it’s their new grandchild, a boat on a river, or a grainy picture of a band onstage. But many users who are blind or visually impaired (BVI) can also experience that imagery, provided it’s tagged properly with alternative text (or “alt text”). A screen reader can describe the contents of these images using a synthetic voice and enable people who are BVI to understand images in their Facebook feed.

3 deep learning mysteries: Ensemble, knowledge- and self-distillation

With now-standard techniques such as over-parameterization, batch normalization, and residual links, “modern age” neural network training (at least for image classification and many other tasks) is usually quite stable. Using standard neural network architectures and training algorithms (typically SGD with momentum), the learned models perform consistently well, not only in training accuracy but also in test accuracy, regardless of which random initialization or random data order is used during training.

Datasets should behave like git repositories

Problems emerging from data are common in research as well as in industry. We deal with those problems as part of our project, but we usually don't bother solving them at their origin. We fix the data locally once, and we go on with our project. This is certainly a valid method in some cases, but as we share data more and more between projects, we find ourselves repeating the same processes over time and across teams. This issue is particularly acute for public datasets shared by many people to train many machine learning models. I will show you how to create, maintain, and contribute to a long-lived dataset that will update itself automatically across projects, using git and DVC as versioning systems, and DAGsHub as a host for the datasets.

NLPRule: A library for fast grammatical error correction

NLPRule is a library for rule-based grammatical error correction written in pure Rust with bindings for Python. Rules are sourced from LanguageTool.