RudderStack: An Open Source Segment Alternative

An open source customer data platform built for developers, offering Segment API compatibility, multiple hosting options, fixed infrastructure-based pricing, and powerful real-time transformations.

Domain-specific language model pretraining for biomedical natural language processing

In this blog post, we present our recent advances in pretraining neural language models for biomedical NLP. We question the prevailing assumption that pretraining on general-domain text is necessary and useful for specialized domains such as biomedicine.

Hopfield Networks is All You Need

This blog post explains the paper "Hopfield Networks is All You Need" and the accompanying new PyTorch Hopfield layer.

Visual Guide to Random Forests

Random forests are a widely used machine learning technique for both regression and classification. This video shows how decision trees can be combined into an ensemble to create powerful predictive models.

Interpretable machine learning models

Straightforward implementations of interpretable ML models, plus demos of how to use various interpretability techniques. The code is optimized for readability.

Software Engineering Tips and Best Practices for Data Science

If you’re into data science, you’re probably familiar with this workflow: you start a project by firing up a Jupyter notebook, then begin writing your Python code, running complex analyses, or even training a model. As the notebook grows with all the functions, classes, plots, and logs, you find yourself with an enormous blob of monolithic code sitting in one place in front of you.