Digests » 135
this week's favorite
The world of data is now where the world of code was 50 years ago. We manage large data sets on object stores (e.g., S3, Azure Blob Storage, GCS), essentially one huge shared folder, and hope for the best. Although this environment has proven cost-effective and scalable, managing data pipelines on top of it is highly error-prone.
Nvidia introduces a new method for training AI models with limited data sets. Using a fraction of the training images a typical GAN requires, it can learn complex skills, whether recreating images of cancer tissue or emulating famous painters.
It has become common to publish large (billion-parameter) language models trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack, recovering individual training examples simply by querying the language model.
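The core intuition behind such attacks is that a language model assigns anomalously high likelihood to sequences it has memorized from its training data, so an adversary can rank candidate strings by model likelihood and flag the high-scoring ones. A minimal sketch of that ranking step, using a toy add-one-smoothed bigram model over a synthetic corpus (the corpus, the fake SSN, and the candidate strings are all illustrative inventions, not from the paper):

```python
import math
from collections import Counter

# Toy "private" training corpus containing a synthetic secret.
corpus = ("the quick brown fox jumps over the lazy dog . "
          "alice's ssn is 123 45 6789 . "
          + "the dog sleeps . the fox runs . " * 5)
tokens = corpus.split()

# Train a bigram language model with add-one smoothing.
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
vocab = len(unigrams)

def avg_log_prob(seq):
    # Per-token log-likelihood under the bigram model.
    words = seq.split()
    lp = sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
             for a, b in zip(words, words[1:]))
    return lp / max(len(words) - 1, 1)

# The attack step: rank candidates by likelihood; memorized
# training text scores noticeably higher than unseen text.
candidates = [
    "alice's ssn is 123 45 6789",   # present in the training corpus
    "bob's ssn is 987 65 4321",     # never seen during training
]
ranked = sorted(candidates, key=avg_log_prob, reverse=True)
```

The memorized string ends up first in `ranked`. The paper's actual attack works the same way at scale: sample many generations from the target model, then use likelihood (and comparisons against a reference model) to separate memorized content from ordinary fluent text.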
paperai is an AI-powered literature discovery and review engine for medical/scientific papers. paperai helps automate tedious literature reviews, allowing researchers to focus on their core work. Queries are run to filter papers by specified criteria, and reports powered by extractive question-answering identify answers to key questions within sets of medical/scientific papers.
Random forest is one of the most widely used models in classical machine learning. Its robustness to noisy data and its strong ability to learn irregular patterns make it a worthy candidate for modeling in many fields, such as genomics.
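That robustness comes from bagging: each tree is fit on a bootstrap resample of the data, and the forest predicts by majority vote, so the noise any single tree overfits tends to average out. A minimal from-scratch sketch of this idea (decision stumps stand in for full trees, and the 1-D dataset with flipped labels is invented for illustration):

```python
import random
import statistics

random.seed(0)

# Toy 1-D data: true label is 1 when x > 0.5, with 20% label noise.
X = [random.random() for _ in range(200)]
y = [(1 if x > 0.5 else 0) ^ (1 if random.random() < 0.2 else 0) for x in X]

def fit_stump(X, y):
    # Pick the threshold/polarity minimizing training misclassifications.
    best = (0.5, 1, len(y) + 1)
    for t in set(X):
        for pos in (0, 1):
            errs = sum((pos if xi > t else 1 - pos) != yi
                       for xi, yi in zip(X, y))
            if errs < best[2]:
                best = (t, pos, errs)
    return best[0], best[1]

def predict_stump(stump, x):
    t, pos = stump
    return pos if x > t else 1 - pos

def fit_forest(X, y, n_trees=25):
    # Bagging: each stump sees its own bootstrap resample.
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    # Majority vote across the ensemble.
    return statistics.mode(predict_stump(s, x) for s in forest)

forest = fit_forest(X, y)
# Accuracy on a clean, noise-free grid of test points.
acc = sum(predict_forest(forest, i / 100) == (1 if i / 100 > 0.5 else 0)
          for i in range(100)) / 100
```

Despite 20% corrupted labels in training, the voted ensemble recovers the underlying threshold well; in practice one would reach for `sklearn.ensemble.RandomForestClassifier`, which adds deeper trees and random feature subsetting on top of the same bagging-plus-voting scheme.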