Digests » 150
this week's favorite
I needed to finetune the GPT2 1.5 Billion parameter model for a project, but the model didn't fit on my gpu. So i figured out how to run it with deepspeed and gradient checkpointing, which reduces the required GPU memory. Now it can fit on just one GPU.
Sixteen years ago, many of us held a printout of directions in one hand and the steering wheel in the other to get around— without information about the traffic along your route or details about when your favorite restaurant was open. Since then, we’ve been pushing the boundaries of what a map can do, propelled by the latest machine learning.
Open-domain long-on answering (LFQA) form questions a fundamental challenge in natural language processing (NLP) that involves retrieving documents relevant to a given query and using them to generate a detailed paragraph-length answer.
We identify label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets, and subsequently study the potential for these label errors to affect benchmark results.
We apply our result to derive some geometric properties of coupled reflected Brownian motion, especially those properties which have been used in the recent work on the “hot spots” conjecture for special domains.