CS Papers Deep-Read

Milestone Papers, Distilled — BigCat's Shelf

> one paper, one diagram · read this page ≈ understand the paper
Paper 1: Attention Is All You Need
An architecture built entirely on attention, with no recurrence or convolution, that became the foundation of most modern large models.
Vaswani et al. · 2017

Paper 2: Deep Residual Learning (ResNet)
A single skip connection lets networks train well at hundreds of layers deep; residual connections now appear in almost every deep network.
He et al. · 2015

Paper 3: AlexNet
A deep convolutional network that learned its own features from about 1.2 million labeled ImageNet images, won the ILSVRC 2012 challenge by a wide margin, and ignited the deep learning revolution.
Krizhevsky et al. · 2012

Paper 4: Word2Vec
A shallow network that learns word vectors on its own from massive amounts of unlabeled text, so that "king − man + woman ≈ queen"; this is where the embedding era began.
Mikolov et al. · 2013

Paper 5: ViT (An Image is Worth 16×16 Words)
Slice an image into 16×16 patches and feed them to a standard Transformer as "words"; with enough pre-training data it overtakes CNNs, and vision and language come to share one architecture.
Dosovitskiy et al. · 2020
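To make the first card concrete, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)·V, the operation at the core of the Transformer. It assumes small lists of vectors as input and uses hypothetical helper names (`softmax`, `scaled_dot_product_attention`); it leaves out the multi-head projections and masking, so treat it as an illustration rather than the paper's implementation.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Hypothetical helper. Q: n x d_k, K: m x d_k, V: m x d_v (lists of lists).

    Returns an n x d_v list: one output vector per query.
    """
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        d_v = len(V[0])
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return out


# Toy example: one query, two keys, two values.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(scaled_dot_product_attention(Q, K, V))
```

Because the query points in the same direction as the first key, the first value vector receives the larger weight. The weights always sum to 1, so each output row is a weighted average of the value vectors.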