CS Papers Deep-Read
Milestone Papers, Distilled — BigCat's Shelf
> one paper, one diagram · read this page ≈ understand the paper
Paper 1
Attention Is All You Need: an attention-only architecture, the Transformer, that became the foundation of nearly every modern large model
Vaswani et al. · 2017
Paper 2
Deep Residual Learning (ResNet): skip connections made networks hundreds of layers deep trainable; the backbone of almost every deep net since
He et al. · 2015
Paper 3
AlexNet: a deep convolutional net that learned its own features from over a million images, won the 2012 ImageNet challenge (ILSVRC) by a wide margin, and ignited the deep learning revolution
Krizhevsky et al. · 2012
Paper 4
Word2Vec: a model that learns word vectors from massive unlabeled text, so that "king − man + woman ≈ queen"; where the embedding era began
Mikolov et al. · 2013
Paper 5
ViT (An Image is Worth 16×16 Words): slice an image into 16×16 patches and feed them as "words" to a standard Transformer; with enough pretraining data it matches or overtakes CNNs, so vision and language can share one architecture
Dosovitskiy et al. · 2020