Reading List

Natural Language Processing

Language Modeling

(2018) Universal Language Model Fine-tuning for Text Classification
[PDF] [arXiv]

(2018) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
[PDF] [arXiv]
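
BERT's pre-training objective is masked language modeling: 15% of input positions are selected, and of those 80% are replaced with [MASK], 10% with a random token, and 10% left unchanged. A minimal sketch of that corruption step (the helper name and toy vocab are mine; real pipelines operate on WordPiece IDs and skip special tokens):

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, seed=0):
    """BERT-style MLM corruption: select mask_rate of positions; replace
    with [MASK] 80% of the time, a random token 10%, keep original 10%."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok              # the model must predict the original
            r = rng.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: leave the token unchanged
    return corrupted, targets

toks = "the cat sat on the mat".split()
print(mask_tokens(toks, vocab=toks, mask_rate=0.5))  # high rate, demo only
```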

(2019) ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
[PDF] [arXiv]

(2019) RoBERTa: A Robustly Optimized BERT Pretraining Approach
[PDF] [arXiv]

(2019) Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
[PDF] [arXiv]

(2019) XLNet: Generalized Autoregressive Pretraining for Language Understanding
[PDF] [arXiv]

(2019) GPT-2: Language Models are Unsupervised Multitask Learners
[PDF] [SemanticScholar]

(2020) GPT-3: Language Models are Few-Shot Learners
[PDF] [arXiv]

Machine Translation

(2002) BLEU: a Method for Automatic Evaluation of Machine Translation
[PDF] [SemanticScholar]
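
BLEU scores a candidate against a reference via clipped n-gram precision and a brevity penalty. A minimal sentence-level sketch (single reference, uniform weights, crude smoothing of zero counts; production tools compute corpus-level statistics over multiple references):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    for n = 1..max_n, scaled by a brevity penalty."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())   # counts clipped by the reference
        total = max(sum(cand.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the quick brown fox jumps over the dog".split(),
           "the quick brown fox jumps over the lazy dog".split()))  # ~0.77
```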

(2014) Sequence to Sequence Learning with Neural Networks
[PDF] [arXiv]
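
The core recipe: an encoder LSTM reads the source into a fixed-length vector v, and a decoder LSTM factorizes the output distribution autoregressively, conditioned on v:

```latex
p(y_1, \dots, y_{T'} \mid x_1, \dots, x_T)
  = \prod_{t=1}^{T'} p(y_t \mid v,\, y_1, \dots, y_{t-1})
```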

(2017) Attention Is All You Need
[PDF] [arXiv]
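
The scaled dot-product attention at the heart of the Transformer, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, is compact enough to sketch directly. A minimal NumPy version (single head, no masking; both simplifications mine):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for Q: (n_q, d_k), K: (n_k, d_k),
    V: (n_k, d_v). Single head, no masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity logits
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 8)
```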

(2018) Understanding Back-Translation at Scale
[PDF] [arXiv]

(2019) Bridging the Gap between Training and Inference for Neural Machine Translation
[PDF] [arXiv]

Summarization

(2015) A Neural Attention Model for Abstractive Sentence Summarization
[PDF] [arXiv]

Question Answering

(2016) SQuAD: 100,000+ Questions for Machine Comprehension of Text
[PDF] [arXiv]

(2016) Bidirectional Attention Flow for Machine Comprehension
[PDF] [arXiv]

Word Embeddings

(2013) Distributed Representations of Words and Phrases and their Compositionality
[PDF] [arXiv]
Note: word2vec
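
For each (input word w_I, context word w_O) pair, the paper's skip-gram model with negative sampling maximizes:

```latex
\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
    \left[ \log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]
```

where the k negative samples w_i are drawn from a noise distribution P_n(w) (the unigram distribution raised to the 3/4 power in the paper).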

(2014) GloVe: Global Vectors for Word Representation
[PDF] [SemanticScholar]
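
GloVe fits word vectors directly to the corpus co-occurrence matrix X with a weighted least-squares objective:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})
    \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^{2}
```

where f is a weighting function that caps the influence of very frequent co-occurrence counts.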

(2016) Enriching Word Vectors with Subword Information
[PDF] [arXiv]
Note: FastText
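
FastText represents a word as the sum of vectors for its character n-grams (plus the word itself), so morphologically related and out-of-vocabulary words share information. A sketch of the n-gram extraction, using the paper's boundary markers and its example word (the helper name is mine):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams with boundary markers, FastText-style: the word's
    vector is the sum of the vectors of these n-grams plus the word itself."""
    w = f"<{word}>"
    grams = {w[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)}
    grams.add(w)  # the full word is kept as its own feature
    return grams

print(sorted(char_ngrams("where")))  # includes '<wh', 'whe', 'her', 'ere', 're>'
```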

(2018) Universal Sentence Encoder
[PDF] [arXiv]

Misc

(1999) A Statistical MT Tutorial Workbook
[PDF]
Note: A good introductory text on the EM algorithm
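
For orientation while working through the workbook: EM alternates between taking the expectation of the complete-data log-likelihood under the current posterior over the latent variables, and re-maximizing it (standard formulation, not the workbook's notation):

```latex
\text{E-step: } Q(\theta \mid \theta^{(t)})
  = \mathbb{E}_{z \sim p(z \mid x,\, \theta^{(t)})}\!\left[\log p(x, z \mid \theta)\right]
\qquad
\text{M-step: } \theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})
```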

(2009) Bayesian Inference with Tears
[PDF]
Note: Funny