First published at 14:20 UTC on March 28th, 2022.
#deeplearning #nlp #sampling
Modern language models like T5 or GPT-3 achieve remarkably low perplexities on both training and validation data, yet when sampling from their output distributions, the generated text often seems dull and uninteresting.…
MORE
#deeplearning #nlp #sampling
Modern language models like T5 or GPT-3 achieve remarkably low perplexities on both training and validation data, yet when sampling from their output distributions, the generated text often seems dull and uninteresting. Various workarounds have been proposed, such as top-k sampling and nucleus sampling, but while these manage to somewhat improve the generated samples, they are hacky and unfounded. This paper introduces typical sampling, a new decoding method that is principled, effective, and can be implemented efficiently. Typical sampling turns away from sampling purely based on likelihood and explicitly finds a trade-off between generating high-probability samples and generating high-information samples. The paper connects typical sampling to psycholinguistic theories on human speech generation, and shows experimentally that typical sampling achieves much more diverse and interesting results than any of the current methods.
Sponsor: Fully Connected by Weights & Biases
https://wandb.ai/fully-connected
OUTLINE:
0:00 - Intro
1:50 - Sponsor: Fully Connected by Weights & Biases
4:10 - Paper Overview
7:40 - What's the problem with sampling?
11:45 - Beam Search: The good and the bad
14:10 - Top-k and Nucleus Sampling
16:20 - Why the most likely things might not be the best
21:30 - The expected information content of the next word
25:00 - How to trade off information and likelihood
31:25 - Connections to information theory and psycholinguistics
36:40 - Introducing Typical Sampling
43:00 - Experimental Evaluation
44:40 - My thoughts on this paper
Paper: https://arxiv.org/abs/2202.00666
Code: https://github.com/cimeister/typical-...
Authors: Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell
LESS