Big Bird: Transformers for Longer Sequences (Paper Explained)

First published at 07:58 UTC on August 2nd, 2020.
#ai #nlp #attention

The quadratic resource requirements of the attention mechanism are the main roadblock to scaling transformers up to long sequences. This paper replaces the full quadratic attention mechanism with a combination of random attention,…
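As a rough illustration of the idea described above, here is a minimal sketch (not the authors' implementation) of how a Big Bird-style sparse attention mask could be assembled from the three patterns the paper combines: sliding-window attention, a few global tokens, and random attention. All names and parameters (`seq_len`, `window`, `n_global`, `n_random`) are illustrative assumptions.

```python
# Sketch of a Big Bird-style sparse attention mask (illustrative only).
import numpy as np

def bigbird_style_mask(seq_len=16, window=3, n_global=2, n_random=2, seed=0):
    """Return a boolean (seq_len, seq_len) mask; True = query i may attend to key j."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # 1) Sliding-window attention: each token attends to its local neighbourhood.
    for i in range(seq_len):
        lo, hi = max(0, i - window // 2), min(seq_len, i + window // 2 + 1)
        mask[i, lo:hi] = True

    # 2) Global attention: the first n_global tokens attend everywhere
    #    and are attended to by everyone (like [CLS]-style tokens).
    mask[:n_global, :] = True
    mask[:, :n_global] = True

    # 3) Random attention: each query additionally attends to a few random keys.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=n_random, replace=False)] = True

    return mask

# Full attention scores all seq_len**2 pairs; the sparse mask keeps
# only O(seq_len) entries per row, which is the point of the paper.
m = bigbird_style_mask()
print(m.sum(), "of", m.size, "query-key pairs are kept")
```

This is only meant to show why the combined pattern is linear rather than quadratic in sequence length; the actual model implements it with blocked, hardware-friendly sparse operations.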

Category: Science & Technology
Sensitivity: Normal - content that is suitable for ages 16 and over