XLNet: Generalized Autoregressive Pretraining for Language Understanding

First published at 22:59 UTC on July 3rd, 2019.

Abstract:
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
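For reference, the permutation-based objective the abstract summarizes can be written out explicitly. In the paper's notation (kept here as a sketch), $\mathcal{Z}_T$ is the set of all permutations of the length-$T$ index sequence $[1, 2, \ldots, T]$, and $z_t$ and $\mathbf{z}_{<t}$ denote the $t$-th element and the first $t-1$ elements of a permutation $\mathbf{z} \in \mathcal{Z}_T$:

\max_{\theta}\;\; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}\!\left[ \sum_{t=1}^{T} \log p_{\theta}\!\left( x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}} \right) \right]

Because the parameters $\theta$ are shared across all factorization orders, each token is, in expectation, conditioned on every other token in the sequence, which is how the autoregressive formulation captures bidirectional context without corrupting the input.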

Category: Science & Technology