TransformerFAM: Feedback attention is working memory

First published at 07:32 UTC on April 30th, 2024.

Paper: https://arxiv.org/abs/2404.09173

Abstract:
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel…
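
To make the idea of feedback attention as working memory concrete, here is a minimal sketch of how block-wise processing with a small feedback memory might look. It is not the paper's implementation; the class name, parameter names (block_len, fam_len), and the exact memory-update rule are assumptions for illustration. The point it shows is that each block attends to itself plus a fixed-size memory, and the memory is refreshed from the block's output, so per-block compute stays constant no matter how long the input grows.

```python
# Illustrative sketch only: block-wise attention with a feedback memory,
# loosely in the spirit of Feedback Attention Memory (FAM). Names and the
# update rule are assumptions, not the paper's reference code.

import torch
import torch.nn as nn

class BlockwiseFAMLayer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, block_len=32, fam_len=8):
        super().__init__()
        self.block_len = block_len
        self.fam_len = fam_len
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Learned initial state for the feedback (working-memory) tokens.
        self.fam_init = nn.Parameter(torch.zeros(1, fam_len, d_model))

    def forward(self, x):
        # x: (batch, seq_len, d_model); seq_len may be arbitrarily long.
        batch = x.size(0)
        fam = self.fam_init.expand(batch, -1, -1)  # current memory state
        outputs = []
        for start in range(0, x.size(1), self.block_len):
            block = x[:, start:start + self.block_len]
            # Queries and keys/values: the current block plus the memory,
            # so the block can read from the memory and vice versa.
            qkv = torch.cat([block, fam], dim=1)
            out, _ = self.attn(qkv, qkv, qkv)
            # Split the output back into block activations and the updated
            # memory that is fed back to the next block.
            block_out, fam = out[:, :block.size(1)], out[:, block.size(1):]
            outputs.append(block_out)
        return torch.cat(outputs, dim=1)

# Usage: per-block cost is O(block_len^2), independent of total length.
layer = BlockwiseFAMLayer()
x = torch.randn(2, 128, 64)
y = layer(x)
print(y.shape)  # torch.Size([2, 128, 64])
```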
