SupSup: Supermasks in Superposition (Paper Explained)
Supermasks are binary masks over a randomly initialized neural network, chosen so that the masked network performs well on a particular task. This paper tackles (sequential) lifelong learning by training one supermask per task while keeping the randomly initialized base network fixed. By minimizing the output entropy, the system can infer the task ID of a data point at inference time, automatically distinguishing up to 2500 tasks.
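The idea above can be sketched in a few lines: keep random weights fixed, superimpose the per-task masks with coefficients α, and take one gradient step of the output entropy with respect to α — the coefficient whose gradient is most negative points to the likely task. This is an illustrative NumPy toy (finite-difference gradient, random stand-in masks, made-up layer sizes), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID, D_OUT, N_TASKS = 20, 64, 5, 3

# Fixed, randomly initialized weights -- never trained.
W1 = rng.standard_normal((D_HID, D_IN))
W2 = rng.standard_normal((D_OUT, D_HID))

# One binary supermask per task (random stand-ins for trained masks).
masks1 = (rng.random((N_TASKS, D_HID, D_IN)) > 0.5).astype(float)
masks2 = (rng.random((N_TASKS, D_OUT, D_HID)) > 0.5).astype(float)

def output_entropy(x, alpha):
    # Superposition: mix the masks with coefficients alpha.
    m1 = np.tensordot(alpha, masks1, axes=1)
    m2 = np.tensordot(alpha, masks2, axes=1)
    h = np.maximum(0.0, x @ (W1 * m1).T)      # ReLU hidden layer
    logits = h @ (W2 * m2).T
    z = np.exp(logits - logits.max())          # stable softmax
    p = z / z.sum()
    return -(p * np.log(p + 1e-12)).sum()

def infer_task(x, eps=1e-4):
    # Start from a uniform superposition over all tasks.
    alpha = np.full(N_TASKS, 1.0 / N_TASKS)
    grad = np.empty(N_TASKS)
    # Finite-difference gradient of the entropy w.r.t. alpha
    # (the paper uses a single autograd backward pass instead).
    for i in range(N_TASKS):
        e = np.zeros(N_TASKS)
        e[i] = eps
        grad[i] = (output_entropy(x, alpha + e)
                   - output_entropy(x, alpha - e)) / (2 * eps)
    # The mask whose coefficient most decreases the entropy wins.
    return int(np.argmin(grad))

x = rng.standard_normal(D_IN)
print(infer_task(x))  # an index in [0, N_TASKS)
```

With trained masks, the correct task's supermask produces a confident (low-entropy) output, so its coefficient gets the steepest negative gradient; here the masks are random, so the sketch only demonstrates the mechanics.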
OUTLINE:
0:00 - Intro & Overview
1:20 - Catastrophic Forgetting
5:20 - Supermasks
9:35 - Lifelong Learning using Supermasks
11:15 - Inference Time Task Discrimination by Entropy
15:05 - Mask Superpositions
24:20 - Proof-of-Concept, Task Given at Inference
30:15 - Binary Maximum Entropy Search
32:00 - Task Not Given at Inference
37:15 - Task Not Given at Training
41:35 - Ablations
45:05 - Superfluous Neurons
51:10 - Task Selection by Detecting Outliers
57:40 - Encoding Masks in Hopfield Networks
59:40 - Conclusion
Paper: https://arxiv.org/abs/2006.14769
Code: https://github.com/RAIVNLab/supsup
My Video about Lottery Tickets: https://youtu.be/ZVVnvZdUMUk
My Video about Supermasks: https://youtu.be/jhCInVFE2sc
Abstract:
We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If not provided, SupSup can infer the task using gradient-based optimization to find a linear superposition of learned supermasks which minimizes the output entropy. In practice we find that a single gradient step is often sufficient to identify the correct mask, even among 2500 tasks. We also showcase two promising extensions. First, SupSup models can be trained entirely without task identity information, as they may detect when they are uncertain about new data and allocate an additional supermask for the new training distribution. Finally, the entire, growing set of supermasks can be stored in a constant-sized reservoir by implicitly storing them as attractors in a fixed-sized Hopfield network.
Authors: Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi
Links:
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher