Autoregressive Diffusion Models (Machine Learning Research Paper Explained)
#machinelearning #ardm #generativemodels
Diffusion models have made large advances in recent months as a new type of generative models. This paper introduces Autoregressive Diffusion Models (ARDMs), which are a mix between autoregressive generative models and diffusion models. ARDMs are trained to be agnostic to the order of autoregressive decoding and give the user a dynamic tradeoff between speed and performance at decoding time. This paper applies ARDMs to both text and image data, and as an extension, the models can also be used to perform lossless compression.
0:00 - Intro & Overview
3:15 - Decoding Order in Autoregressive Models
6:15 - Autoregressive Diffusion Models
8:35 - Dependent and Independent Sampling
14:25 - Application to Character-Level Language Models
18:15 - How Sampling & Training Works
26:05 - Extension 1: Parallel Sampling
29:20 - Extension 2: Depth Upscaling
33:10 - Conclusion & Comments
We introduce Autoregressive Diffusion Models (ARDMs), a model class encompassing and generalizing order-agnostic autoregressive models (Uria et al., 2014) and absorbing discrete diffusion (Austin et al., 2021), which we show are special cases of ARDMs under mild assumptions. ARDMs are simple to implement and easy to train. Unlike standard ARMs, they do not require causal masking of model representations, and can be trained using an efficient objective similar to modern probabilistic diffusion models that scales favourably to highly-dimensional data. At test time, ARDMs support parallel generation which can be adapted to fit any given generation budget. We find that ARDMs require significantly fewer steps than discrete diffusion models to attain the same performance. Finally, we apply ARDMs to lossless compression, and show that they are uniquely suited to this task. Contrary to existing approaches based on bits-back coding, ARDMs obtain compelling results not only on complete datasets, but also on compressing single data points. Moreover, this can be done using a modest number of network calls for (de)compression due to the model's adaptable parallel generation.
Authors: Emiel Hoogeboom, Alexey A. Gritsenko, Jasmijn Bastings, Ben Poole, Rianne van den Berg, Tim Salimans
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
|Category||Science & Technology|
|Sensitivity||Normal - Content that is suitable for ages 16 and over|
Warning - This video exceeds your sensitivity preference!
To dismiss this warning and continue to watch the video please click on the button below.
Note - Autoplay has been disabled for this video.
This advertisement has been selected by the BitChute platform.
By purchasing and/or using the linked product you are helping to cover the costs of running BitChute. Without the support of the community this platform will cease to exist.
Registered users can opt-out of receiving advertising via the Interface tab on their Settings page.
To help support BitChute or find out more about our creator monetization policy: