OpenAI DALL·E: Creating Images from Text (Blog Post Explained)
#openai #science #gpt3
OpenAI's newest model, DALL·E, shows absolutely amazing abilities in generating high-quality images from arbitrary text descriptions. As with GPT-3, the range of applications and the diversity of outputs are astonishing, given that this is a single model trained on a purely autoregressive task. This model is a significant step towards the combination of text and images in future AI applications.
OUTLINE:
0:00 - Introduction
2:45 - Overview
4:20 - Dataset
5:35 - Comparison to GPT-3
7:00 - Model Architecture
13:20 - VQ-VAE
21:00 - Combining VQ-VAE with GPT-3
27:30 - Pre-Training with Relaxation
32:15 - Experimental Results
33:00 - My Hypothesis about DALL·E's inner workings
36:15 - Sparse Attention Patterns
38:00 - DALL·E can't count
39:35 - DALL·E can't handle global order
40:10 - DALL·E renders different views
41:10 - DALL·E is very good at texture
41:40 - DALL·E can complete a bust
43:30 - DALL·E can do some reflections, but not others
44:15 - DALL·E can do cross-sections of some objects
45:50 - DALL·E is amazing at style
46:30 - DALL·E can generate logos
47:40 - DALL·E can generate bedrooms
48:35 - DALL·E can combine unusual concepts
49:25 - DALL·E can generate illustrations
50:15 - DALL·E sometimes understands complicated prompts
50:55 - DALL·E can pass part of an IQ test
51:40 - DALL·E probably does not have geographical / temporal knowledge
53:10 - Reranking dramatically improves quality
53:50 - Conclusions & Comments
Blog: https://openai.com/blog/dall-e/
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Category: Science & Technology