Author Interview: SayCan - Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
#saycan #robots #ai
This is an interview with the authors Brian Ichter, Karol Hausman, and Fei Xia.
Original Paper Review Video: https://youtu.be/Ru23eWAQ6_E
Large Language Models are excellent at generating plausible plans in response to real-world problems, but without interacting with the environment, they cannot estimate which of these plans are feasible or appropriate. SayCan combines the semantic capabilities of language models with a bank of low-level skills, which are available to the agent as individual policies to execute. SayCan automatically finds the best policy to execute by considering a trade-off between two quantities: how much the policy progresses towards the goal, as scored by the language model, and the policy's probability of executing successfully, as estimated by its value function. The result is a system that can generate and execute long-horizon action sequences in the real world to fulfil complex tasks.
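The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the two scores are combined by multiplication, and the names `select_skill`, `plan`, `llm_score_fn`, `affordance_fn`, and the `"done"` termination skill are hypothetical placeholders for a real language model and real learned value functions.

```python
from typing import Callable, Dict, List


def select_skill(llm_scores: Dict[str, float],
                 affordances: Dict[str, float]) -> str:
    """Pick the skill with the highest combined score.

    llm_scores:  how useful the language model rates each skill for the goal.
    affordances: how likely each skill is to succeed in the current state
                 (in SayCan, this comes from the skill's value function).
    Assumption: the trade-off is modeled as a simple product of the two.
    """
    combined = {skill: llm_scores[skill] * affordances[skill]
                for skill in llm_scores}
    return max(combined, key=combined.get)


def plan(instruction: str,
         skills: List[str],
         llm_score_fn: Callable[[str, List[str], str], float],
         affordance_fn: Callable[[str, List[str]], float],
         max_steps: int = 10) -> List[str]:
    """Greedily build a sequence of skills, one step at a time.

    Both scoring functions are hypothetical stand-ins supplied by the caller.
    A real system would execute each chosen skill on the robot and re-read
    the environment before scoring the next step.
    """
    steps: List[str] = []
    for _ in range(max_steps):
        llm_scores = {s: llm_score_fn(instruction, steps, s) for s in skills}
        affordances = {s: affordance_fn(s, steps) for s in skills}
        best = select_skill(llm_scores, affordances)
        if best == "done":  # hypothetical termination skill
            break
        steps.append(best)
    return steps
```

With toy scores, a skill the language model likes but the robot cannot currently perform (for example, picking up a sponge it has not found yet) loses to a slightly less relevant skill that is actually feasible.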
OUTLINE:
0:00 - Introduction & Setup
3:40 - Acquiring atomic low-level skills
7:45 - How does the language model come in?
11:45 - Why are you scoring instead of generating?
15:20 - How do you deal with ambiguity in language?
20:00 - The whole system is modular
22:15 - Going over the full algorithm
23:20 - What if an action fails?
24:30 - Debunking a marketing video :)
27:25 - Experimental Results
32:50 - The insane scale of data collection
40:15 - How do you go about large-scale projects?
43:20 - Where did things go wrong?
45:15 - Where do we go from here?
52:00 - What is the largest unsolved problem in this?
53:35 - Thoughts on the Tesla Bot
55:00 - Final thoughts
Paper: https://arxiv.org/abs/2204.01691
Website: https://say-can.github.io/
Abstract:
Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task.