Summary
Although the effect is rather crude, the system offers an early glimpse of what’s coming next for generative artificial intelligence: the obvious next step after text-to-image is text-to-video AI. These systems are trickier to train, because there aren’t large-scale data sets of high-quality videos paired with text. To work around this, Meta combined data from three open-source image and video data sets to train its model. Standard text-image data sets of labeled still images helped the AI learn what objects are called and what they look like. However, there’s plenty of room for the research community to improve on, especially if these systems are to be used for video editing and professional content creation.
Show Notes
Although the effect is rather crude, the system offers an early glimpse of what’s coming next for generative artificial intelligence, and it is the obvious next step from the text-to-image AI systems that have caused huge excitement this year. In the last month alone, AI lab OpenAI has made its latest text-to-image system, DALL-E, available to everyone, and AI startup Stability.AI launched Stable Diffusion, an open-source text-to-image system. Text-to-video systems are trickier to train, because there aren’t large-scale data sets of high-quality videos paired with text. To work around this, Meta combined data from three open-source image and video data sets to train its model. Standard text-image data sets of labeled still images helped the AI learn what objects are called and what they look like.
Source
https://www.technologyreview.com/2022/09/29/1060472/meta-text-to-video-ai/