Ben's Bites

Meta shows new image and video generation models

Meta has announced new AI research on generating videos from text prompts and editing images precisely with text instructions. Its Emu family of image models is expanding with two additions: Emu Video and Emu Edit.

What's going on here?

Meta’s new models can generate high-quality videos and edit images.

What does this mean?

The new Emu Video model generates high-quality 512x512, four-second videos from text prompts alone. It uses a simplified "factorized" approach that splits generation into two steps: first generate an image from the text, then generate a video conditioned on both that image and the text. According to Meta, this split makes it possible to train the video model efficiently with diffusion models.
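The factorized idea can be pictured as two composed steps. The sketch below is a toy illustration, not Meta's code: `FactorizedVideoGenerator`, `toy_text_to_image`, and `toy_image_and_text_to_video` are hypothetical names, and the toy stubs stand in for what would be diffusion models in Emu Video.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical placeholder types, not Meta's data structures.
Image = List[List[float]]  # one frame as a 2-D grid of pixel values
Video = List[Image]        # a video as a list of frames


@dataclass
class FactorizedVideoGenerator:
    """Two-stage text-to-video pipeline in the spirit of Emu Video's factorization.

    Both stages are injected callables. In the real system each would be a
    diffusion model; here they are hypothetical stand-ins.
    """
    text_to_image: Callable[[str], Image]
    image_and_text_to_video: Callable[[Image, str], Video]

    def generate(self, prompt: str) -> Video:
        # Step 1: generate a single image conditioned only on the text prompt.
        first_image = self.text_to_image(prompt)
        # Step 2: generate the video conditioned on both that image and the prompt.
        return self.image_and_text_to_video(first_image, prompt)


def toy_text_to_image(prompt: str) -> Image:
    # Toy stub: a 2x2 "image" whose brightness depends on the prompt length.
    v = float(len(prompt) % 10)
    return [[v, v], [v, v]]


def toy_image_and_text_to_video(image: Image, prompt: str, num_frames: int = 4) -> Video:
    # Toy stub: each successive frame brightens the conditioning image by 1.
    return [[[p + t for p in row] for row in image] for t in range(num_frames)]


gen = FactorizedVideoGenerator(toy_text_to_image, toy_image_and_text_to_video)
video = gen.generate("a cat surfing")
print(len(video))  # 4
```

Keeping the two stages as separate, swappable pieces is the point of the sketch: the second stage never sees text alone, only text plus an already generated image.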

Emu Edit lets users edit images precisely by giving text instructions. It aims to alter only the pixels relevant to the instruction, so the rest of the image stays unchanged. The key insight is that including computer vision tasks as instructions gives much finer control over edits, which Meta describes as unprecedented. In Meta's evaluations, Emu Edit achieved state-of-the-art results on a range of editing tasks.

Why should I care?

Look at every piece of research from Big Tech companies through the lens of their existing products. Meta first shipped the Emu image models in AI stickers, a product built around chat and conversation. Now imagine Emu Video and Emu Edit features arriving in Reels. If you use these products or build for them, you can start planning how these features will fit into your workflow.
