Meta announces Generative AI Models called Emu
Meta Research is introducing Emu Video, which can generate short videos from a text prompt, and Emu Edit, which can edit images using text-based instructions, as its latest generative AI research milestone.
Meta has announced its new generative AI tools, collectively called Emu. Emu Video allows people to generate new videos from text prompts alone, while Emu Edit takes a different approach to image editing: instead of masking and inpainting, edits are driven by free-form text instructions. Meta says that Emu can generate "highly visually appealing" images, with human judges preferring its output to Stable Diffusion XL over 70% of the time.
Emu Video uses a factorized, two-step approach to video generation: it first generates an image from the text prompt, then generates a video conditioned on both the prompt and the generated image. Both steps use a single fine-tuned Emu diffusion model, unlike previous methods such as Make-A-Video, which use a pipeline of distinct models. The videos created by Emu Video are limited to 512x512 pixel resolution but show remarkable coherence with the provided text prompts.
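To make the factorized flow concrete, here is a minimal Python sketch of the two-step pipeline. The class and method names (EmuDiffusionModel, generate_image, generate_video, text_to_video) are illustrative placeholders, not Meta's published code or API.

```python
from dataclasses import dataclass


@dataclass
class EmuDiffusionModel:
    """Stand-in for the single fine-tuned Emu diffusion model used in both steps."""
    resolution: int = 512  # Emu Video outputs are limited to 512x512 pixels

    def generate_image(self, prompt: str):
        # Step 1: text-to-image diffusion, conditioned on the prompt alone.
        ...

    def generate_video(self, prompt: str, keyframe):
        # Step 2: image-to-video diffusion, conditioned on the prompt
        # and the keyframe image produced in step 1.
        ...


def text_to_video(model: EmuDiffusionModel, prompt: str):
    keyframe = model.generate_image(prompt)        # prompt -> image
    return model.generate_video(prompt, keyframe)  # prompt + image -> video
```

The design choice worth noting is that both steps share one model rather than chaining several specialized ones, which is what Meta credits for the simpler training and inference setup.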
Emu Edit is also based on the Emu diffusion model but includes a task-embedding layer, which converts the text instruction prompt into an additional conditioning vector.
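Here is a hedged PyTorch sketch of how such a task embedding might be injected alongside the text conditioning; the class name, task set, and dimensions are assumptions for illustration, not Meta's implementation.

```python
import torch
import torch.nn as nn


class TaskConditionedTextEncoder(nn.Module):
    """Adds a learned per-task vector to the text-instruction conditioning."""

    def __init__(self, num_tasks: int = 16, cond_dim: int = 768):
        super().__init__()
        # One learned vector per editing task (e.g. local edit, global edit,
        # segmentation) -- the "task embedding layer" described above.
        self.task_embedding = nn.Embedding(num_tasks, cond_dim)

    def forward(self, text_cond: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # text_cond: (batch, seq_len, cond_dim) encoded instruction tokens
        # task_id:   (batch,) integer id of the requested edit task
        task_vec = self.task_embedding(task_id).unsqueeze(1)  # (batch, 1, cond_dim)
        # Append the task vector as one extra conditioning token for the
        # diffusion model's cross-attention.
        return torch.cat([text_cond, task_vec], dim=1)
```

In this framing, the instruction text tells the model what to change, while the task vector tells it what kind of edit is being requested, which is how a single model can cover many editing tasks.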
The main advantage we can see here is that we might be able to generate our Meta ads faster using the Emu tools.
Let's look at the key points of each tool.
Emu Video
1. Simplified Video Generation Process:
Emu Video adopts a two-step process for video creation, first generating an image from a text prompt and then creating a video from that image and text. This streamlined approach is less complex than multi-model systems, making it more user-friendly and efficient.
2. High Coherence with Text Prompts:
One of the standout features of Emu Video is its ability to maintain a high level of coherence with the provided text prompts. This indicates that the tool is exceptionally adept at accurately translating textual narratives into visual formats, a capability that sets it apart from many existing models and commercial solutions.
3. Quality of Output:
Despite the resolution limitation of 512x512 pixels, the videos created by Emu Video are noted for their smoothness and minimal discrepancies between frames. This suggests a high quality of output, which is crucial for a wide range of applications from entertainment to marketing.
Emu Edit
1. Precision and Flexibility in Image Editing:
Emu Edit enables users to execute highly precise edits on images. Its ability to interpret natural language instructions allows for a more intuitive and flexible approach to image editing, making it accessible to both professionals and amateurs.
2. State-of-the-Art Technology:
According to Meta’s research, Emu Edit sets new standards in instruction-based image editing. This implies that the tool has surpassed existing benchmarks in the field, offering superior performance in understanding and executing complex editing instructions.
3. Alignment with Meta’s Strategic Vision:
The development of Emu Edit is part of Meta's broader strategy to create foundational technologies for the Metaverse. This alignment indicates that Emu Edit is not just an isolated tool, but part of a larger ecosystem of technologies that Meta is developing, which could lead to more integrated and advanced applications in the future.
Competition in the generative AI field is intense, and some applications can already do what Meta is planning to implement. However, because Meta has its own range of products, Emu will be integrated directly into its existing ecosystem. So even if these offerings are not the best in the field, they are likely to be widely used as part of Meta's social tools, which billions of people already use.