Google unveils Gemini Omni, an ambitious next-generation AI video creator.
Contents
- 1. Gemini Omni appears in the Gemini app.
- 2. Experiment with a video of a professor writing a mathematical proof.
- 3. The video quality is impressive but not perfect.
- 4. The "Will Smith Eating Spaghetti" Test and its Obvious Limitations
- 5. Comparing Gemini Omni with ByteDance Seedance 2
- 6. Gemini Omni and its connection to Veo
- 7. What might Google announce at Google I/O 2026?
Google is reportedly preparing to launch Gemini Omni, an AI model capable of creating realistic videos from text descriptions. Early tests suggest a significant upgrade in scene rendering, motion, and character expression.
1. Gemini Omni appears in the Gemini app.
The first information about Gemini Omni didn't come from Google, but from the user community on Reddit. According to a Reddit user, when opening the Google Gemini app on iOS and Android, they unexpectedly received a new notification that said, "Create content with Gemini Omni."
This detail quickly attracted attention because Google had never officially announced a model called Omni before. The interface suggests that this is an experimental feature integrated directly into the Gemini ecosystem rather than a standalone tool.
Google's closed testing of Gemini Omni within the Gemini app points to the company's increasingly clear strategy of unifying its AI products. Instead of developing separate AI tools, Google is moving toward integrating all AI capabilities into a single platform, which makes it easier for users to create text, images, videos, and audio within one application.
According to the shared images, Gemini Omni is placed alongside other content creation features and has an interface relatively similar to current AI video creation tools. Users simply need to enter a text description and the system will automatically generate the corresponding video.

2. Experiment with a video of a professor writing a mathematical proof.
One of the first shared tests asked Gemini Omni to create a video of a professor writing a mathematical proof on a blackboard. The prompt was quite demanding, involving hand movements, human expressions, mathematical symbols, and logical reasoning during the presentation.
The user asked the system to create a scene of a professor proving trigonometric identities on a blackboard in a classroom. This is a challenging task for video AI because the system must not only render a human figure but also reproduce mathematical formulas accurately and keep the writing action continuous.

The results were quite impressive. Gemini Omni created a video with natural lighting, relatively realistic facial expressions, and a lively classroom setting. In particular, the mathematical formulas displayed on the blackboard were much more logically sound than those from older AI video models.
Some viewers commented that the video felt closer to real footage than to AI-generated video. This is a notable step forward because for many years, AI video has often struggled with handling handwriting or content with complex logical structures.
Furthermore, Gemini Omni's ability to understand context is also highly praised. The model not only creates a person standing in front of a blackboard but also accurately recreates the demeanor of a lecturer giving a lesson, from the way they hold the chalk and their body movements to their gaze.
3. The video quality is impressive but not perfect.
Although the math professor video was highly rated, Gemini Omni still has clear shortcomings. Several minor but easily noticeable errors show that current AI video has not yet achieved full accuracy.
In the test video, the professor's writing movements were sometimes out of sync with the content appearing on the board. At times, the hand moved, but the strokes appeared in a different direction, or the speed didn't match the actual movement.

Another bug mentioned by the community is the phenomenon of the chalk disappearing at the end of the video. This is a fairly common error in AI video creation, where the system struggles to maintain consistency of the object between consecutive frames.
Such errors show that "temporal consistency" (keeping objects and motion coherent from frame to frame) remains a major challenge for modern AI video models. AI can create beautiful individual frames, but keeping every object behaving logically throughout a video is still extremely difficult.
However, despite its flaws, Gemini Omni still shows much higher overall quality than many earlier AI video models. Body movements, lighting, and image composition are realistic enough that the clip could be mistaken for real footage at a quick glance.
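To make the idea of temporal consistency concrete, here is a minimal Python sketch of one crude way to spot abrupt changes between consecutive frames, such as an object vanishing. It is an illustration only: the function names, the toy clip, and the threshold are assumptions made for this example, and nothing here reflects how Google or the Reddit testers evaluated Gemini Omni. Real evaluations typically rely on richer signals such as optical flow or learned features.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities (hypothetical toy format).
Frame = Sequence[Sequence[float]]


def mean_abs_difference(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def consecutive_differences(frames: Sequence[Frame]) -> List[float]:
    """Difference between each frame and the one that follows it."""
    return [
        mean_abs_difference(frames[i], frames[i + 1])
        for i in range(len(frames) - 1)
    ]


def flag_jumps(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Indices i where the change from frame i to frame i + 1 exceeds the threshold."""
    return [
        i for i, diff in enumerate(consecutive_differences(frames))
        if diff > threshold
    ]


# Toy 2x2 clip: a bright "chalk" pixel is present in two frames, then vanishes.
clip = [
    [[0, 0], [0, 255]],
    [[0, 0], [0, 255]],
    [[0, 0], [0, 0]],
]

print(consecutive_differences(clip))     # [0.0, 63.75]
print(flag_jumps(clip, threshold=10.0))  # [1]
```

A raw pixel difference cannot tell a legitimate scene change from a glitch, so a spike like the one at index 1 is only a hint that something may have appeared or disappeared between frames.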
4. The "Will Smith Eating Spaghetti" Test and its Obvious Limitations
In the AI community, the "Will Smith eating spaghetti" test has almost become the unofficial standard for judging the quality of generated video. This is because it's a situation with many complex elements: facial expressions, hand movements, the elasticity of the spaghetti, eating techniques, and the mouth muscle responses during chewing.
When Gemini Omni was tested with a similar request, the results weren't quite as expected. The video depicted two men sitting in a fancy seaside restaurant with pasta on the table. However, the familiar flaws of AI video were still quite apparent.

In some scenes, the pasta appears abruptly on the plate without ever being served. The characters' eating movements also look unnatural, with their chewing not matching the amount of food entering their mouths.
This is a common problem in many current AI video models. Systems can create beautiful scenes but often struggle with detailed actions involving soft physics such as food, liquids, or subtle facial movements.
Test results show that Gemini Omni, while powerful, still hasn't fully addressed the fundamental challenges of AI video. Reproducing realistic human behavior at a cinematic level remains a difficult goal to achieve.
5. Comparing Gemini Omni with ByteDance Seedance 2
During testing, Reddit users also compared Gemini Omni to Seedance 2, a video AI model developed by ByteDance.
Initial reviews suggest that Seedance 2 demonstrates more consistent image quality in some scenes. The consistency of characters and objects in the video is better maintained compared to Gemini Omni.
However, Seedance 2 suffers from stuttering and a lack of smooth motion. This shows that each current AI video model has its own strengths and weaknesses.
Gemini Omni seems to prioritize contextual understanding and natural motion creation, while Seedance 2 focuses more on image stability. The competition between tech companies is driving AI video development in various directions.
Compared to Google's previous models, Gemini Omni appears to have significantly improved cinematic quality. The footage has more depth and better lighting, and the characters' expressions are less lifeless.
6. Gemini Omni and its connection to Veo
Veo is known as Google's ambitious AI video project, capable of creating high-resolution videos from text descriptions. However, Veo primarily focuses on creating cinematic footage rather than deep integration into the Gemini ecosystem.

Gemini Omni could be a step toward unifying Google's AI technologies into a single platform. This would allow users to seamlessly switch between text, images, audio, and video within a single workflow.
If that's the case, Omni is not simply a new video model, but also the foundation for Google's comprehensive AI content creation ecosystem.
In the future, users may be able to ask Gemini to write scripts, create storyboards, generate voices, edit videos, and even perform post-production entirely using AI.
7. What might Google announce at Google I/O 2026?
According to many predictions, Google will officially announce Gemini Omni at Google I/O 2026 along with a host of other new AI features.
In addition to video creation capabilities, Google may introduce further video editing tools using natural language, AI voice synchronization, and real-time virtual character creation.
Some experts also expect Gemini Omni to support the creation of longer videos with greater consistency.
Google could leverage its cloud infrastructure and proprietary TPU chips to accelerate video creation while reducing operating costs.
If Omni is indeed integrated into Android, YouTube, and Gemini, this could be one of Google's most significant AI advancements since the launch of its Gemini chatbot.
Despite its limitations, Gemini Omni demonstrates that AI video is getting closer to the ability to create cinematic content using natural language. What was once considered science fiction is gradually becoming a reality after just a few years of development.

In the next few years, AI video is likely to completely change how people produce digital content. Independent creators will be able to create short films, commercials, or educational videos at a much lower cost.
While many questions remain unanswered, one thing is almost certain: AI video will be the biggest technology trend in the next phase of artificial intelligence, and Gemini Omni could become one of the names shaping that future.