Google made waves this week with the launch of its new AI system Gemini, including a slick marketing video that quickly went viral. The demo showed Gemini conversing fluently with a human, identifying objects, playing games, and solving puzzles. However, Google has now admitted that the video was essentially staged and edited to exaggerate Gemini’s capabilities.
The Gemini demo, initially lauded for its real-time interaction with spoken-word prompts and video, was revealed to have been edited for effect. Google confessed that the responses in the video were sped up, and contrary to the impression given, the AI did not respond to voice or video in real time. Instead, the AI interacted with still image frames and text prompts.
In the six-minute video, Gemini appears to fluidly respond to spoken questions and on-screen visuals. For example, when the demonstrator holds up a rubber duck and verbally asks what material it’s made of, Gemini correctly identifies it after being told the duck squeaks when squeezed. Impressive feats like guessing the location of a hidden ball under cups also seem to showcase real-time visual understanding.
However, the reality was quite different. Google has now revealed that the video was pieced together using “still image frames” and “prompting via text” rather than continuous voice and video. The conversational flow was added afterward, so while the outputs came from Gemini, the interaction itself was simulated. This is a major departure from the live, dynamic exchange the video appeared to show.
For instance, in a segment where Gemini seemingly identifies a rubber duck's buoyancy, it was actually responding to a still image and a text prompt about the duck's squeakiness. Similarly, the cups and balls trick was replicated by showing the AI images representing cups being swapped, not by analyzing live video.
The video also depicted Gemini inventively creating a game using emojis based on a world map. However, Google's blog clarified that the AI was explicitly instructed on the game's structure and responded to predefined examples, rather than spontaneously generating the game.
Google maintains that the demo was intended to illustrate Gemini's potential and inspire developers, as stated by Oriol Vinyals, VP of Research & Deep Learning Lead at Google DeepMind. The company emphasized that the AI's responses were actual outputs from Gemini, albeit derived from prompts and still images.
The exaggerations have sparked some backlash, as viewers felt deceived by the simulated interactions. But the controversy aside, Gemini remains an impressive AI achievement. Its ability to work across text, images, and other data gives developers a powerful platform. However, real-world applications will still require considerable additional work.
As the AI race heats up, be wary of bold claims and flashy demos that don’t always match reality. Still, Gemini seems promising, if not quite as seamless as its polished video suggested. Google would be wise to highlight its true merits while avoiding misleading hype.