One of the more impressive demos at Google I/O this year was Project Astra—a real-time, multimodal AI assistant that can see the world, understand context, and respond to user queries in a natural, conversational manner. Powered by Google's Gemini 1.5 model and other task-specific models, Astra processes video and speech input continuously, enabling it to understand and remember its surroundings.
In the demo, a Google employee showed off Astra's capabilities using just a smartphone camera. The AI assistant identified objects, answered questions about code snippets, and even recognized London's King's Cross area just by looking out the window. The demo also showed Astra being used with a pair of smart glasses, hinting at renewed hardware ambitions.
Demis Hassabis, head of Google DeepMind, emphasized that the goal is to create a "universal assistant" that is helpful in everyday life. "Imagine agents that can see and hear what we do, better understand the context we're in and respond quickly in conversation, making the pace and quality of interactions feel much more natural," he said.
In another demo shared on X, Google posted a video of Project Astra "watching" the keynote alongside an employee, suggesting some form of desktop integration could arrive down the road.
Google says it plans to bring Project Astra to the Gemini app and its other products, starting later this year. While there's no specific launch date yet, Google seems committed to making these capabilities available. After all, CEO Sundar Pichai calls Astra the company's "vision for the future of AI assistants."
For those attending I/O in person, Google provided a Project Astra demo station to try out some of the new capabilities. Check out the hands-on video from CNET's Lexy Savvides below:
P.S. If you're wondering, Astra stands for "advanced seeing and talking responsive agent."