Google Gemini

Google Announces Gemini 1.5 With Mixture-of-Experts Architecture and 1 Million Token Context Length

February 15, 2024

Google has announced the release of Gemini 1.5, the latest iteration of its conversational AI model for developers. The upgrade delivers dramatic improvements in efficiency and performance through a new mixture-of-experts (MoE) architecture.

The MoE architecture (here's the paper if you are interested) allows Gemini 1.5 to perform complex tasks more quickly and maintain quality with less computational demand. Think of it as a collection of "expert" neural networks: depending on the input, the model selectively activates only the most relevant ones, which significantly boosts efficiency. According to Google, this enables more sophisticated reasoning and problem-solving than previous models.
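To make the idea of selective activation concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Gemini's actual design, which the announcement does not detail. The gate weights, the toy "experts" (simple scaling functions), and names such as `moe_forward` and `top_k` are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a toy function that scales its input vector."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    # One gate score per expert: dot product of that expert's gate row with x.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # Keep only the top_k experts; the others are never evaluated.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]

    # Renormalize the chosen experts' probabilities so the mixing weights sum to 1.
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        expert_out = experts[i](x)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts = 4, 8
    gates = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
    experts = [make_expert(i + 1) for i in range(num_experts)]
    out, used = moe_forward([0.5, -1.0, 2.0, 0.1], gates, experts, top_k=2)
    print(f"ran experts {used} out of {num_experts}")
```

The efficiency gain in this sketch comes from the loop over `chosen`: with 8 experts and `top_k=2`, only a quarter of the expert computation runs for any given input, even though the full set of experts is available.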

However, perhaps the most striking feature of Gemini 1.5 is its long-context understanding. The model can process up to 1 million tokens, a milestone that sets a new standard for large-scale foundation models. This is only a taste of things to come: Google says it has tested up to 10 million tokens in its research 😱.

To put it in perspective, the leap to a million-token context window for Gemini 1.5 is roughly a 10x increase over most state-of-the-art models, and a 5x increase over the 200K-token window of Anthropic's Claude.

This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, 700,000 words, or codebases with over 30,000 lines of code.
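As a rough way to reason about these figures, the sketch below estimates whether a piece of text fits in a given context window. It assumes the ratio implied by the numbers above (about 700,000 words per 1 million tokens); real token counts depend on the model's tokenizer and on the content, so treat the result as a ballpark only. The function names are hypothetical.

```python
# Figures taken from the announcement: ~700,000 words fill a 1M-token window.
CONTEXT_TOKENS = 1_000_000
WORDS_AT_FULL_CONTEXT = 700_000


def estimate_tokens(text):
    """Estimate the token count from whitespace-separated words (rounded up).

    Uses integer ceiling division to avoid floating-point rounding surprises.
    """
    words = len(text.split())
    return -(-words * CONTEXT_TOKENS // WORDS_AT_FULL_CONTEXT)


def fits_in_context(text, limit=CONTEXT_TOKENS):
    """Return True if the estimated token count is within the given limit."""
    return estimate_tokens(text) <= limit


if __name__ == "__main__":
    sample = "word " * 100_000
    print(estimate_tokens(sample))
    print(fits_in_context(sample))                 # against the 1M-token window
    print(fits_in_context(sample, limit=128_000))  # against a 128,000-token window
```

Passing a smaller `limit`, such as the 128,000-token standard window mentioned later in this article, shows how quickly a large document can outgrow a conventional context size.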

“This breakthrough capability in long-context understanding will open up new possibilities for people, developers and enterprises to create, discover and build using AI,” said Demis Hassabis, CEO of Google DeepMind.

For developers and enterprise customers, a context window this large means an application can work over an entire document collection, recording, or codebase in a single request, supporting more nuanced applications in domains ranging from content analysis to complex problem-solving in code.

In benchmarks, Gemini 1.5 Pro outscored its predecessor Gemini 1.0 Pro in 87% of evaluations spanning text, code, images, audio and video. It also matched the performance of the larger 1.0 Ultra model despite using less computing power.

As Gemini 1.5 rolls out, Google emphasizes its continued commitment to safety and ethical AI development. The model has undergone extensive ethics and safety testing, ensuring its alignment with Google's AI Principles. This level of scrutiny is crucial, given the model's novel capabilities and the potential for wide-reaching impact.

Google is initially offering Gemini 1.5 Pro in a limited preview to developers via its AI Studio platform and through Vertex AI. This allows early testers to experiment with the model and provide feedback ahead of a wider release.

Developers can sign up on AI Studio now to try out the 1.5 Pro model with a standard 128,000-token context window. Google plans to add pricing tiers soon that will scale up to 1 million tokens.

During the preview period, testers can access the experimental million-token context window at no cost. However, Google notes that users should expect higher latency for now, and says significant speed optimizations are in development.