Google Just Showed Off a Powerful New Upgrade to Gemini

If it feels like Google is bombarding us with Gemini announcements, that's because it is. Last week, the company rebranded its AI bot, Bard, to Gemini, and introduced Gemini Advanced, its first paid AI subscription tier. Hot on the heels of that announcement comes Gemini 1.5, the next iteration of Google's AI model.

What is Gemini 1.5?

Google says Gemini 1.5 is built on a Transformer and Mixture-of-Experts (MoE) architecture. A traditional Transformer is essentially one large neural network, while an MoE model is divided into multiple smaller "expert" neural networks. For a given input, only the most relevant expert pathways activate, which Google says is much more efficient than running the entire network every time.
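To make the routing idea concrete, here is a toy sketch of top-k expert selection in plain Python. It is not Google's implementation: the gate weights, the "experts" (which just scale their input), and the names `moe_forward` and `make_expert` are all made up for illustration. A small gate scores every expert, and only the two highest-scoring experts actually run.

```python
import math

def softmax(scores):
    # Turn raw gate scores into probabilities that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    # Hypothetical stand-in for an expert network: scales every feature.
    return lambda x: [scale * v for v in x]

def moe_forward(x, gate_weights, experts, top_k=2):
    # One gate score per expert: dot product of that expert's gate row with x.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Keep only the top_k experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm  # renormalize over the chosen experts
        out = [o + weight * v for o, v in zip(out, experts[i](x))]
    return out, chosen

# Illustrative values only.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1, 0], [0, 1], [-1, 0], [0, -1]]
output, used = moe_forward([2.0, 1.0], gate_weights, experts)
print(used)  # [0, 1]
```

In this example, four experts exist but only two do any work for the input, which is the efficiency gain Google is describing, just at a vastly smaller scale.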

The first version of Gemini 1.5 being made available to testers is Gemini 1.5 Pro. "Pro" is Google's name for its LLMs (large language models) that work best across the widest range of tasks and devices. Think of it like GPT-3.5: a model designed to be used in as many applications as possible, rather than to be the most powerful option.

While 1.5 Pro has a standard 128,000-token context window (the same as GPT-4 Turbo), Google says that it's currently testing a context window of up to one million tokens, and has tested as many as 10 million, versus 1.0 Pro's 32,000. The more tokens a model can process at once, the larger your prompt can be, and thus, the larger a request the model can handle in one go. According to Google, 1.5 Pro can handle up to one hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words of text at once. (Google says it can also handle single prompts with more than 100,000 lines of code.)
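For a rough sense of what those window sizes mean in practice, here is a back-of-the-envelope sketch that checks which of the windows mentioned above a piece of text would fit in. The four-characters-per-token ratio is a common rule of thumb for English text, not Gemini's actual tokenizer, and the function names are hypothetical.

```python
import math

# Window sizes as reported in this article.
CONTEXT_WINDOWS = {
    "1.0 Pro": 32_000,
    "1.5 Pro (standard)": 128_000,
    "1.5 Pro (testing)": 1_000_000,
}

def estimate_tokens(text, chars_per_token=4):
    # Assumption: roughly 4 characters per token. Real tokenizers vary.
    return math.ceil(len(text) / chars_per_token)

def windows_that_fit(text):
    # Return the names of every window large enough for the estimated token count.
    n = estimate_tokens(text)
    return [name for name, limit in CONTEXT_WINDOWS.items() if n <= limit]

# A 400,000-character document is about 100,000 tokens under this estimate.
document = "a" * 400_000
print(estimate_tokens(document))   # 100000
print(windows_that_fit(document))  # ['1.5 Pro (standard)', '1.5 Pro (testing)']
```

Under that estimate, a document of this size would overflow 1.0 Pro's window but fit comfortably in either 1.5 Pro window.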

1.5 Pro is reportedly better at understanding large amounts of information and answering complex, specific questions about it. In a demo video, Google feeds 1.5 Pro the 402-page transcript of the Apollo 11 mission. The presenter then shares a sketch of a boot taking a step, with an arrow pointing to the ground, and asks the model what moment the image shows. The model identifies the sketch as Neil Armstrong's first step on the moon and recalls his famous quote, all from the drawing. It can also analyze content like a silent film, describe what happened, and highlight small moments most viewers might miss.

Google says 1.5 Pro outperforms 1.0 Pro on 87% of the testing benchmarks it uses. It also performs "at a broadly similar level" to 1.0 Ultra, the model powering Gemini Advanced. The company is also happy with its "in-context learning," in which the model can pick up a new skill from information supplied in a long prompt and apply it, without needing additional fine-tuning.

How to try Gemini 1.5

Google is running trials for Gemini 1.5 Pro through AI Studio and Vertex AI, and has a waitlist for interested developers. If you are indeed an interested developer, you can sign up for the waitlist here.

Once Gemini 1.5 is widely available through Google's Gemini services, you'll be able to use its 128,000-token window for free. If you want access to the one million token window, that'll cost you. (At this time, the price is unknown.)