Thursday, Google revealed Gemini 1.5 Pro, which the company describes as offering “significantly improved performance” over the previous model. The company's AI trajectory – viewed internally as increasingly critical for its future – follows the Gemini 1.0 Ultra unveiling last week, alongside the rebranding of the Bard chatbot (to Gemini) to align with the new model's more powerful and versatile capabilities.
In an announcement blog post, Google CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis attempt to assure their audience of the ethical safety of AI while touting the rapid scaling capabilities of their models. “Our teams continue to push the limits of our latest models with safety at the heart,” summarized Pichai.
The company must emphasize security for AI skeptics (one of whom former CEO of Google) And government regulators. But it also needs to highlight the accelerated performance of its models to AI developers, potential customers and investors who worry the company has been too slow to respond. The resounding success of OpenAI with ChatGPT.
Pichai and Hassabis claim that Gemini 1.5 Pro offers comparable results to Gemini 1.0 Ultra. However, Gemini 1.5 operates at this level more efficiently, with reduced computational requirements. Multimodal capabilities include processing text, images, video, audio, or code. As AI models advance, they will continue to offer a more versatile array of functionality in a single dialog box (another recent example is that of OpenAI integrating DALL-E 3 image generation into ChatGPT).
Gemini 1.5 Pro can also handle up to a million tokens, or the units of data that AI models can process in a single query. Google claims Gemini 1.5 Pro can handle over 700,000 words, one hour of video, 11 hours of audio, and codebases with over 30,000 lines of code. The company claims to have even “successfully tested” a version supporting up to 10 million tokens.
The company claims that Gemini 1.5 Pro maintains high accuracy in queries with a larger number of tokens when it has more new data to learn. The model is said to have impressed in the Needle in a haystack assessment. In this test, developers insert a small piece of information into a long block of text to see if the AI model can detect it. Google said Gemini 1.5 Pro can find embedded text 99% of the time in data blocks of up to a million tokens.
Google claims that Gemini 1.5 Pro can reason about various details of the 402-page transcripts of the Apollo 11 lunar mission. Additionally, it can analyze plot points and events from a 44-minute silent film posted online with Buster Keaton. “As 1.5 Pro's long pop-up is the first of its kind among large-scale models, we are continually developing new benchmarks and benchmarks to test its new capabilities,” Hassabis wrote.
Google launches Gemini 1.5 Pro with capacities of 128,000 tokens, the same number at which OpenAI's GPT-4 models (publicly announced) peak. Hassabis says Google will eventually introduce new pricing tiers supporting up to one million token requests.
Gemini 1.5 Pro is also capable of learning new skills from information contained in long prompts, without additional adjustment (“learning in context”). In a benchmark called Automatic translation from a single book, the model learned a grammar textbook for Kalamang, a language with fewer than 200 speakers worldwide and on which it had not previously been trained. The company claims that Gemini 1.5 Pro has learned to perform at a similar level to a human learning the same content when translating from English to Kalamang.
In a part of the announcement that will catch developers' attention, Google says Gemini 1.5 Pro can perform problem-solving tasks on longer blocks of code. “When given a prompt containing more than 100,000 lines of code, it can better reason through examples, suggest useful changes, and give explanations of how different parts of the code work,” Hassabis wrote.
In terms of ethics and security, Google claims to adopt “the same approach to responsible deployment” as with the Gemini 1.0 models. This includes the development and application of red-teaming techniques, in which a group of ethical developers essentially serve as devil's advocate, testing for “a range of potential harms.” Additionally, the company says it carefully reviews areas such as content safety and representational breaches. The company says it continues to develop new ethics and security tests for its AI tools.
Google releases Gemini 1.5 in early access for developers and enterprise customers. The company plans to make it more widely available eventually. Gemini 1.0 is currently available to consumers, alongside a Pro variant it costs $20 per month.