Google Gemini: Everything you need to know about the new generative AI platform
Google’s trying to make waves with Gemini, a new generative AI platform that recently made its big debut. But while Gemini appears to be promising in a few aspects, it’s falling short in others. So what is Gemini? How can you use it? And how does it stack up to the competition?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models and features are released.
What is Gemini?
Gemini is Google’s long-promised, next-gen generative AI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:
- Gemini Ultra, the flagship Gemini model
- Gemini Pro, a “lite” Gemini model
- Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro
All Gemini models were trained to be “natively multimodal” — in other words, able to work with and use more than just text. They were pre-trained and fine-tuned on a variety of audio, images and videos, a large set of codebases and text in different languages.
That sets Gemini apart from models such as Google’s own large language model LaMDA, which was only trained on text data. LaMDA can’t understand or generate anything other than text (e.g. essays, email drafts and so on) — but that isn’t the case with Gemini models. Their ability to understand images, audio and other modalities is still limited, but it’s better than nothing.
What’s the difference between Bard and Gemini?
Google, proving once again that it lacks a knack for branding, didn’t make it clear from the outset that Gemini is separate and distinct from Bard. Bard is simply an interface through which certain Gemini models can be accessed — think of it as an app or client for Gemini and other GenAI models. Gemini, on the other hand, is a family of models — not an app or front end. There’s no standalone Gemini experience, nor will there likely ever be. To compare with OpenAI’s products, Bard corresponds to ChatGPT, OpenAI’s popular conversational AI app, and Gemini corresponds to the language model that powers it — in ChatGPT’s case, GPT-3.5 or GPT-4.
Incidentally, Gemini is also totally independent from Imagen 2, a text-to-image model that may or may not fit into the company’s overall AI strategy. Don’t worry, you’re not the only one confused by this!
What can Gemini do?
Because the Gemini models are multimodal, they can in theory perform a range of tasks, from transcribing speech to captioning images and videos to generating artwork. Few of these capabilities have reached the product stage yet (more on that later), but Google’s promising all of them — and more — at some point in the not-too-distant future.
Of course, it’s a bit hard to take the company at its word.
Google seriously under-delivered with the original Bard launch. And more recently it ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to have been heavily doctored and was more or less aspirational. Gemini is, to the tech giant’s credit, available in some form today — but a rather limited form.
Still, assuming Google is being more or less truthful with its claims, here’s what the different tiers of Gemini models will be able to do once they’re released:
Gemini Ultra
Few people have gotten their hands on Gemini Ultra, the “foundation” model on which the others are built, so far — just a “select set” of customers across a handful of Google apps and services. That won’t change until sometime later this year, when Google’s largest model launches more broadly. Most info about Ultra has come from Google-led product demos, so it’s best taken with a grain of salt.
Google says that Gemini Ultra can be used to help with things like physics homework, solving problems step-by-step on a worksheet and pointing out possible mistakes in already filled-in answers. Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says — extracting information from those papers and “updating” a chart from one by generating the formulas necessary to recreate the chart with more recent data.
Gemini Ultra technically supports image generation, as alluded to earlier. But that capability won’t make its way into the productized version of the model at launch, according to Google — perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively†without an intermediary step.
Gemini Pro
Unlike Gemini Ultra, Gemini Pro is available publicly today. But confusingly, its capabilities depend on where it’s used.
Google says that in Bard, where Gemini Pro launched first in text-only form, the model is an improvement over LaMDA in its reasoning, planning and understanding capabilities. An independent study by Carnegie Mellon and BerriAI researchers found that Gemini Pro is indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains.
But the study also found that, like all large language models, Gemini Pro particularly struggles with math problems involving several digits, and users have surfaced plenty of examples of bad reasoning and mistakes — including factual errors on simple queries like who won the latest Oscars. Google has promised improvements, but it’s not clear when they’ll arrive.
Gemini Pro is also available via API in Vertex AI, Google’s fully managed AI developer platform, which accepts text as input and generates text as output. An additional endpoint, Gemini Pro Vision, can process text and imagery — including photos and video — and output text along the lines of OpenAI’s GPT-4 with Vision model.
Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.
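To give a concrete sense of what calling the model involves, here’s a minimal sketch of building a text-only request for Gemini Pro. The REST shape below — a `contents` list of `parts` sent to a `generateContent` endpoint — reflects Google’s published API at the time of writing, but the exact URL, field names and model ID are assumptions worth verifying against the current API reference before you rely on them.

```python
import json

# Assumed public REST endpoint for Gemini Pro text generation -- check this
# against Google's current Generative Language / Vertex AI documentation.
API_URL = "https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent"

def build_request(prompt: str) -> dict:
    """Build a minimal text-only request body for a generateContent call."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

if __name__ == "__main__":
    body = build_request("Explain the difference between Bard and Gemini.")
    # In practice this payload would be POSTed to API_URL with an API key;
    # here we just print the payload shape.
    print(json.dumps(body, indent=2))
```

Actually sending the request requires an API key (from Google AI Studio) or Vertex AI credentials, and the Gemini Pro Vision endpoint accepts additional image parts alongside the text.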