Quick Recap for Google I/O 2024

GOOGLE I/O 2024. Huff!! There’s too much to cover but let’s start with the key themes:

  • Google is integrating AI across its entire ecosystem. In true Google fashion, many features are “coming later this year”. If they ship and perform like the demos, Google will gain a serious upper hand over OpenAI/Microsoft.

  • All of the AI features across Google products will be powered by Gemini 1.5 Pro. It’s Google’s best model and one of the top models overall. A new Gemini 1.5 Flash model has also launched, which is faster and much cheaper.

  • Google has ambitious projects in the pipeline. Those include a real-time voice assistant called Astra, a long-form video generator called Veo, plans for end-to-end agents, virtual AI teammates and more.

Now, let’s dive in, beginning with the product integrations:

  • In Search, SGE (Search Generative Experience) is coming out of beta as “AI Overviews” for everyone, starting with the US. AI Overviews will also get multi-step reasoning and the ability to search based on a video. Read more about Search updates here.

  • "Ask Photos" is a new feature that will upgrade searching through photos from simple keywords like “flowers” to natural-language queries like “show me all the images of my son playing with our dog” and “what was the first place we visited on our Japan trip”.

  • The same search, reasoning and Q&A will also come to Gmail, working across all your emails. Gmail is also testing a few features via Labs: summaries for each email and suggested drafts as replies. The AI side panel in Docs, Sheets and other Workspace apps is coming to everyone, with more features. Details on AI in Gmail and Workspace.

  • On Android, Gemini Nano will power on-device features like Circle to Search, multimodal TalkBack, and scam detection.

  • Gemini Advanced, Google’s paid chatbot, is getting some much-needed features like file uploads and data analysis (like code interpreter). These will be live soon, powered by Gemini 1.5 Pro. On the longer horizon, Gemini Live will compete with the ChatGPT voice assistant that OpenAI revealed yesterday, and Gems will be Google’s version of GPTs.

Which brings us to the models:

  • Gemini 1.5 Pro, which Google revealed in February, is coming to everyone now: via the API, AI Studio, Gemini Advanced and all the product updates. Google says it has also improved on a number of metrics, but we don’t have a technical report to know the specifics yet.

  • A new model, Gemini 1.5 Flash, is also available via the API and AI Studio. It’s faster and cheaper than Gemini 1.5 Pro. Google has shared its performance on a few select benchmarks, which puts it in the Llama 3 70B and Claude Sonnet category but at a price similar to Claude Haiku.

A key thing to remember about Gemini models is that all the models in this family are natively multimodal. They can take input in any form (text, images, audio, or video) and create output in any form. Ask Photos and the audio outputs in NotebookLM are good examples of the power of multimodality.

Also, the 1.5 series has a context window of 1M tokens (2M is in preview now). That means you can add a bunch of (very) long files in any media form and these models should work.
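
To get a feel for what 1M tokens means in practice, here is a rough back-of-the-envelope sketch for plain text. It assumes about 4 characters per token, which is a common rule of thumb for English and not Gemini's actual tokenizer, and the helper names and the output reserve are made up for illustration. Real counts depend on the model's tokenizer and on the media type.

```python
# Rough sketch: will a set of text documents fit in a 1M-token context window?
# ASSUMPTION: ~4 characters per token (rule of thumb, not the real tokenizer).
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW_TOKENS = 1_000_000  # 1.5 series window mentioned above


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string using ceiling division."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts, window=CONTEXT_WINDOW_TOKENS, reserve_for_output=8_192):
    """Return (estimated_total_tokens, fits) for a list of strings.

    reserve_for_output is a hypothetical budget kept free for the reply.
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total, total + reserve_for_output <= window


if __name__ == "__main__":
    docs = ["word " * 50_000, "another long file " * 20_000]
    total, ok = fits_in_context(docs)
    print(f"~{total:,} tokens estimated; fits in window: {ok}")
```

By this estimate, a 1M-token window holds roughly 4 million characters of English text, so the "bunch of (very) long files" claim is plausible for documents. Treat the number as a sanity check only.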

For comparison, GPT-4o is OpenAI’s first natively multimodal model. Previously, OpenAI stitched different models together to make multimodality work.

Why not jump to Google’s war against OpenAI now? Google showcased a bunch of ambitious projects from DeepMind that take on OpenAI’s impressive sci-fi reveals.

  • Project Astra is a general AI assistant. It can talk to you in real time, understanding the audio and video around you. Google isn’t fixated on making it sound like ScarJo for the Her hype, but the functionality is very similar. Jerry compiled a bunch of friends trying out Astra at I/O. Project Astra will likely reach us through a feature called Gemini Live.

  • Veo is Google’s Sora competitor. It can create long-form, 1080p videos from text prompts, with the same claim as Sora of “simulating world physics”. The samples don’t look as awesome as Sora’s, but there is a waitlist if you want to get your hands on it.

  • Google is also entering agent land. It is planning a few variations: Gems (like GPTs), virtual teammates in Workspace, and end-to-end task completion via the Gemini app.

Some other announcements that are worth mentioning:

  • MusicLM: Music AI Sandbox with impressive demos and song integration on YouTube. MusicFX gets a new DJ mode.

  • Imagen 3: Google's most advanced image generation model. Again, heavily censored after its blunder with images in February.

  • SynthID is coming to text and video. It’ll be open-sourced later this year.

  • TPU v6, codenamed Trillium, was announced. Infra is a part of Google’s moat in the AI race.

  • Gemini Nano will be built into the Chrome Desktop client.

