
Daily Digest: What's possible? Everything.

PLUS: AI models are larger, better, and not done yet.

Want to get in front of 100k AI enthusiasts? Work with us here

Hello folks, here's what we have today:

PICKS
  1. ICYMI: I wrote about a company with a very 'out-there' initiative for getting its employees to use AI: "if you don't engage, you will be let go."

  2. Google announced a new model. Didn't they release one last week? That was Gemini Ultra 1.0. Google's already moving ahead to 1.5, and now we've got a peek at Gemini Pro 1.5. This one has a context window of up to 10M tokens; GPT-4 Turbo has 128k.🍿Our Summary (also below)

  3. OpenAI's mogging everyone in the AI space again. Sora, a new AI model from OpenAI, spits out videos based on simple text prompts, and we're talking minute-long videos that feel insanely real. Marques Brownlee (MKBHD) posted a YouTube video talking about it. So, what can you dream up now? Because Sora ("sky" in Japanese) is the limit.🍿Our Summary (also below)

  4. Scribe just bagged $25M in Series B funding. They want to give you a well-deserved break from answering people’s questions all day. Instead, just create step-by-step guides (automatically, thanks to AI) to share with your team. (I’m an investor)

I also noticed some chatter on Twitter about a new model named Mistral-Next. Crazzzy…

TOP TOOLS
  • Glif - Remix any image on the web.

  • Magika by Google - AI-powered fast and efficient file type identification.

  • Squad - The product strategy tool that delivers for you and your users.

  • Persona by Diarupt - Human-like AI teammates.

  • Lindy - Create and monetize custom AI agents in minutes.

  • HeyDay - Turn your information into insights.

  • Intent by Upflowy - Turn your leads' behaviour into AI Summaries.

NEWS

More launches

More money

and regular news

QUICK BITES

Google announced a new model. Didn't they release one last week? That was Gemini Ultra 1.0. Google's already moving ahead to 1.5, and now we've got a peek at Gemini Pro 1.5. This one has a context window of up to 10M tokens; GPT-4 Turbo has 128k.

What is going on here?

Google introduced Gemini Pro 1.5, a new model with an insane context window and performance to match.

What does this mean?

Google's already back on the treadmill: it just gave people access to its best model last week, and now it has announced Gemini Pro 1.5. So what's new here?

  • It is based on the Mixture of Experts (MoE) architecture, which many people believe is the secret sauce behind GPT-4.

  • Just like other Gemini models, it is multimodal from the ground up—understands images, video, and audio natively.

  • The new model can take up to 10M tokens in its context window. That's the big hype feature.

  • Despite being a mid-sized model, it performs at a similar level to Gemini Ultra 1.0 (the silent killer here). Ultra 1.0 is Google's biggest model, its GPT-4-class model.

Let’s understand these a bit:

The context window determines how long your prompt to an AI model can be, and if you're working with long-form content like business PDFs, books, etc., you want all you can get. Claude by Anthropic shocked everyone by accepting 100k tokens in its context window last summer (200k a few months later), and GPT-4 Turbo shipped a 128k token context window in November.
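
To put those numbers in perspective, here's a minimal sketch of checking whether a long document fits in a given context window. It uses OpenAI's tiktoken tokenizer as a stand-in (Gemini and Claude tokenize differently, so treat the counts as ballpark figures), and the file name is just a placeholder.

```python
import tiktoken  # pip install tiktoken

# cl100k_base is GPT-4/GPT-4 Turbo's tokenizer; a rough proxy for other models.
enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(text: str, context_window: int) -> bool:
    """Return True if the text's token count fits inside the given window."""
    n_tokens = len(enc.encode(text))
    print(f"{n_tokens:,} tokens")
    return n_tokens <= context_window

report = open("my_long_report.txt").read()   # placeholder file
print(fits_in_context(report, 128_000))      # GPT-4 Turbo-sized window
print(fits_in_context(report, 1_000_000))    # Gemini Pro 1.5's developer-preview window
```

For scale, a full-length novel lands somewhere around 150k tokens on this tokenizer, so it already overflows GPT-4 Turbo's window but fits comfortably inside 1M tokens.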

Gemini Pro 1.5 takes two big jabs at other models here:

  • A huge jump from what they call "standard", i.e. 128k tokens. The 10M context window is a research claim, but Google is letting select developers test up to 1M tokens.

  • Multimodal inputs: You can not only put large books in there, but you can query entire movies, languages, codebases, and whatnot.

And these jabs land because, in evaluations, Gemini Pro 1.5 shows 100% recall up to roughly 500k tokens and >99% up to 10M. Examples here.

Twitter's going gaga over this part, but some amazing stuff is hiding underneath all the talk about context windows: its killer performance on multiple benchmarks.

Google reports the performance [page 20] relative to its own Gemini Ultra 1.0 model, not GPT-4, but cross-referencing the technical reports, I found Pro 1.5 to be very slightly better than GPT-4 on MATH and BIG-Bench-Hard, and close on a bunch of others.

The trick seems to be the Mixture of Experts architecture. Leaks (which most researchers believe) say that GPT-4 is also a mixture-of-experts model, and Mistral is using it too to make its models punch above their weight. I'm excited to see what Google does with Gemini Ultra and MoE.
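
For intuition on what "mixture of experts" means, here's a toy sketch of top-k routing in plain numpy. It's purely illustrative (real MoE layers sit inside transformer blocks and are trained end-to-end, and this reflects nothing about Google's or OpenAI's actual implementations): a small router scores the experts for each token, and only the best few run, which is how MoE models add capacity without paying full compute on every token.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Toy "experts": each is just a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1   # gating network

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router                      # score every expert for this token
    top = np.argsort(logits)[-top_k:]        # keep only the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Only top_k of the n_experts actually run, so per-token compute stays small.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)                # (16,) -- same shape as the input
```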

Why should I care?

I’ll just remind you of the three examples from Google itself:

  • Finding contextual quotes from 402-page transcripts from Apollo 11’s mission to the moon.

  • Loading up a 44-minute silent movie and finding exact scenes.

  • Programming with a reference of 100,000+ lines of code.

Long context means that even if the AI doesn't know something, you can just drop in the reference material and get your work done. If that ain't exciting, I don't know what is.

Wait. I know something. How about almost real-looking AI videos?

QUICK BITES

OpenAI's mogging everyone in the AI space again. Sora, a new AI model from OpenAI, spits out videos based on simple text prompts, and we're talking minute-long videos that feel insanely real. So what can you dream up now? Because Sora ("sky" in Japanese) is the limit.

What is going on here?

OpenAI's new text-to-video model, Sora, is a major leap for video-generating AI.

What does this mean?

Text-to-video models have been improving gradually since early last year. We started with Will Smith choking on spaghetti and moved on to scripted history, dragon worlds, and decent-ish-looking videos. But two problems remained:

  • The videos are still janky. You can tell it’s AI.

  • They aren't very long: 4 seconds, 10 seconds, 20 if you push it.

OpenAI broke the chain of gradual improvement and came in a whole level up (or maybe several levels) with Sora.

Sora's videos are smooth, dynamic, consistent, and up to 1 minute long. You can get detailed: the style of animation, mood, camera angles, etc. Imagine specifying "Wes Anderson directs a Pixar short about hamsters." Sora aims to deliver.

Sora is not out for use yet. Everyone's copy-pasting the demo examples OpenAI released, and I can't blame them; they are unreal 😉.

But let’s dig a bit deeper into Sora’s technical report and see what OpenAI claims:

  1. Sora can create videos in a wide range of aspect ratios and resolutions, from widescreen 1920x1080 videos to vertical 1080x1920 videos and everything in between.

  2. Similar to DALL·E 3, OpenAI uses language models (GPT) to turn basic prompts into detailed "power prompts" that yield higher-quality videos (a rough sketch of that kind of prompt rewriting follows after this list).

  3. Sora can use images and videos as inputs, not just text. That means:

    • It can animate images.

    • It can extend videos: backwards and forwards.

    • It can edit videos, like changing the scene while keeping the characters the same.

    • It can connect two videos, filling the in-between frames automatically.
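
OpenAI hasn't published its exact prompt-rewriting pipeline, but the DALL·E 3-style trick in point 2 looks roughly like this sketch using the standard OpenAI chat API. The model name and system prompt here are my assumptions, not anything Sora-specific.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def expand_prompt(short_prompt: str) -> str:
    """Rewrite a terse video idea into a detailed 'power prompt' (DALL·E 3-style)."""
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",  # any capable chat model works for rewriting
        messages=[
            {"role": "system",
             "content": "Rewrite the user's video idea as a richly detailed shot "
                        "description: subjects, setting, lighting, camera movement, mood."},
            {"role": "user", "content": short_prompt},
        ],
    )
    return response.choices[0].message.content

print(expand_prompt("a corgi surfing at sunset"))
```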

The wildest claim OpenAI makes (and we can see hints of it) is that Sora learns about the world through videos. It understands 3D motion, the behaviour of objects, and complex interactions (not perfectly, though). All of this points towards a model that simulates our world as well as we currently can.

But how is OpenAI's model this good across all of this? In their own words: these capabilities are purely phenomena of scale. The best illustration of that is their demo comparing base compute, 4x compute, and 16x compute.

Why should I care?

AI just got serious about filmmaking and it’s gonna kill the video industry.

Just kidding, we don’t cry wolf around here. But in all honesty, get ready for a wave of AI-generated video hitting the web. Sora (and upcoming models) will not only make long, visually consistent videos, but they’ll also handle complex prompts with characters, emotions, and multiple scene changes.

Seeing is... not always believing. At least not anymore. It'll get harder to tell what's real footage and what's been cooked up by an AI. Limitations exist. Physics can be wacky in these videos, and Sora might misinterpret some directions. Don't throw out your special effects team just yet, but take note.

Ben’s Bites Insights

We have 2 databases that are updated daily, which you can access by sharing Ben's Bites using the link below:

  • All 10k+ links we’ve covered, easily filterable (1 referral)

  • 6k+ AI company funding rounds from Jan 2022, including investors, amounts, stage etc (3 referrals)
