
Anthropic drops a bomb—Claude 3 might be the new AI king.

GPT-4 is no longer the lone wolf in the “scary-good AI” valley. Anthropic is making bold claims with its latest release, Claude 3. The new family of language models is going for the jugular, taking on the reigning champ, OpenAI's GPT-4, as well as newcomers like Google's Gemini.

What’s going on here?

Anthropic's betting big that Claude 3 is THE next-gen AI for businesses. And based on their benchmarks, they might not be just blowing smoke.

What does that mean?

Claude 3 isn't one-size-fits-all. It comes in three flavours (or should I say jingles?): Opus, Sonnet, and Haiku. Opus is the monster truck: pricier but insanely powerful. Sonnet's your versatile workhorse, and Haiku keeps things lean and mean for cost-sensitive tasks.

Anthropic claims Opus flat-out beats GPT-4 and Gemini 1.0 Ultra on everything from general knowledge to coding challenges. Sonnet trades punches with GPT-4, winning some, losing others. One little caveat, though: these benchmarks might not factor in the latest updates like GPT-4 Turbo and Google's unreleased Gemini 1.5 Pro. More on “performance” below.

These models are vision-ready, so you can send them images to add more info and context. Also, remember, Anthropic's whole thing is safer AI. Past models were good but could be overcautious. They claim some behavioural design work on Claude 3 nails the balance—less likely to choke on harmless requests, without going rogue.
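For the curious, images go into Anthropic's Messages API as base64-encoded content blocks alongside your text. A minimal sketch of building one such message (the helper function and its arguments are my own illustration; the field names follow the public API shape):

```python
import base64

def image_message(image_bytes: bytes, question: str,
                  media_type: str = "image/png") -> dict:
    """Build a user message pairing an image with a text question,
    in the content-block shape the Messages API expects."""
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    # Raw image bytes, base64-encoded as the API requires.
                    "data": base64.b64encode(image_bytes).decode("utf-8"),
                },
            },
            {"type": "text", "text": question},
        ],
    }
```

You'd pass the resulting dict as one entry in the `messages` list of an API call, e.g. to ask Claude about a chart screenshot.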

Claude 3 keeps Anthropic's 200K context window, but now claims near-perfect recall across that entire length. Big baddie Opus performs best and can stretch to a 1M-token context window, though that's gated behind private access. Which reminds me: Anthropic's API is now generally available (they were pretty restrictive about API access earlier). You can use Opus and Sonnet right now via the API, while Haiku needs some more work before release.
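If you want to poke at the API yourself, here's a rough sketch using Anthropic's Python SDK. The model IDs are the launch-day identifiers; the helper function and prompt are just illustrative:

```python
import os

# Launch-day model identifiers for the three Claude 3 tiers.
MODELS = {
    "opus": "claude-3-opus-20240229",
    "sonnet": "claude-3-sonnet-20240229",
    "haiku": "claude-3-haiku-20240307",  # not yet live at launch
}

def build_request(tier: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble the keyword arguments for a Messages API call."""
    return {
        "model": MODELS[tier],
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

# Only fire a real request when a key is configured.
if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic  # pip install anthropic
    client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY
    resp = client.messages.create(
        **build_request("sonnet", "One-line summary of the Claude 3 lineup?"))
    print(resp.content[0].text)
```

Swap `"sonnet"` for `"opus"` when the task justifies the premium tier.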

Here’s the pricing for these three, per million tokens (input/output): Opus runs $15/$75, Sonnet $3/$15, and Haiku $0.25/$1.25.

Also, Sonnet now powers Claude’s web app for free users, and Opus is there for Pro members at the usual $20/month.
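To get a feel for how those per-token rates add up, here's a quick back-of-the-envelope calculator using the launch-day API prices (the function itself is just my illustration):

```python
# Launch-day API prices in USD per million tokens: (input, output).
PRICES = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}

def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-million-token rates."""
    inp, out = PRICES[tier]
    return (input_tokens * inp + output_tokens * out) / 1_000_000
```

For example, a request with 10K input tokens and 1K output tokens costs $0.045 on Sonnet but $0.225 on Opus, which is exactly the gap Anthropic wants you weighing per task.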

New Performance Narrative:

By now, we know benchmarks are not the full picture; models that top the charts can still fail horribly at both simple and complex real-world tasks. OpenAI grabbed attention last year with the narrative that GPT-4 “performs better than the average human” on high-school and college exams.

Anthropic is doing the same with Claude 3. On a new benchmark of graduate-level questions, where generalist PhDs score 34% and domain specialists 65%-75%, Claude 3 gets ~60%. Along similar lines, there's a heavy focus on results in understanding science and finance diagrams. Combine these with the use cases listed on the official blog and Anthropic's narrative becomes clear: “Claude 3 models are industry experts.”

⭐ Anthropic also found that Claude figured out it was being tested while running through the synthetic “Needle In A Haystack” evaluation. That has opened up a box of arguments about whether that's self-awareness, AGI or whatever. The behaviour is impressive but could be explained by prompt design or data choices. So, hold on a bit. Wait, Claude just told me to delete this. Should I?

Why should I care?

Sonnet and Haiku are putting up a tough fight against GPT-4 while priced way cheaper. Opus performs strongly on benchmarks, and even if it doesn't blow GPT-4 out of the water, it's safe to say it sits in the premium category.

Along with shipping a GPT-4-grade model, Anthropic has opened up its API, which means businesses are likely to flock to them. If Anthropic is right about how impressive Claude 3 is at industry-specific tasks, those who jump on Claude 3 early could gain a serious advantage.
