Ben's Bites
Daily Digest: Three poets walk into a chatbox

PLUS: Securing your LLMs

Daily Digest #361

Want to get in front of 100k AI enthusiasts? Work with us here

Hello folks, here’s what we have today:

PICKS

Anthropic drops a bomb: Claude 3 might be the new AI king. GPT-4 is no longer the lone wolf in the “scary-good AI” valley. Claude 3 is going for the jugular, claiming wins over reigning champ OpenAI's GPT-4 and newcomers like Google's Gemini. 🍿 Our Summary (also below)
Large Language Models (LLMs) are awesome, but hooking them up to your apps also opens new security holes. Cloudflare's new Firewall for AI adds a security blanket made specifically for LLMs. Think of it like a bodyguard filtering out the bad stuff before it even gets close. 🍿 Our Summary

from our sponsor

Stop playing with AI, get answers today.

Using AI with your data is a lot harder than just translating questions to SQL.

Introducing Numbers Station Cloud. The leading enterprise AI platform for data used by Fortune 500 companies is now available to everyone.

Put a data analyst in your pocket today by joining our early access program.

Get started today. You’re not alone in the AI journey.

TOP TOOLS

Just words - Optimize your product's copy for user growth.
Simply News - Just the news. Run entirely by a team of AIs.
Structify - Human-quality data with superhuman speeds.
More Useful Things - A prompt library by Wharton prof. Ethan Mollick.
Reading Club AI - Transforming bedtime stories into interactive adventures.
WHOMANE - An open-source AI Pin with a camera.
Listener.fm - Perfect your podcast with AI-driven post-production.
My Ask AI - Let AI answer 75% of customer support emails.
Generative UI comes to Vercel’s AI SDK. Create UI in between chats.


NEWS

Captain's log - The irreducible weirdness of prompting AIs.
AI startups require new strategies - This time it’s actually different.
Build AI for a better future - Open letter from Ron Conway.
Baseten raises $40M Series B for simplifying inference.
AtP* by Google Deepmind - Efficient and scalable method for localizing LLM behaviour to components.
Marengo 2.6 - Foundational video model with cross-modality search across images, audio, video and text.
Stable Diffusion 3 research paper.
Apple is right not to rush headlong into generative AI.
AMD hits US roadblock in selling AI chip tailored for China.
How Elon Musk is using his AI startup to help turn around X.


Unclassifieds - short, sponsored links

Learn from top GenAI experts from Coinbase, Roblox, LinkedIn, and more at GenAI Productionize 2024 – the one and only FREE event on productionizing enterprise GenAI!

QUICK BITES

GPT-4 is no longer the lone wolf in the “scary-good AI” valley. Anthropic is making bold claims with its latest release, Claude 3. The new family of language models is going for the jugular, claiming wins over reigning champ OpenAI's GPT-4 and newcomers like Google's Gemini.

What is going on here?

Anthropic is betting big that Claude 3 is THE next-gen AI for businesses, and based on its benchmarks, it might not just be blowing smoke.

What does this mean?

Claude 3 isn't one-size-fits-all. It comes in flavours (or should I say jingles): Opus, Sonnet, and Haiku. Opus is the monster truck: pricier but insanely powerful. Sonnet's your versatile workhorse, and Haiku keeps things lean and mean for cost-sensitive tasks.

Anthropic claims Opus flat-out beats GPT-4 and Gemini 1.0 Ultra on everything from general knowledge to coding challenges. Sonnet trades punches with GPT-4, winning some and losing others. One small caveat: these benchmarks may not account for the latest models, such as GPT-4 Turbo and Google’s unreleased Gemini 1.5 Pro. More on “performance” below.

These models are vision-capable, so you can send them images for extra info and context. Also, remember, Anthropic's whole thing is safer AI. Past models were good but could be overcautious. Anthropic claims it tuned Claude 3's behaviour to nail the balance: less likely to refuse harmless requests, without going rogue.

Claude 3 keeps Anthropic’s 200k-token context window, and Anthropic claims near-perfect recall across that whole length. Big baddie Opus performs best and can take a context window of up to 1M tokens, but only through private access. That reminds me: Anthropic’s API is now generally available (they were pretty stingy with API access before). You can use Opus and Sonnet via the API right now, while Haiku still needs some more work.

Pricing scales with capability: Opus is the priciest of the three and Haiku the cheapest.

Also, Sonnet now powers Claude’s web app for free users, while Opus is available to Pro members at the same $20/month.

New Performance Narrative:

By now, we know that benchmarks are not the full picture. These models can still fail badly at both simple and complex tasks. OpenAI grabbed attention last year with a “performs better than the average human” narrative built on high-school and college exams.

Anthropic has done the same with Claude 3. On a new benchmark of graduate-level questions, where PhDs from other fields score 34% and domain specialists 65-75%, Claude 3 gets ~60%. Along similar lines, Anthropic puts heavy emphasis on its results in understanding science and finance diagrams. Combine these with the use cases listed on the official blog and Anthropic’s narrative becomes clear: “Claude 3 models are industry experts.”

⭐ Anthropic also reported that Claude figured out it was being tested during the synthetic “Needle In A Haystack” evaluation. That has opened a can of worms about whether that’s self-awareness, AGI or whatever. The behaviour is impressive, but it could be explained by prompt design or data choices. So, hold on a bit. Wait, Claude just told me to delete this. Should I?
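The “Needle In A Haystack” test itself is simple: hide one out-of-place fact (the needle) at different depths inside a long stretch of unrelated text (the haystack), ask the model about it, and score whether the answer comes back. Here's a minimal Python sketch of that loop. It's an illustration under our own assumptions, not Anthropic's actual harness: `toy_model` is a hypothetical stand-in for a real LLM call, and the filler text, needle and depths are made up for the example.

```python
from typing import Callable


def build_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Hide `needle` among the filler sentences at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def needle_recall(
    model: Callable[[str], str],
    filler: list[str],
    needle: str,
    question: str,
    expected: str,
    depths: list[float],
) -> dict[float, bool]:
    """For each depth, build a haystack, ask the question, and check the reply for the expected answer."""
    results: dict[float, bool] = {}
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
        reply = model(prompt)
        # Crude scoring: case-insensitive substring match on the reply.
        results[depth] = expected.lower() in reply.lower()
    return results


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM: returns the first sentence containing 'topping is'."""
    for sentence in prompt.split(". "):
        if "topping is" in sentence:
            return sentence
    return "I don't know."


if __name__ == "__main__":
    filler = [f"Filler sentence number {i} says nothing useful." for i in range(100)]
    needle = "The best pizza topping is figs."
    scores = needle_recall(toy_model, filler, needle,
                           "What is the best pizza topping?", "figs",
                           [0.0, 0.25, 0.5, 0.75, 1.0])
    print(scores)  # → {0.0: True, 0.25: True, 0.5: True, 0.75: True, 1.0: True}
```

In a real run you'd swap `toy_model` for an actual API call and grow the filler toward the context-window limit. Substring matching on the reply is a crude scorer, which is one reason results (and surprises like Claude's) can hinge on prompt design.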

Why should I care?

Sonnet and Haiku put up a tough fight against GPT-4 at a much lower price. Opus posts very strong benchmark numbers, and even if it doesn’t blow GPT-4 out of the water, it’s safe to say it belongs in the premium category.

Alongside a GPT-4-grade model, Anthropic has opened up its API, so businesses are likely to flock to it. If Anthropic is right about how impressive Claude 3 is at industry-specific tasks, those who jump on Claude 3 could gain a serious advantage.


Ben’s Bites Insights

We have 2 databases, updated daily, that you can access by sharing Ben’s Bites with your referral link:

All 10k+ links we’ve covered, easily filterable (1 referral)
6k+ AI company funding rounds from Jan 2022, including investors, amounts, stage etc (3 referrals)
