Daily Digest: Top dog of LLMs
PLUS: OpenAI's news deals and Mistral's coding beast
Daily Digest #423
Want to get in front of 100k AI enthusiasts? Work with us here
Hello folks, here’s what we have today:
PICKS
New tutorial: Build a GPT that will turn articles into newsletters that look like they’ve been written by you.
ICYMI: Ben’s Bites Pro will go from $150 to $250 (no subscription) in June (next month), so if you’ve been thinking about it and want to lock in the lower price, now’s the time to do it! Sign up here.
Scale AI is ranking LLMs with its new SEAL Leaderboards to bring some much-needed transparency to the space. They rank the top models on math, coding, instruction following, and multilingual performance. Currently, it’s a tight race between the GPT-4 series, Gemini 1.5, and Claude models. 🍿Our Summary (also below)
OpenAI’s news deals are on a roll. Just yesterday, they added Vox Media and The Atlantic to their pool of news partners. OpenAI is also partnering with WAN-IFRA to start an accelerator that’ll help newsrooms fast-track their AI adoption.
Mistral AI has released a new coding LLM. Codestral is a 22B-parameter model that beats CodeLlama 70B and Llama 3 70B on coding benchmarks, and it comes with a 32k-token context window. With its small size, long context window, and strong performance, I see Codestral becoming the go-to choice for local coding tools (if all goes well with Mistral’s new license). There’s a quick local-inference sketch below.
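If you want to kick the tyres locally, the usual Hugging Face transformers workflow should cover it. This is a minimal sketch, not Mistral’s official quickstart: it assumes the weights are published on Hugging Face under an id like mistralai/Codestral-22B-v0.1 (treat that id as an assumption), that your hardware can hold a 22B model, and that the new license permits your use case.

```python
# Minimal sketch: prompting a local Codestral checkpoint for code completion.
# The repo id below is an assumption; check Mistral's release notes and license first.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Codestral-22B-v0.1"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread the 22B weights across available GPUs
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)

# Ask the model to continue a function definition.
prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In practice, most local coding tools would wrap a model like this in an editor extension or a small local server rather than calling generate() directly, but the flow is the same.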
from our sponsor
Is AI About to Disrupt Hospitality?
Jurny is revolutionizing the $4.1T hospitality industry with AI, and this is your last chance to invest in this round alongside top VCs & 1,200 individuals.
Partnered with Airbnb, Vrbo & Expedia, Jurny's tech automates operations for thousands of property managers globally.
5x customer growth, $35M+ bookings last year
Featured on CNBC, Forbes & Bloomberg
TOP TOOLS
udio-130 - New model from Udio capable of two-minute generations with long-term coherence and structure.
Bash - Solves the blank page problem for product teams.
Syllaby V2.0 - Your in-house AI video marketing agency.
MarsCode - GPT-4-powered cloud IDE & extensions.
TimeOS - Your productivity system, on autopilot.
Anecdote - Transform your customer feedback into action.
ChatGPT Free users can now access most paid features—including web browsing, vision, data analysis, file uploads, and GPTs (no image generation though).
NEWS
How A.I. Made Mark Zuckerberg Popular Again in Silicon Valley.
Sam Altman cements his control with an Apple deal.
OpenAI signs 100K PwC workers to ChatGPT’s enterprise tier as PwC becomes its first resale partner.
Training is not the same as chatting - LLMs don’t remember everything you say.
Apple's plan to protect privacy with AI - Putting cloud data in a black box.
Hi, AI - a16z’s thesis on AI voice agents.
Perplexity AI wants to raise more VC money at a $3B valuation.
MavenAGI raises $20M Series A to solve customer support with AI agents.
Jay Kreps, co-founder and CEO of Confluent, has joined Anthropic's Board of Directors.
How to build an AI agent for SEO research and content generation.
QUICK BITES
With so many large language models (LLMs) out there now, it can be hard to know which ones are actually the best. Scale AI just launched its SEAL Leaderboards to rank LLMs using private evaluation data and expert raters.
What's going on here?
Scale AI just launched the SEAL Leaderboards, an expert-driven ranking system for LLMs built on private evaluation sets.
What does this mean?
Scale AI created the SEAL (Safety, Evaluations, and Alignment Lab) to address common problems in LLM evaluation, like biased data and inconsistent reporting.
It’s a bit like Michelin star ratings, but for AI. The leaderboard ranks LLMs based on their performance in areas like coding, math, and ability to follow instructions. They've even brought in verified experts to assess the models.
What really sets SEAL apart is its focus on quality and fairness. They use private datasets that can’t be manipulated, expert evaluators, and transparent methodologies to give us the most accurate picture yet of how different LLMs stack up. Currently, it’s a tight race between the GPT-4 series, Gemini 1.5, and Claude models. Check the leaderboards here.
Why should I care?
The SEAL Leaderboards give us a clearer picture of how these models actually perform.
They also address a major hurdle in AI development: the race to the bottom caused by companies gaming benchmarks to make their LLMs look better. This often leads to contamination (benchmark questions leaking into training data) and overfitting, where models learn to perform well on specific tests but struggle in real-world applications.
SEAL's private datasets and rigorous evaluation methods aim to prevent these issues, ensuring the Leaderboards provide a trustworthy picture of LLM capabilities.
Ben’s Bites Insights
We have 2 databases that are updated daily, which you can access by sharing Ben’s Bites using the link below:
All 10k+ links we’ve covered, easily filterable (1 referral)
6k+ AI company funding rounds from Jan 2022, including investors, amounts, stage etc (3 referrals)