Ben's Bites Newsletter
Daily Digest: AI labels on social media

PLUS: New ways to test AI are needed

July 02, 2024

Daily Digest #446

Want to get in front of 100k AI enthusiasts? Work with us here

Hello folks, here’s what we have today:

PICKS

New tutorial: Auto-post project updates from your PM tool to Slack - How to create a Zapier Central bot that connects your project management tool to Slack.
Meta has updated its AI label on Instagram and other platforms. Instead of “Made with AI”, it now reads “AI info”. The change comes after many posts with only minor AI edits (e.g., from Photoshop’s Generative Fill) were labeled as made with AI.
Anthropic's throwing cash at third-party AI evaluations. Its new initiative funds new ways to test frontier AI models that are getting too smart for current evals. Safety, advanced capabilities, and eval-building tools are on the wishlist. 🍿Our Summary (also below)
How do you publish a book on AI that isn't immediately out of date? - We defined prompt engineering principles that worked on GPT-3, still work on GPT-4, and will work on GPT-5 too (we hope). This book survived 8 rounds of technical review from O'Reilly, including an early LangChain contributor, so it belongs on your shelf. Prompt Engineering for Generative AI – by James Phoenix & Mike Taylor. (Sponsor)

TOP TOOLS

Gen-3 Alpha by Runway AI is now available to everyone. Generating a 10-second video clip takes a few minutes and costs $1 to $2.
Suno is now on iOS - Imagine a song from anywhere on your phone.
Superlocal - Map search with results personalized to you.
Prompt Easy - Craft fine-tuning datasets for GPT in under 5 minutes.
Cooked Wiki - Recipes summarized and organized, from anywhere on the web.
Canyon - An all-in-one AI tool for job seekers to land their dream job.
Lytix - Monitor, improve and scale your LLM applications.
Booth AI - Build no-code Gen AI apps.

NEWS

Figma is pausing its “Make Designs” feature after a post showing it copying Apple’s Weather app went viral.
Claude 3.5 Sonnet is the #1 model on instruction following and coding in Scale AI’s SEAL evaluation.
Character AI discussed research partnerships with Google and Meta in exchange for IP.
YouTube now lets you request removal of AI-generated content that simulates your face or voice.
Robinhood acquires Pluto, an AI investment research platform.
Morgan Freeman calls out ‘unauthorized’ use of AI replicating his voice in a TikTok video.
AI Overviews pullback - Google now shows two-thirds fewer AI Overviews (AIOs) in search, with more citations.

QUICK BITES

Anthropic wants to pay people to build better ways to test their AI models. They're basically saying “Hey nerds, our AI keeps acing all the tests we throw at it, so we need some real brain-busters now!”

What's going on here?

Anthropic announced a new initiative to fund third-party evaluations of advanced AI capabilities and risks.

What does this mean?

Frontier AI models are outgrowing the old evaluation methods faster than teenagers outgrow their shoes. Creating and running new evals is expensive, especially as more GPT-4-class models arrive, like Gemini 1.5 Pro and the Claude 3 and 3.5 series.

To solve this, Anthropic is opening up its wallet to the wider AI community, hoping fresh eyes (i.e., new evals from third parties) can judge these models better.

Anthropic is looking at three main categories: AI safety level assessments, advanced capability metrics, and tools for building evals.

For safety, they want tests for stuff like AI hacking skills, the ability to design bioweapons, and how autonomous models can get.
On the capability side, they're after evals for cutting-edge science, multilingual skills, and societal impacts.
They also want infrastructure to make it easier for experts to whip up good evals without needing coding chops.

Anthropic is also sharing a wishlist and inviting proposals through an application form.

Why should I care?

Anthropic's trying to stay ahead of the curve because when your creation starts acing tests faster than you can write them, it's time to bring in the reinforcements.

If you're an AI whiz or domain expert, there's cash on the table. And it’s not just Anthropic: other big AI labs are also sweating about evals (OpenAI famously gives early access to eval contributors).

Ben’s Bites Insights

We have two databases, updated daily, that you can access by sharing Ben’s Bites using the link below:

All 10k+ links we’ve covered, easily filterable (1 referral)
6k+ AI company funding rounds from Jan 2022, including investors, amounts, stage, etc. (3 referrals)
