Daily Digest: Behind the scenes of building GPT-4o
PLUS: A million words under 10 cents.
Daily Digest #474
Want to get in front of 100k AI enthusiasts? Work with us here
Hello folks! Next Tuesday, Chris is covering how to build an AI sales assistant with Google Sheets and Slack in our workshop.
Here’s what we have today:
PICKS
Sam Altman and OpenAI have been teasing their reasoning breakthrough “Strawberry” - and it might be just around the corner. But for now, we have to be happy with:
A detailed system card for GPT-4o, the multimodal AI model behind their “Her”-like voice chatbot. 🍿 Our Summary (also below)
Free ChatGPT users can now generate 2 images per day. Too little, but hey, it’s more than zero.
Zico Kolter from Carnegie Mellon joins OpenAI’s Board as the technical AI safety expert. He’ll also join the Safety & Security Committee.
Google doubles down on Gemini 1.5 Flash and AI Studio. Gemini 1.5 Flash prices are now slashed by up to 78%. AI Studio, their app for testing these AI models, now supports 100+ languages and gets easier access through Google Workspace. The best one, though, is PDF understanding with text + vision across AI Studio and the Gemini API. 🍿 Our Summary (also below)
The fastest way to build AI apps
Writer Framework: build Python apps with drag-and-drop UI
API and SDKs to integrate into your codebase
Intuitive no-code tools for business users
(sponsor)
BB CONTENT - new content from us
New tutorials:
Set up AI phone agents for customer calls. Implement Bland AI to automate calls and improve prospect conversion rates.
Automate feature request analysis with Zapier. Have AI automatically review product feature requests and notify your team.
Blog posts:
5 powerful techniques to get you started with AI chatbots.
The future of intelligent automation with Gumloop.
How to build AI agents to automate your work.
TOP TOOLS
James from NVIDIA - A digital human that knows all about NVIDIA and its products.
Inkeep - AI search & support powered by your content.
Salesify - Your personal AI sales coach.
Silvia - Multilingual dictation for iOS.
Dodo - AI receptionists for veterinary front desks.
Midship - Extract documents straight to your spreadsheets.
NEWS
Microsoft and Palantir team up to sell AI to US defence and intelligence agencies.
Perplexity’s popularity surges as AI search start-up takes on Google.
Anthropic is expanding its model safety bug bounty program, with rewards of up to $15k.
JPMorgan Chase is giving its employees an AI assistant powered by ChatGPT maker OpenAI.
Elon Musk’s X agrees to pause EU data processing for training Grok.
UK launches formal probe into Amazon’s ties with Anthropic.
QUICK BITES
OpenAI's just spilled the tea on GPT-4o, their latest AI whiz kid that can handle text, audio, images, and video like a champ. But don't worry, they've got safety on lock.
What's going on here?
OpenAI released a detailed system card for GPT-4o, their multimodal AI model that's been in the works.
What does this mean?
GPT-4o is an "omni model" that can take in and spit out various types of content - text, audio, images, you name it. It's showing off some cool tricks, like improved performance on medical knowledge tests and better handling of underrepresented languages.
They've put this bad boy through the wringer with safety tests, including external red teaming and their fancy "Preparedness Framework". Key risks they tackled: unauthorized voice generation, speaker identification, and generating sketchy content.
They've built in some serious safeguards, like only allowing pre-approved voices and refusing to identify speakers from audio. The model scored "medium" on their risk scale for persuasiveness, but "low" on other biggies like cybersecurity and biothreats.
Why should I care?
This isn't just another AI model - it's a peek into how big players like OpenAI are trying to balance pushing the tech envelope with keeping things safe. The system card shows they're thinking hard about potential misuse and societal impacts. Plus, the improved capabilities in areas like healthcare and language could have some major real-world applications. It's a sign that AI is getting more powerful, but also that there's a growing focus on responsible development.
QUICK BITES
Google's making its Gemini AI cheaper and more powerful. Time to build some cool stuff!
What's going on here?
Google announced a bunch of updates to their Gemini AI platform, including big price cuts and new features.
What does this mean?
Gemini 1.5 Flash (their popular AI model) just got way cheaper. We're talking 78% off for input costs and 71% off for output. That's huge for developers building high-volume AI apps.
Gemini now speaks 100+ languages. No more getting stuck because your AI doesn't understand Thai or Swahili. Google has improved Gemini’s documentation and added cool features like PDF understanding (it can "see" graphs in your PDFs now).
If you use Google Workspace, you can now access Google AI Studio without any extra setup. Millions more people can easily play with AI tools. It’s also got a bunch of quality-of-life upgrades like better keyboard shortcuts and faster loading times.
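To get a feel for what the price cuts above mean for a bill, here is a rough back-of-the-envelope sketch. Only the 78% (input) and 71% (output) figures come from the announcement as summarised here; the old per-million-token prices and the token counts are made-up placeholders, and the function name is ours, so swap in real numbers from Google's pricing page before relying on it.

```python
def discounted_cost(tokens, old_price_per_million, discount_pct):
    """Dollar cost of `tokens` tokens after a percentage price cut."""
    new_price = old_price_per_million * (1 - discount_pct / 100)
    return tokens / 1_000_000 * new_price


# Hypothetical old prices in USD per 1M tokens (placeholders, NOT Google's real prices).
OLD_INPUT_PRICE = 1.00
OLD_OUTPUT_PRICE = 2.00

# Discounts quoted in the summary: 78% off input, 71% off output.
input_cost = discounted_cost(5_000_000, OLD_INPUT_PRICE, 78)
output_cost = discounted_cost(1_000_000, OLD_OUTPUT_PRICE, 71)

print(f"input: ${input_cost:.2f}, output: ${output_cost:.2f}")
```

The point is simply that a 78% cut leaves you paying 22% of the old input price, and a 71% cut leaves 29% of the old output price, so high-volume apps see most of their token bill disappear.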
Why should I care?
If you're building AI apps, this is awesome news. Cheaper AI means you can do more cool stuff without breaking the bank. The language expansion opens up global markets. And all these little improvements add up to make building with AI easier and more powerful. Whether you're a big company or just tinkering, Google's really making it simpler to work with their AI (which traditionally hasn’t been the case).
Ben’s Bites Insights
We have 2 databases, updated daily, which you can access by sharing Ben’s Bites using the link below:
All 10k+ links we’ve covered, easily filterable (1 referral)
6k+ AI company funding rounds from Jan 2022, including investors, amounts, stage, etc. (3 referrals)