Ben's Bites Newsletter

Daily Digest: Open Source Multimodal

PLUS: Amazon's new AI robots and AI transparency

October 19, 2023

Hello folks, here’s what we have today:

PICKS

Wake up, babe! Adept is open-sourcing Fuyu-8B. Fuyu (hip name btw) is multimodal, i.e. it can see pictures AND read text. The model is available on HuggingFace and is designed for digital agents to understand images and text. 🍿Our Summary (also below)
Amazon is rolling out a major overhaul of its fulfilment centres using new AI and robotics to speed up deliveries. Amazon says that the new technology is designed to work alongside human employees to reduce injuries. Nice, but what about lunch and pee time, mate? 🍿Our Summary (also below)
Folks from Stanford, MIT and Princeton got together and created a new index called the Foundation Model Transparency Index (FMTI) to measure companies' transparency levels across 100 indicators. Turns out top AI models offer little transparency, even open-source ones. 🍿Our Summary (also below)

from our sponsor

AI copilot for interviews that lets you focus on your candidates - AI-notes for Google Meet, MS Teams & Zoom.

Dive deep with Aspect HQ’s custom AI summaries, ChatGPT for hiring, AI interviewer coach, and auto-drafted ATS scorecards.

TOP TOOLS

AI quick start by BrowserBear - Get started with AI web scraping in seconds, no coding required.
Bearly Code Interpreter w/ Langchain - Integrate a code sandbox for your LLMs in your chains.
YouRetriever - The easiest way to get access to the You.com Search API, which can be integrated with LLM chains.
Martin - Your personal voice AI.
PlayHT 2.0 Turbo - A new blazing-fast conversational AI text-to-speech model with <300ms latency.

WHO’S HIRING IN AI

Glean - AI-powered workplace search. Read our exclusive profile with the founder.
Perplexity - Revolutionary AI search.
Replit - Build software collaboratively with the power of AI.
OyiLabs - Marketplace for generative voice.
Runway - AI creative tools.
Hugging Face - The AI community building the future.
Anthropic - AI research and products with safety first.
Assembly AI - Build AI apps with voice data.
Cohere - Access advanced LLMs through an API.
Character AI - Super-intelligent chatbots.

NEWS

PwC partners with OpenAI and Harvey to train and deploy foundation models for tax, legal and HR services.
We're all stochastic parrots - What AI can teach us about being human.
Foxconn and Nvidia are building ‘AI factories’ - Supercomputing data centres to accelerate the development of self-driving cars, autonomous machines and industrial robots.
PyTorch Foundation open sources ExecuTorch - An end-to-end solution for enabling on-device inference across mobile and edge devices including wearables, phones and more.
EU plans stricter rules for the most powerful generative AI models with a three-tiered approach.
Towards a real-time decoding of images from brain activity. The artificial neurons in the algorithm activate similarly to the physical neurons of the brain in response to the same image.
American Federation of Teachers partners with AI identification platform GPTZero (which is bad because AI detectors don’t work).
An easy-to-follow thread on Chinese regulation for AI companies, covering how to red-team their models for illegal or "unhealthy" information.
How QLoRA works to let you fine-tune models that have billions of parameters on a relatively small GPU.
Mustafa Suleyman and Eric Schmidt - We need an AI equivalent of the IPCC.

Unclassifieds - short, sponsored links

Centenarian - performance & longevity coaching driven by your wearable data.

QUICK BITES

Wake up, babe! Adept is open-sourcing Fuyu-8B. Fuyu (hip name btw) is multimodal, i.e. it can see pictures AND read text. The model is available on HuggingFace and is designed for digital agents to understand images and text.

What is going on here?

The AI squad at Adept just dropped an open-source multimodal model called Fuyu-8B.

What does this mean?

Unlike other multimodal cuties, Fuyu-8B keeps it simple. She feeds images right into her transformer decoder so she can work with any image size. No separate image encoder or complex training. Fuyu-8B's chill with charts, diagrams, and docs—she answers questions about them like a boss.
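For the curious, here is a rough sketch of the "no separate image encoder" idea: cut the image into fixed-size patches, flatten each one, push it through a single linear projection, and hand the resulting row-by-row sequence to the decoder. This is a toy illustration and not Adept's code. The function names, the grayscale list-of-lists image, the tiny weight matrix and the "newline" marker that closes each row of patches are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]                      # grayscale image, H rows x W columns
Token = Tuple[str, Optional[List[float]]]      # ("patch", embedding) or ("newline", None)


def patchify(image: Image, patch: int) -> List[List[List[float]]]:
    """Split an image into rows of flattened patch x patch squares."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    rows = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            # Flatten one square patch in raster order.
            flat = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            row.append(flat)
        rows.append(row)
    return rows


def project(flat: List[float], weights: List[List[float]]) -> List[float]:
    """Linear projection: one output value per row of the weight matrix."""
    return [sum(w * x for w, x in zip(w_row, flat)) for w_row in weights]


def image_to_sequence(image: Image, patch: int,
                      weights: List[List[float]]) -> List[Token]:
    """Turn an image of any (patch-aligned) size into a token sequence."""
    sequence: List[Token] = []
    for row in patchify(image, patch):
        for flat in row:
            sequence.append(("patch", project(flat, weights)))
        # Hypothetical marker telling the decoder a row of patches has ended.
        sequence.append(("newline", None))
    return sequence


if __name__ == "__main__":
    image = [[float(y * 6 + x) for x in range(6)] for y in range(4)]   # 4 x 6 image
    weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.25] * 4]
    print(len(image_to_sequence(image, 2, weights)))
```

A 4x6 image with 2x2 patches gives 2 rows of 3 patches, so the sequence has 6 patch tokens plus 2 row markers. A bigger image simply produces a longer sequence, which is why nothing needs resizing.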

On common benchmarks, Fuyu-8B outperforms models with more parameters, showing its efficient architecture. However, these benchmarks have issues, so Adept be like: no worries, we’ll build our own.

Fuyu-8B is a small version of the larger multimodal model that powers Adept's products. Her big sis Fuyu-Medium does next-level stuff like running OCR on scanned docs and pinpointing UI elements. Adept is keeping its bigger models under wraps for now. Fair.

Why should I care?

An open multimodal model is a big step for AI. Simpler architecture = more accessible and scalable. Fuyu-8B is a solid base for researchers and devs to build real-world apps.

Understanding visual data matters for business. Precision OCR/localization unlocks assistants that can see screens like humans and take action. The Fuyu models are geared toward knowledge workers, so if you’re one, the detailed examples on the blog are worth checking.

QUICK BITES

Amazon is rolling out a major overhaul of its fulfilment centres using new AI and robotics to speed up deliveries. Amazon says that the new technology is designed to work alongside human employees to reduce injuries. Nice, but what about lunch and pee time mate? 👉👈

What is going on here?

Amazon's new warehouse system will significantly cut delivery times and make inventory tracking much faster.

What does this mean?

Amazon's revamp introduces robots and AI into its warehouses to reduce the time it takes to process orders. The centrepieces are a robotic arm called Sparrow and a new sortation system named Sequoia. Together, these will slash the time to fulfil orders by up to 25% while identifying inventory 75% faster. Amazon’s plans for Sequoia include the “same-day delivery sites” it’s working on.

Why should I care?

Amazon's claim is that automation is not about eliminating jobs but about eliminating mundane tasks. The goal is to integrate robots seamlessly into workflows. Still, I believe people who can’t adapt to these newer workflows (and adapting, I agree, is easier said than done) will lose their jobs.

Rivals like Walmart are also turning warehouse jobs into robot management roles. Amazon will also start testing a bipedal robot named Digit in its operations. We’re not ready for LLMs, now imagine AI + robotics.

QUICK BITES

Commercial foundation models are becoming less transparent, according to researchers at Stanford's Center for Research on Foundation Models. Together with folks from MIT and Princeton, they created a new index called the Foundation Model Transparency Index (FMTI) to measure companies' transparency levels across 100 indicators.

What is going on here?

FMTI finds the top 10 major foundation model companies lacking in transparency.

What does this mean?

The group evaluated 10 major companies on 100 indicators covering how models are built, how they work, and how they're used downstream. The highest score was 54 out of 100 (Llama 2), showing much room for improvement across the board. Many critical details like training data sources, labor practices, and model usage stats weren't disclosed by any company.
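If you want a feel for how a score like "54 out of 100" adds up, here is a minimal sketch. It assumes each indicator is simply met or not met and that the total is the count of indicators met, grouped by the three areas mentioned above. The function names and the toy indicator labels are made up for illustration and are not taken from the index itself.

```python
from typing import Dict

# Each domain maps an indicator label to True (disclosed) or False (not disclosed).
Indicators = Dict[str, Dict[str, bool]]


def score_by_domain(indicators: Indicators) -> Dict[str, int]:
    """Count how many indicators are met in each domain."""
    return {domain: sum(1 for met in checks.values() if met)
            for domain, checks in indicators.items()}


def total_score(indicators: Indicators) -> int:
    """Overall score: the number of indicators met across all domains."""
    return sum(score_by_domain(indicators).values())


if __name__ == "__main__":
    # Toy data with hypothetical indicator labels, not real FMTI results.
    toy = {
        "upstream": {"training data sources": False, "labor practices": False},
        "model": {"model size": True},
        "downstream": {"usage stats": False, "usage policy": True},
    }
    print(score_by_domain(toy), total_score(toy))
```

With 100 real indicators, the same counting gives a score out of 100, and the per-domain breakdown shows where a company is quiet.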

The FMTI methodology and indicators are designed to avoid conflicts between transparency and other values like privacy and security.

Why should I care?

As foundation models spread across sectors, transparency is crucial for properly regulating these powerful systems and ensuring they are built and used responsibly. This lack of transparency makes it hard for businesses, academics, regulators and the public to understand these increasingly influential technologies.

Without basic details of how models work, issues like bias, privacy violations, and other harms can't even be identified, let alone addressed. Nine of the 10 companies have committed to managing AI risks, and the researchers hope the index will help them follow through. They also want to inform policymakers considering regulation around foundation models.

Ben’s Bites Insights

We have 2 databases, updated daily, which you can access by sharing Ben’s Bites using the link below:

All 10k+ links we’ve covered, easily filterable (1 referral)
6k+ AI company funding rounds from Jan 2022, including investors, amounts, stage, etc. (3 referrals)