What is OpenAI's Preparedness Framework?
OpenAI just put together another safety squad called the Preparedness Team. They also cooked up something called the Preparedness Framework to go with it.
Basically, they wanna make sure their next-level AI models are chill to release into the world, and not gonna cause any trouble.
What’s going on here?
OpenAI is setting up another framework to evaluate whether humans and models are prepared to face each other.
What does that mean?
OpenAI has several safety teams. The Superalignment team focuses on existential risks from a hypothetical artificial superintelligence that could surpass humans. At the same time, model safety teams make sure models like GPT-3.5 and GPT-4 are safe for everyday use.
This new Preparedness team sits between those two: it focuses on the near-term risks of the most advanced AI models, aka frontier models. Its work will be grounded in facts, with a builder's mindset.
The framework breaks risks down into different areas, like hacking risks, how persuasive the models could be to humans, how independently they can act, and more. Each model gets a safety rating - low, medium, high, or critical risk. Only low and medium-risk models get the green light to launch; high-risk models can be developed further but not released. You can get more details about the framework (beta) here.
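To make the gating concrete, here's a minimal Python sketch of how a scorecard like that could work. The category names, the numeric levels, and the "worst category sets the overall rating" rule are illustrative assumptions based on the description above, not OpenAI's actual implementation.

```python
from enum import IntEnum

class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# Hypothetical scorecard for one model evaluation.
# Categories here mirror the ones mentioned above (hacking, persuasion,
# autonomous action); the real framework tracks more, and names them differently.
scorecard = {
    "cybersecurity": Risk.MEDIUM,
    "persuasion": Risk.LOW,
    "model_autonomy": Risk.HIGH,
}

# Assumption: the worst-scoring category determines the overall rating.
overall = max(scorecard.values())

can_deploy = overall <= Risk.MEDIUM          # only low/medium-risk models ship
can_develop_further = overall <= Risk.HIGH   # high-risk: keep working on it, don't release

print(f"overall={overall.name}, deploy={can_deploy}, develop_further={can_develop_further}")
# -> overall=HIGH, deploy=False, develop_further=True
```

In this toy run, the model is blocked from launch because one category (model autonomy) hits "high", but development can continue; a "critical" score anywhere would stop both.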
The Preparedness team will do the technical work of evaluating the models; then, with input from safety advisors, OpenAI leadership will make the final decisions. The Board of Directors (yes, the infamous board) has the power to reverse those decisions if it feels the models are not safe enough.
Why should I care?
Recently, DeepMind’s LLM-based system solved a previously unsolved maths problem. Another research paper showed that vision models can now solve CAPTCHAs better than humans. So if AI models can create new toxins, find new loopholes in security systems, or operate your computer on their own, the risk they pose goes up. These risks are more plausible than the hypothetical scenarios of AI bots killing humans, so having a framework for knowing these models’ limits is crucial for being proactive while developing them, instead of patching things up afterwards.
One speculative read of all the safety material OpenAI has been releasing in the last few days is that OpenAI has a new model with higher intelligence (and higher risks), and the team is preparing the ground for its release. Again, that's just rumour/speculation, whatever you want to call it.