Ben's Bites Newsletter
Posts
Purple Llama by Meta - Evals and Models for Open Source Safety

Purple Llama by Meta - Evals and Models for Open Source Safety

December 08, 2023

Meta is announcing Purple Llama, an open source project to provide trust and safety tools and evaluations for developing responsible generative AI models.

What's going on here?

Meta is open-sourcing tools and benchmarks focused on cybersecurity and content safety for generative AI to enable developers to build responsibly.

What does this mean?

To start Purple Llama, Meta is releasing CyberSec Eval, a set of cybersecurity benchmarks for evaluating potential risks in language models. You can test your LLM’s tendency to recommend insecure code and comply with malicious requests with CyberSec Eval.

Additionally, Meta is providing Llama Guard, a content safety classifier to filter risky outputs. It is a pre-trained model to help defend against generating potentially risky outputs.

Why should I care?

Open-source models are great. At the same time, open-source eval systems are also needed. Purple Llama is an umbrella project for such efforts. Even if you want to write your own evals, having a base set to rely on is great. The best way to ensure people follow safety standards for their deployments is by making it easier to do so.

Reply

or to participate.