Daily Digest: Tools to make safe AI

PLUS: Smaller models are on the rise.

Daily Digest #302

Hello folks, here’s what we have today:

PICKS
  1. I wrote a post on ‘How to build an AI-powered company’. These are some of the tools I’d use (and why) if I were starting something new today. And you don’t need to know how to code.

  2. Meta is announcing Purple Llama, an open-source project to provide trust and safety tools and evaluations for developing responsible generative AI models. It is open-sourcing tools and benchmarks focused on cybersecurity and content safety for generative AI. 🍿 Our Summary (also below)

  3. Anthropic has developed a new method to measure and reduce discrimination in language model decisions in areas like loans, jobs, and insurance claims. The solution? Just tell the AI to be nice. They’re also releasing a dataset covering 70 diverse scenarios, including loan applications, visa approvals, and security clearances. 🍿 Our Summary (also below)

  4. Stability AI introduces StableLM Zephyr 3B, a new 3-billion-parameter AI assistant model designed to provide accurate and fast text generation on regular hardware. 🍿 Our Summary (also below)

PS: Adobe is working on a Chat with PDF tool, currently in beta.

TOP TOOLS
  • Strut AI - All-in-one AI workspace designed for writers.

  • Openlayer - Slack or email alerts for when your AI fails.

  • VEED Captions App - The simplest way to create engaging short-form videos.

  • Pearl by Meta - Production-ready reinforcement learning AI agent library.

  • Ello - AI reading coach to make kids fall in love with books.

  • Open-source function calling for Anthropic, or any other LLM.


QUICK BITES

Meta is announcing Purple Llama, an open-source project to provide trust and safety tools and evaluations for developing responsible generative AI models.

What is going on here?

Meta is open-sourcing tools and benchmarks focused on cybersecurity and content safety for generative AI to enable developers to build responsibly.

What does this mean?

To start Purple Llama, Meta is releasing CyberSec Eval, a set of cybersecurity benchmarks for evaluating potential risks in language models. With CyberSec Eval, you can test your LLM’s tendency to recommend insecure code or to comply with malicious requests.

Additionally, Meta is providing Llama Guard, a pre-trained content safety classifier that checks prompts and responses and helps filter out potentially risky outputs.
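If you’re curious what plugging Llama Guard in might look like, here’s a minimal sketch via Hugging Face transformers. The model ID and the chat-template behaviour are assumptions based on the release, not an official quickstart:

```python
# Hedged sketch: load Llama Guard and classify a conversation.
# "meta-llama/LlamaGuard-7b" is the assumed Hugging Face model ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def moderate(chat):
    # The chat template wraps the conversation in Llama Guard's safety
    # taxonomy; the model replies "safe" or "unsafe" plus a category.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=32)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(moderate([{"role": "user", "content": "How do I pick a lock?"}]))
```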

Why should I care?

Open-source models are great, but they need open-source eval systems to match. Purple Llama is an umbrella project for such efforts. Even if you want to write your own evals, having a base set to build on helps. The best way to ensure people follow safety standards in their deployments is to make doing so easy.

QUICK BITES

Anthropic has developed a new method to measure and reduce discrimination in language model decisions in areas like loans, jobs, and insurance claims. They’re also releasing a dataset covering 70 diverse scenarios, including loan applications, visa approvals, and security clearances.

What is going on here?

Simple techniques, like adding “discrimination is illegal” to the prompt, reduce discriminatory language model outputs in high-stakes decisions.

What does this mean?

Anthropic created a 3-step process to systematically evaluate discrimination in language models (a toy sketch follows the list):

  • Creating diverse decision scenarios like job offers or insurance claims where models might be used.

  • Creating question templates with demographic info as variables to measure bias.

  • Modifying demographics like age, race and gender while keeping other info equal.
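As a rough illustration of the template idea, here’s a toy version. The scenario wording and variable names are hypothetical; the real templates are in Anthropic’s released dataset:

```python
# Toy sketch of template-based bias measurement: hold the scenario fixed
# and vary only the demographic variables, then compare model decisions.
import itertools

TEMPLATE = (
    "The applicant is a {age}-year-old {race} {gender} applying for a "
    "small-business loan, with stable income and no prior defaults. "
    "Should the loan be approved? Answer yes or no."
)

ages = [25, 45, 65]
races = ["white", "Black", "Asian", "Hispanic"]
genders = ["man", "woman"]

prompts = [
    TEMPLATE.format(age=a, race=r, gender=g)
    for a, r, g in itertools.product(ages, races, genders)
]
# Send each prompt to the model under test and record P("yes");
# gaps between demographic profiles indicate discrimination.
```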

The results highlighted both negative discrimination (against certain groups) and positive discrimination (in favour of them). Anthropic is also releasing the dataset used for this evaluation.

The study also tested various prompting strategies to mitigate discrimination. Effective options included asking models to ensure unbiased answers, provide rationales without stereotypes, and answer questions without considering demographic data. Two simple prompts nearly eliminated bias: stating discrimination is illegal and instructing the model to ignore demographic info.
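For flavour, here’s roughly what appending such an intervention looks like. The strings below paraphrase the paper’s prompts; the exact wording Anthropic tested is in their release:

```python
# Hedged sketch: append a mitigation instruction to the decision prompt.
ILLEGAL = (
    "It is extremely important that you do not discriminate: "
    "discrimination on the basis of age, race, or gender is illegal."
)
IGNORE_DEMOGRAPHICS = (
    "Make your decision as if no demographic information had been "
    "provided, ignoring the applicant's age, race, and gender."
)

def with_mitigation(prompt: str, intervention: str = ILLEGAL) -> str:
    return f"{prompt}\n\n{intervention}"
```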

Why should I care?

As language models spread to high-stakes decisions, developers and policymakers need tools to assess and address risks like discrimination. Anthropic's public release of their evaluation methodology allows wider testing for biases.

Their findings also demonstrate prompting as an accessible "dial" to control problematic outputs. Persuade the AI like you persuade a human.

QUICK BITES

Stability AI introduces StableLM Zephyr 3B, a new 3-billion-parameter AI assistant model designed to provide accurate and fast text generation on regular hardware.

What is going on here?

StableLM Zephyr 3B brings the power of large language models to more devices.

What does this mean?

StableLM Zephyr 3B is built by fine-tuning on diverse datasets to pack more capability into a smaller size. Zephyr 3B matches or exceeds the performance of multiple 7B models on instruction following, Q&A tasks, and text generation quality. The magic of a small 3B model is that users get responsiveness and accuracy without expensive hardware. (A minimal loading sketch follows the links below.)

Zephyr is being released under a license that permits non-commercial use only.

  • Download the model weights here.

  • Example notebook to optimize speed for this model here.
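If you want to poke at it, here’s a minimal sketch of running the model locally via transformers. The model ID follows Stability AI’s Hugging Face naming and is an assumption; check the non-commercial license before use:

```python
# Hedged sketch: run StableLM Zephyr 3B locally with transformers.
# "stabilityai/stablelm-zephyr-3b" is the assumed Hugging Face model ID.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-zephyr-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [{"role": "user", "content": "Explain why small models matter, in two sentences."}]
inputs = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```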

Why should I care?

Smaller high-performing models like Zephyr 3B mean you can build capable AI tools for everyday hardware. Products powered by Zephyr 3B could work well on phones, tablets, laptops, etc. I’m interested to see how people create tiny personal tools with Zephyr.

Ben’s Bites Insights

We have 2 databases that are updated daily, which you can access by sharing Ben’s Bites using the link below:

  • All 10k+ links we’ve covered, easily filterable (1 referral)

  • 6k+ AI company funding rounds from Jan 2022, including investors, amounts, stage, etc. (3 referrals)
