How to solve AI discrimination?

Anthropic has developed a new method to measure and reduce discrimination in language model decisions for areas like loans, jobs, insurance claims etc. They release a dataset covering 70 diverse scenarios including loan applications, visa approvals, and security clearances.

What's going on here?

Simple techniques like adding “discrimination is illegal” reduce discriminatory language model outputs for high-stakes decisions.

What does this mean?

Anthropic created a 3-step process to systematically evaluate discrimination in language models.

  • Creating diverse decision scenarios like job offers or insurance claims where models might be used.

  • Creating question templates with demographic info as variables to measure bias.

  • Modifying demographics like age, race and gender while keeping other info equal.

The result highlighted both, negative discrimination and positive discrimination. Anthropic is also releasing the dataset used for this evaluation.

The study also tested various prompting strategies to mitigate discrimination. Effective options included asking models to ensure unbiased answers, provide rationales without stereotypes, and answer questions without considering demographic data. Two simple prompts nearly eliminated bias: stating discrimination is illegal and instructing the model to ignore demographic info.

Why should I care?

As language models spread to high-stakes decisions, developers and policymakers need tools to assess and address risks like discrimination. Anthropic's public release of their evaluation methodology allows wider testing for biases.

Their findings also demonstrate prompting as an accessible "dial" to control problematic outputs. Persuade the AI like you persuade a human.

Reply

or to participate.