Big tech firms secretly used YouTube videos to train AI.

Turns out, AI giants have been sneakily snagging YouTube vids to teach their bots. Creators are not happy, and the legal waters are murky.

What's going on here?

An investigation found that major AI companies like Apple, Nvidia, and Anthropic used subtitles from over 170,000 YouTube videos to train their AI models - without creators' knowledge or consent.

Proof News built a tool to see which videos are in the dataset.

What does this mean?

Big names in tech (we're talking Apple, Nvidia, Anthropic) used a dataset called "YouTube Subtitles" to train AI models. This dataset includes content from 48,000+ channels, including major YouTubers, educational channels, and even some conspiracy theory videos.

Many creators had no idea their content was being used this way. Some are calling it theft. The dataset was part of a larger compilation created by EleutherAI called "The Pile," which also includes stuff like Wikipedia articles and Enron emails.

YouTube's terms of service prohibit this kind of automated data scraping, but companies argue it's different when using a pre-compiled dataset. You can see if your videos were in the dataset using the Proof News tool.

Why should I care?

This is a big deal for content creators and the future of AI. It raises questions about consent, compensation, and the potential for AI to replicate (or replace) human-created content. Plus, it shows how even deleted videos can live on in AI training data. As AI gets smarter, the debate over using creators' work without permission is only going to heat up.
