Ben's Bites

ByteDance is using GPT-4 to train its models

ByteDance got busted using responses from OpenAI’s models to secretly build its own chatbot. Not cool, since that breaks OpenAI’s and Microsoft’s terms of service. Alex from The Verge broke the story, and since then OpenAI has suspended ByteDance’s account on its platform.

What’s going on here?

ByteDance leaned hard on OpenAI's API to develop Project Seed, even though it knew this violated OpenAI's terms of service.

What does that mean?

ByteDance used OpenAI’s API for pretty much every part of making its chatbot (Project Seed), including training it and testing how well it works.

For context: OpenAI (like most AI companies) doesn’t allow using its models’ outputs to train competing models. Mistral is an exception; we’ve written about them.

ByteDance recently told the team to stop using GPT-generated text, but the API is still secretly used to evaluate Seed's performance. The team is under mad pressure to match GPT-3.5 by the end of 2023 (which is basically now) and GPT-4 by mid-2024.

OpenAI has suspended ByteDance’s API account for now. However, most of ByteDance’s usage actually ran through Microsoft Azure.

Why should I care?

Startups have been training models on synthetic data generated by GPT-4 for a few months without much pushback from OpenAI. The same goes for open-source models fine-tuned on GPT responses. But now that a company as big as ByteDance is doing the same, OpenAI will likely tighten enforcement.
