Ben's Bites Newsletter
Stable Diffusion 3 solves spellings in image generation.

February 23, 2024

Stability AI has announced its new image generation model: Stable Diffusion 3. While it’s not out yet, Stability claims it’s their best text-to-image model, with a new architecture, responsible training and various model sizes.

What’s going on here?

Stable Diffusion 3 is in early research preview.

What does that mean?

From the sample images shared on Twitter, it looks like this model can spell text correctly and keep track of multiple subjects when creating images. Check out some of them:

“Photo of a red sphere on top of a blue cube. Behind them is a green triangle, on the right is a dog, on the left is a cat”

"Cover artwork of the album 'The Machine Crusade' by the cybernetic heavy metal band 'Hexagon Machine'"

The SD3 family of models ranges from 800M to 8B parameters. While the previous image models from Stability were “diffusion models”, this one uses a mix of diffusion and transformer architecture that should allow the model to scale even further. SD3 will also be able to accept multimodal inputs.

Stability AI also claims that it cleaned its whole dataset using Spawning AI’s Do Not Train registry, which has over 1.5B opt-out requests, along with other manual requests made to Stability.
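To make the opt-out idea concrete, here is a minimal sketch of filtering a dataset against an opt-out list. This is not Stability's or Spawning's actual pipeline or API: the record layout, the `normalize` helper and the `filter_opted_out` function are all hypothetical, and it assumes opt-outs are matched by image URL.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lowercase the host and drop the fragment so near-identical URLs compare equal."""
    parts = urlsplit(url.strip())
    return parts._replace(netloc=parts.netloc.lower(), fragment="").geturl()


def filter_opted_out(records, opt_out_urls):
    """Return only the records whose URL is not on the opt-out list (hypothetical helper)."""
    blocked = {normalize(u) for u in opt_out_urls}  # set gives O(1) lookups
    return [r for r in records if normalize(r["url"]) not in blocked]


# Made-up example data, purely for illustration.
records = [
    {"url": "https://example.com/a.png", "caption": "a red sphere"},
    {"url": "https://Example.com/b.png", "caption": "a blue cube"},
]
opt_out = ["https://example.com/b.png"]

kept = filter_opted_out(records, opt_out)
print([r["url"] for r in kept])
```

At the scale of 1.5B requests, a real system would likely need something heavier than an in-memory set, but the basic check is the same: drop anything that matches the registry before training.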

Why should I care?

The model is not out for everyone to use yet, but you can get on the waitlist for an early preview. Handling multiple subjects and spelling text correctly are big problems in image generation models (hands are pretty much solved now), and SD3 takes them head-on.

Currently, many indie image tools are about human images and photorealism, rather than corporate use cases like graphic design. With SD3 I think that barrier is gonna break and we’ll see many indie tools built for those use cases.
