Nvidia makes it easy to run local LLMs

Nvidia has announced AI-ready versions of its consumer-grade chips. On top of that, it has shared multiple resources and developer tools to bring AI to your local computer in 2024.

What’s going on here?

Nvidia announced three new consumer AI chips and local AI tools.

What does that mean?

The three new chips are "Super" versions of Nvidia's RTX 40 series: the RTX 4070 Super, RTX 4070 Ti Super and RTX 4080 Super, priced between $599 and $999. These GPUs include additional Tensor Cores that can run generative AI applications. Nvidia will also ship them in laptops from companies such as Acer, Dell and Lenovo.

Using Nvidia’s TensorRT-LLM inference backend, you can run these models locally, and with the TensorRT-LLM wrapper for the OpenAI Chat API you can switch between running a model in the cloud and on your PC by changing a single line of code. Nvidia also offers end-to-end developer tools to train LLMs on custom data in the cloud and quantize them for the best performance on RTX PCs, all stitched together with AI Workbench.
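To make the "single line of code" idea concrete, here is a minimal sketch assuming the TensorRT-LLM wrapper exposes an OpenAI-compatible endpoint on localhost; the port, model name, and USE_LOCAL flag below are illustrative assumptions, not taken from Nvidia's documentation.

```python
# Hypothetical sketch: switching between cloud and local inference.
# Assumes a local TensorRT-LLM server speaking the OpenAI Chat API
# on localhost:8000 (port and model names are placeholders).
from openai import OpenAI

USE_LOCAL = True

client = OpenAI(
    # This base_url is the single line you would change to move
    # between a local TensorRT-LLM endpoint and OpenAI's hosted API.
    base_url="http://localhost:8000/v1" if USE_LOCAL else "https://api.openai.com/v1",
    api_key="not-needed-locally" if USE_LOCAL else "YOUR_OPENAI_API_KEY",
)

response = client.chat.completions.create(
    model="local-llm" if USE_LOCAL else "gpt-4",
    messages=[{"role": "user", "content": "Summarize TensorRT-LLM in one sentence."}],
)
print(response.choices[0].message.content)
```

Because the rest of the application only talks to the OpenAI-style client, everything else stays the same whether inference happens on an RTX PC or in the cloud.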

Why should I care?

Demand for Nvidia’s larger GPUs, such as the A100 and H100, is already through the roof. But those chips are meant for AI companies to train and serve their models for mass usage via the cloud. It seemed like rivals such as AMD would take the cake in consumer AI hardware, but Nvidia clearly has deep plans there too.

With new AI-ready graphics cards and developer tools that make it easy to build with local LLMs, Nvidia could become the first choice for many people going forward.
