Nvidia targets inference as AI’s next battleground with Groq 3 LPX

It’s a big cost play, he pointed out, and it “has to happen everywhere, all the time, for all users.”

The next phase of inferencing

The new Groq 3 language processing units (LPUs) are based on intellectual property (IP) from Groq, which signed a $20 billion licensing agreement with Nvidia late last year. According to the chip company, a fleet of LPUs can function as a “giant single processor.”

While Rubin GPUs will continue to handle prefill (prompt processing), Groq’s LPX will now handle latency-sensitive portions of decode (response). Together, they can deliver a “new class of inference performance,” Nvidia says.
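To make the prefill/decode split concrete, here is a minimal toy sketch in Python. It is not Nvidia's or Groq's software: the names `KVCache`, `prefill`, `decode_step`, and `generate` are hypothetical, and the "model" is a stand-in arithmetic rule. It only illustrates that prefill processes the whole prompt once, while decode runs as a sequential, token-by-token loop, which is why decode is the latency-sensitive phase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KVCache:
    """Hypothetical stand-in for the key/value cache built from the prompt."""
    tokens: List[int] = field(default_factory=list)


def prefill(prompt: List[int]) -> KVCache:
    """Process the entire prompt in one pass (the prompt-processing phase).

    In the setup the article describes, this phase stays on the GPUs.
    """
    return KVCache(tokens=list(prompt))


def decode_step(cache: KVCache) -> int:
    """Produce one new token from the cache (the response phase).

    Toy rule, an assumption for illustration only:
    next token = (sum of all cached tokens) mod 100.
    """
    next_token = sum(cache.tokens) % 100
    cache.tokens.append(next_token)  # each step depends on the previous one
    return next_token


def generate(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Run prefill once, then decode sequentially, one token per step."""
    cache = prefill(prompt)
    return [decode_step(cache) for _ in range(max_new_tokens)]


if __name__ == "__main__":
    print(generate([1, 2, 3], 3))
```

Because every decode step waits on the token before it, the loop cannot be parallelized the way prefill can, so per-step latency directly determines how quickly a response appears.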

Each LPX rack features 256 LPUs with 128 GB of on-chip static random-access memory (SRAM), 150 terabytes per second (TB/s) of bandwidth, chip-to-chip links, and high-speed connections to NVL72, Nvidia’s liquid-cooled AI supercomputer. Combined, these can reduce latency to “near zero,” Nvidia claims.

The LPX integration with Vera Rubin AI factories will be available in the second half of this year.

Training versus inferencing

Training and inference stress infrastructure in very different ways, noted Sanchit Vir Gogia, chief analyst at Greyhound Research. While training rewards “massive parallelism and brute-force scale,” inferencing (especially for long context and interactive reasoning) is far more sensitive to latency, memory movement, cache behavior, concurrency, and cost per delivered token.
