NVIDIA’s Rubin CPX Promises to Crush AI Inference Bottlenecks — With 30x ROI

Rubin CPX is aimed at high-value use cases such as codebase reasoning and HD video generation.


SANTA CLARA, Calif., September 12, 2025 — NVIDIA has announced a major leap in AI infrastructure with the introduction of the Rubin CPX GPU, a processor purpose-built to handle the growing complexity of inference in next-generation AI models.

As models evolve into agentic systems capable of multi-step reasoning, long-horizon context, and persistent memory, inference is becoming the new bottleneck for large-scale AI applications in fields such as software development, video generation, and deep research. These workloads demand unprecedented levels of compute, memory, and networking performance.

The Rubin CPX GPU is designed for the context phase of inference, delivering 30 petaFLOPs of NVFP4 compute, 128 GB of GDDR7 memory, built-in video encoding and decoding, and 3x faster attention acceleration than the GB300 NVL72. Optimised for processing long sequences of data, Rubin CPX is aimed at high-value use cases such as codebase reasoning and HD video generation.

NVIDIA is also rolling out the Vera Rubin NVL144 CPX rack, which integrates 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs, achieving 8 exaFLOPs of NVFP4 compute, 100 TB of high-speed memory, and 1.7 PB/s of bandwidth in a single rack.
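As a quick sanity check on the rack figures, using only the numbers quoted in the announcement: 144 Rubin CPX GPUs at 30 petaFLOPs each account for roughly 4.32 of the rack's 8 exaFLOPs of NVFP4 compute, implying the remainder comes from the 144 Rubin GPUs (whose per-GPU figure NVIDIA has not stated here).

```python
# Back-of-the-envelope check of the Vera Rubin NVL144 CPX rack,
# using only the figures quoted in NVIDIA's announcement.
CPX_GPUS = 144
CPX_PFLOPS = 30        # NVFP4 petaFLOPs per Rubin CPX GPU
RACK_EXAFLOPS = 8.0    # total NVFP4 exaFLOPs per rack

cpx_exaflops = CPX_GPUS * CPX_PFLOPS / 1000       # contribution from CPX GPUs
rubin_exaflops = RACK_EXAFLOPS - cpx_exaflops     # implied remainder from Rubin GPUs

print(f"CPX contribution: {cpx_exaflops:.2f} EF")
print(f"Implied Rubin GPU contribution: {rubin_exaflops:.2f} EF")
```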

"Using NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet, paired with NVIDIA ConnectX-9 SuperNICs and orchestrated by the Dynamo platform, the Vera Rubin NVL144 CPX is built to power the next wave of million-token context AI inference workloads—cutting inference costs and unlocking advanced capabilities for developers and creators worldwide," NVIDIA said in a blog post.

NVIDIA estimates the platform can deliver a 30x–50x return on investment, translating to as much as $5 billion in revenue from a $100 million capital investment. With Rubin CPX, the company is setting a new benchmark for inference economics and positioning itself at the forefront of next-generation generative AI infrastructure.
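The claimed economics are simple multiples, and the arithmetic checks out at the top of the quoted range: $100 million of CAPEX at 50x yields the $5 billion revenue figure NVIDIA cites.

```python
# Verify NVIDIA's quoted ROI range: $100M CAPEX at 30x-50x return.
capex = 100_000_000  # $100 million CAPEX, per the announcement

for multiple in (30, 50):
    revenue = capex * multiple
    print(f"{multiple}x ROI -> ${revenue / 1e9:.0f} billion in revenue")
```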

"The NVIDIA Rubin CPX GPU and the NVIDIA Vera Rubin NVL144 CPX rack exemplify the SMART platform philosophy—delivering scalable, multi-dimensional performance, and ROI through architectural innovation and ecosystem integration," it added.