Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer
The NVIDIA Rubin platform introduces a rack-scale AI supercomputer architecture built on six co-designed chips (Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9, BlueField-4 DPU, and Spectrum-6 Ethernet switch) optimized for continuous AI factory operation. Extreme co-design across compute, networking, power delivery, and cooling enables sustained intelligence production at scale, with 10x higher inference throughput and 10x lower cost per token than the previous generation.
1. Introduction: The AI Factory Era
The AI industry has entered a new industrial phase. What began as discrete systems for model training and inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now power applications that generate business plans, analyze markets, conduct research, and reason across vast knowledge bases.
"AI factories now underpin applications that generate business plans, analyze markets, conduct deep research, and reason across vast bodies of knowledge."
1.1 The Challenge
Modern AI workloads demand:
- Hundreds of thousands of input tokens for long-context reasoning
- Real-time inference under strict constraints
- Agentic reasoning with complex multi-step workflows
- Multimodal pipelines processing diverse data types
All while operating within tight constraints on:
- Power consumption
- Reliability and uptime
- Security and isolation
- Deployment velocity
- Cost per token
1.2 Three Scaling Laws Driving AI Progress
The evolution is captured by three fundamental scaling laws:
- Pre-training scaling: Models acquire broad knowledge as data and compute grow (one common empirical form appears after this list)
- Post-training scaling: Models learn to reason through fine-tuning and reinforcement learning
- Test-time scaling: Models reason by generating more tokens during inference
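One widely used empirical form for the first law, from Chinchilla-style scaling studies, relates loss to parameter count N and training-token count D; the constants and exponents are fitted per model family, not fundamental:

$$
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

Here E is the irreducible loss, and the two power-law terms shrink as model size and data grow, which is what makes continued scaling pay off.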
2. The NVIDIA Rubin Platform: Extreme Co-Design Philosophy
The Rubin platform represents a fundamental shift: treating the data center, not a single GPU server, as the unit of compute. This approach is built on extreme co-design where GPUs, CPUs, networking, security, software, power delivery, and cooling are architected together as a unified system.
2.1 Five Platform-Level Breakthroughs
| Breakthrough | Description |
|---|---|
| Rack-scale coherence | 72 GPUs operate as single coherent machine |
| Sustained intelligence production | Optimized for continuous operation, not peak bursts |
| Predictable performance | Deterministic behavior under real workloads |
| Security-first architecture | Built-in isolation and confidential computing |
| Operational efficiency | Designed for zero-downtime maintenance |
2.2 Vera Rubin NVL72: The Flagship System
The Vera Rubin NVL72 is a rack-scale system where the entire rack operates as a coherent machine within a larger AI factory. It's optimized for:
- Predictable latency
- High utilization across heterogeneous execution phases
- Efficient conversion of power into usable intelligence
3. Six New Chips, One AI Supercomputer
The Rubin platform integrates six purpose-built chips, each engineered for a specific role in the AI factory:
3.1 Vera CPU: Purpose-Built for AI Factories
The Vera CPU is the high-bandwidth, low-latency data movement engine that keeps AI factories operating efficiently at scale.
Key Specifications
| Feature | Grace CPU | Vera CPU |
|---|---|---|
| Cores | 72 Neoverse V2 | 88 NVIDIA Olympus |
| Threads | 72 | 176 (Spatial Multithreading) |
| L2 Cache/core | 1 MB | 2 MB |
| L3 Cache | 114 MB | 162 MB |
| Memory BW | Up to 512 GB/s | Up to 1.2 TB/s |
| Memory Capacity | Up to 480 GB LPDDR5X | Up to 1.5 TB LPDDR5X |
| NVLink-C2C | 900 GB/s | 1.8 TB/s |
| PCIe/CXL | Gen5 | Gen6/CXL 3.1 |
| Confidential Computing | No | Yes |
NVIDIA Olympus Core Innovation
Spatial Multithreading: A new multithreading approach that runs two hardware threads per core by physically partitioning resources instead of time-slicing. This enables:
- Runtime tradeoff between performance and efficiency
- Increased throughput and virtual CPU density
- Predictable performance with strong isolation
- Critical for multi-tenant AI factories
Scalable Coherency Fabric (SCF)
The second-generation SCF connects all 88 Olympus cores to shared L3 cache and memory on a single monolithic die:
- Avoids chiplet boundaries for consistent latency
- Sustains >90% of peak memory bandwidth under load
- Enables deterministic, high-throughput data movement
- Ensures linear scaling as core count increases
Coherent Memory Architecture
NVLink-C2C provides 1.8 TB/s of coherent bandwidth between Vera CPUs and Rubin GPUs, enabling:
- Unified address space across CPU and GPU memory
- Applications that treat LPDDR5X and HBM4 as a single coherent pool
- Reduced data movement overhead
- Efficient KV-cache offload and multi-model execution (a minimal sketch follows)
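The offload pattern is easy to express in standard PyTorch. A minimal sketch, assuming pinned host buffers stand in for the coherent LPDDR5X pool; the calls are standard PyTorch, not Rubin-specific:

```python
# Minimal KV-cache offload sketch. On a coherent NVLink-C2C system
# these copies would traverse the 1.8 TB/s CPU-GPU link.
import torch

def offload_kv(kv_gpu: torch.Tensor) -> torch.Tensor:
    """Move a KV-cache block from GPU HBM to pinned CPU memory."""
    kv_cpu = torch.empty(kv_gpu.shape, dtype=kv_gpu.dtype,
                         device="cpu", pin_memory=True)
    kv_cpu.copy_(kv_gpu, non_blocking=True)  # async over the coherent link
    return kv_cpu

def restore_kv(kv_cpu: torch.Tensor) -> torch.Tensor:
    """Bring an offloaded KV-cache block back into GPU HBM."""
    return kv_cpu.to("cuda", non_blocking=True)
```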
3.2 Rubin GPU: Execution Engine for Transformer-Era AI
The Rubin GPU transforms rack-scale capability into sustained intelligence production, designed for continuous training, post-training, and inference.
Core Specifications
| Feature | Blackwell | Rubin |
|---|---|---|
| Transistors | 208B | 336B |
| Compute Dies | 2 | 2 |
| NVFP4 Inference | 10 PFLOPS* | 50 PFLOPS* |
| NVFP4 Training | 10 PFLOPS | 35 PFLOPS |
| Softmax Acceleration | 16 Ops/Clk/SM (FP32), 32 Ops/Clk/SM (FP16) | 64 Ops/Clk/SM (FP16) |
*Transformer Engine compute
Architecture Highlights
- 224 Streaming Multiprocessors (SMs) with 6th-gen Tensor Cores
- Optimized for NVFP4 and FP8 low-precision execution
- Expanded Special Function Units (SFUs) for attention and activation
- Improved branch prediction, prefetching, and load-store performance
Third-Generation Transformer Engine
New capabilities include:
- Hardware-accelerated adaptive compression for NVFP4 (the format is simulated in the sketch after this list)
- Up to 50 PFLOPS NVFP4 for inference
- Full backward compatibility with Blackwell GPUs
- Automatic optimization without code changes
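To make the format concrete, here is a software simulation of block-scaled FP4 (e2m1) quantization, the numeric family NVFP4 belongs to. The block size and scale handling are illustrative assumptions, not the Transformer Engine's actual compression scheme:

```python
# Illustrative FP4 (e2m1) block quantization in NumPy. Block size and
# per-block scaling are assumptions for demonstration only.
import numpy as np

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # e2m1 magnitudes

def quantize_fp4_blocked(x: np.ndarray, block: int = 16):
    """Quantize a 1-D array to signed FP4 with one scale per block."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0
    mag = np.abs(x) / scale
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)  # nearest code
    return np.sign(x) * FP4_GRID[idx], scale

x = np.random.randn(64).astype(np.float32)
q, s = quantize_fp4_blocked(x)
print("max abs error:", np.abs((q * s).reshape(-1) - x).max())
```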
HBM4 Memory Subsystem
Key advances in the memory subsystem (a rough decode-throughput estimate follows the list):
- Up to 288 GB of HBM4 per GPU
- Up to 22 TB/s aggregate bandwidth (~3x vs Blackwell)
- Doubled interface width compared to HBM3e
- Improved decode and front-end efficiency
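Why the bandwidth matters: single-stream decode is typically bound by the time to stream the model weights from HBM for every generated token. A back-of-envelope estimate, using a hypothetical 70B-parameter model at 4 bits per weight:

```python
# Rough upper bound on bandwidth-limited decode throughput.
# Model size and precision are hypothetical; 22 TB/s is quoted above.
hbm_bw = 22e12                    # bytes/s aggregate HBM4 bandwidth
weight_bytes = 70e9 * 0.5         # 70B params at 4 bits/param ~ 35 GB
print(f"~{hbm_bw / weight_bytes:.0f} tokens/s")  # ~629 tokens/s per GPU
```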
Scientific Computing Convergence
| Feature | Hopper GPU | Blackwell GPU | Rubin GPU |
|---|---|---|---|
| FP32 Vector (TFLOPS) | 67 | 80 | 130 |
| FP32 Matrix (TFLOPS) | 67 | 227* | 400* |
| FP64 Vector (TFLOPS) | 34 | 40 | 33 |
| FP64 Matrix (TFLOPS) | 67 | 150* | 200* |
*Peak performance using Tensor Core-based emulation algorithms
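The idea behind emulated FP32/FP64 matrix math can be sketched in software: split each operand into high and low halves and recover most of the accuracy from a few low-precision products. This is a simplified illustration of the technique class (real schemes, such as Ozaki-style splitting, are more refined), and NumPy still accumulates in FP32 where hardware would run BF16 Tensor Core GEMMs with FP32 accumulation:

```python
# Split-precision GEMM sketch: approximate an FP32 matmul from three
# lower-precision partial products. Illustrative of the technique only.
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Truncate float32 mantissas to bfloat16 precision."""
    return (x.astype(np.float32).view(np.uint32) & 0xFFFF0000).view(np.float32)

def emulated_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a_hi, b_hi = to_bf16(a), to_bf16(b)
    a_lo, b_lo = a - a_hi, b - b_hi
    # On hardware each term would be a BF16 Tensor Core GEMM.
    return a_hi @ b_hi + a_hi @ b_lo + a_lo @ b_hi

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64), dtype=np.float32)
b = rng.standard_normal((64, 64), dtype=np.float32)
print("max deviation:", np.abs(emulated_matmul(a, b) - a @ b).max())
```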
3.3 NVLink 6 Switch: Rack-Scale Scale-Up Fabric
NVLink 6 eliminates communication bottlenecks by enabling 72 Rubin GPUs to operate as a single, tightly coupled accelerator.
Key Capabilities
- 3.6 TB/s bidirectional bandwidth per GPU (2x vs previous gen)
- Full all-to-all topology across the rack
- Uniform latency between any pair of GPUs
- SHARP in-network compute: 14.4 TFLOPS FP8 per switch tray, used transparently by collectives (see the sketch below)
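Applications exercise SHARP through ordinary collectives; nothing changes at the framework level. A minimal sketch using PyTorch's NCCL backend (launcher and environment setup, e.g. torchrun, are assumed):

```python
# Gradient all-reduce via NCCL. When SHARP in-network reduction is
# enabled, the sum can be offloaded to the NVLink switches without
# any change to this code.
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

grad = torch.randn(1 << 20, device="cuda")
dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # summed across all ranks
grad /= dist.get_world_size()                # mean gradient
```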
MoE and Reasoning Optimization
For Mixture-of-Experts (MoE) workloads:
- Up to 2x higher throughput for all-to-all operations
- Dynamic token routing without fabric saturation
- Efficient expert parallelism across all 72 GPUs (see the dispatch sketch below)
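Expert dispatch maps onto a single all-to-all over the NVLink fabric. A toy sketch with equal splits (real MoE layers use router-driven, uneven splits plus a second all-to-all to return results):

```python
# Toy MoE token dispatch: each rank hosts one expert and exchanges
# token shards with every other rank in one collective. Equal splits
# are an assumption to keep the sketch minimal.
import torch
import torch.distributed as dist

def dispatch_tokens(tokens: torch.Tensor) -> torch.Tensor:
    """Exchange equal token shards across ranks (first dim must be
    divisible by the world size)."""
    received = torch.empty_like(tokens)
    dist.all_to_all_single(received, tokens)
    return received
```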
Operational Features
- Hot-swappable trays
- Continued operation with partial population
- Dynamic traffic rerouting around faults
- In-service software updates
- Fine-grained link telemetry
3.4 ConnectX-9: AI Scale-Out Bandwidth
ConnectX-9 serves as the intelligent endpoint of the Spectrum-X Ethernet fabric, delivering 1.6 Tb/s of network bandwidth per Rubin GPU.
Endpoint Intelligence
Programmable congestion control at the endpoint:
- Smooths traffic injection during all-to-all phases
- Reduces head-of-line blocking
- Maintains high effective bandwidth under load
- Prevents congestion before it forms (the smoothing idea is illustrated below)
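The flavor of endpoint pacing can be shown with a token bucket. ConnectX-9's actual algorithms run in NIC hardware and are programmable, so this toy model only illustrates the smoothing behavior, not the real implementation:

```python
# Toy token-bucket pacer: inject traffic at a sustained rate with a
# bounded burst, smoothing injection the way endpoint congestion
# control does. Illustration only, not the ConnectX-9 algorithm.
import time

class TokenBucketPacer:
    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate, self.burst = rate_bytes_per_s, burst_bytes
        self.tokens, self.last = burst_bytes, time.monotonic()

    def wait_to_send(self, nbytes: int) -> None:
        """Block until this flow has budget to inject nbytes."""
        assert nbytes <= self.burst, "message exceeds burst budget"
        while True:
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)
```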
Multi-Tenant Isolation
- Enforces fairness and isolation per job/tenant
- Predictable network behavior regardless of other workloads
- Critical for shared AI infrastructure
Security Features
- Data-in-transit encryption (IPsec, PSP)
- Data-at-rest encryption for storage platforms
- Secure boot and firmware authentication
- Device attestation
3.5 BlueField-4 DPU: Operating System of the AI Factory
BlueField-4 is the processor powering the operating system of the AI factory, handling control, security, data movement, and orchestration independently of AI computation.
Architecture
Dual-die package combining:
- 64-core NVIDIA Grace CPU for infrastructure offload
- Integrated ConnectX-9 for tightly coupled data movement
- Up to 800 Gb/s ultra-low-latency connectivity
Generational Improvements
| Feature | BlueField-3 | BlueField-4 |
|---|---|---|
| Bandwidth | 400 Gb/s | 800 Gb/s |
| Compute | 16 Arm Cortex-A78 cores | 64 Arm Neoverse V2 cores |
| Performance | Baseline | 6x |
| Memory BW | 75 GB/s | 250 GB/s |
| Memory Capacity | 32 GB | 128 GB |
| Cloud Networking | 32K hosts | 128K hosts |
| Data Encryption | 400 Gb/s | 800 Gb/s |
| NVMe Storage | 10M IOPS @ 4 KB | 20M IOPS @ 4 KB |
ASTRA: Advanced Secure Trusted Resource Architecture
System-level trust architecture that:
- Establishes a trust domain within each compute tray
- Provides a single, trusted control point
- Enables secure bare-metal operation
- Enforces strong multi-tenant isolation
- Keeps infrastructure control trusted, independent of host software
Inference Context Memory Storage
AI-native infrastructure tier powered by BlueField-4:
- Pod-level access to shared inference context
- Efficient reuse of KV caches across requests (a toy prefix-keyed store is sketched below)
- Up to 5x boost in tokens per second
- Up to 5x power efficiency vs traditional storage
- Extends GPU memory capacity for long-context workloads
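A toy sketch of prefix-keyed KV reuse, the access pattern this tier accelerates; the store, hashing, and serialization here are illustrative assumptions, not NVIDIA's design:

```python
# Toy prefix-keyed KV-cache store: requests sharing a prompt prefix
# reuse cached KV blocks instead of recomputing prefill.
import hashlib

class KVCacheStore:
    def __init__(self):
        self._store: dict[str, bytes] = {}  # prefix hash -> serialized KV

    @staticmethod
    def _key(prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def get(self, prefix: str) -> bytes | None:
        """Return cached KV blocks if this prefix was seen before."""
        return self._store.get(self._key(prefix))

    def put(self, prefix: str, kv_blob: bytes) -> None:
        self._store[self._key(prefix)] = kv_blob
```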
3.6 Spectrum-6 Ethernet Switch: Scale-Out and Scale-Across
Spectrum-6 advances NVIDIA's purpose-built Ethernet fabric for accelerated computing with co-packaged optics.
Core Specifications
- 102.4 Tb/s per switch chip (2x vs Spectrum-4)
- 200G PAM4 SerDes
- 128x 800 Gb/s ports
- Hardware-assisted performance isolation
Spectrum-X Ethernet Photonics
Co-packaged optics deliver major efficiency gains:
- ~5x better network power efficiency
- Lower end-to-end latency
- Dramatically improved signal integrity
- Optical loss reduced from ~22 dB to ~4 dB
- Up to 64x better signal integrity (the dB arithmetic is checked below)
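The 64x figure follows directly from the loss numbers: an improvement from ~22 dB to ~4 dB is 18 dB, and decibels convert to a linear power ratio as 10^(dB/10):

```python
# Converting the optical-loss improvement to a linear ratio.
improvement_db = 22 - 4
print(f"{10 ** (improvement_db / 10):.0f}x")  # ~63x, matching "up to 64x"
```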
Platform Evolution
| Feature | Blackwell | Rubin |
|---|---|---|
| Platform | Spectrum-X SN5000 + CX-8 | Spectrum-X SN6000 + CX-9 |
| Chip | Spectrum-4 + CX-8 | Spectrum-6 + CX-9 |
| Switch BW | 51.2 Tb/s (64x 800G) | 102.4 Tb/s (128x 800G) |
| GPU BW | 800 Gb/s (2x400G) | 1600 Gb/s (2x800G) |
| SerDes | 100G PAM4 | 200G PAM4 |
| Protocol | Ethernet | Ethernet |
| Connectivity | OSFP | OSFP |
AI Traffic Optimization
Spectrum-X Ethernet handles variable all-to-all communication:
- Coordinated congestion control across fabric
- Adaptive routing for bursty MoE traffic
- Significantly faster job completion times
- Distance-aware congestion control for geo-distributed deployments
4. From Chips to Systems: Scaling to AI Factory
The progression from silicon to deployable AI factory follows a deliberate path:
4.1 Vera Rubin Superchip
The foundational compute building block combining:
- Rubin GPU with HBM4
- Vera CPU with LPDDR5X
- NVLink-C2C coherent interconnect
- Unified memory addressing
4.2 NVL72 Rack Architecture
Rack-scale integration features:
- 72 Rubin GPUs in single coherent domain
- NVLink 6 all-to-all fabric
- Integrated power and cooling
- BlueField-4 infrastructure control
- Spectrum-X scale-out connectivity
4.3 DGX SuperPOD: Deployment-Scale Unit
The DGX SuperPOD represents the deployment-scale unit of an AI factory:
- Multiple NVL72 racks
- Unified management and orchestration
- Predictable scaling characteristics
- Production-ready operations
- Enterprise support and services
5. Software and Developer Experience
The Rubin platform's software stack makes rack-scale systems programmable and accessible.
5.1 Software Foundation
- NVIDIA CUDA: Core parallel computing platform
- NVIDIA CUDA-X: Accelerated libraries for AI, HPC, data science
- NVIDIA NCCL: Optimized collective communications
- NVIDIA DOCA: Infrastructure services framework for BlueField DPUs
5.2 Framework Integration
Seamless integration with:
- PyTorch, TensorFlow, JAX
- NVIDIA NeMo for LLM training
- NVIDIA TensorRT for inference optimization
- NVIDIA Triton Inference Server
5.3 Programming Model
Key characteristics:
- Transparent scaling: Code written for a single GPU scales to all 72 (a standard DDP step is sketched after this list)
- Coherent memory: Unified addressing across CPU/GPU
- Automatic optimization: Frameworks leverage hardware features
- Backward compatibility: Existing CUDA code runs unmodified
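A standard PyTorch DDP step shows the transparency in practice; nothing here is Rubin-specific, and launcher setup (e.g. torchrun) is assumed:

```python
# The same training step runs on one GPU or across all 72 in a rack;
# NCCL and the NVLink fabric handle the gradient exchange.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda")
loss = model(x).square().mean()  # toy objective
loss.backward()                  # gradients all-reduced automatically
opt.step()
```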
6. Operating at AI Factory Scale
Production foundations ensure reliability, security, and efficiency.
6.1 Reliability and Uptime
- Hot-swappable components
- Dynamic traffic rerouting
- Continued operation during maintenance
- Fine-grained telemetry and monitoring
- Predictive failure detection
6.2 Security Architecture
Multi-layered security:
- Confidential computing support (Vera CPU)
- ASTRA trusted control plane (BlueField-4)
- Data-in-transit encryption (ConnectX-9)
- Secure boot and attestation
- Strong multi-tenant isolation
6.3 Energy Efficiency
- Co-packaged optics: ~5x network power efficiency
- Optimized power delivery per rack
- Efficient low-precision compute (NVFP4)
- Workload-aware power management
- Coherent memory reduces data movement
6.4 Ecosystem Readiness
- Support from major cloud providers
- OEM partnerships for deployment
- ISV software certification
- Open networking standards (Ethernet)
- Arm software ecosystem compatibility
7. Performance and Efficiency at Scale
The Rubin platform delivers measurable gains in real-world AI factory deployments.
7.1 Training Performance
- One-fourth as many GPUs required for equivalent training throughput
- Higher sustained utilization across training phases
- Faster convergence through improved communication
- Efficient scaling to thousands of GPUs
7.2 Inference Throughput
- 10x higher inference throughput for long-context workloads
- 10x lower cost per token through efficiency gains
- Up to 2x improvement in MoE all-to-all operations
- 5x tokens per second boost with Inference Context Memory Storage
7.3 Real-World Impact
Key metrics:
- Sustained GPU utilization >90% in production
- Predictable latency under variable load
- Linear scaling efficiency to rack scale
- Reduced time-to-deployment for new models
8. Why Rubin is the AI Factory Platform
8.1 Extreme Co-Design Delivers Results
The Rubin platform demonstrates that treating the data center as the unit of compute delivers:
- Predictable Performance: Behavior remains consistent under real workloads
- Economic Efficiency: Lower cost per token through sustained utilization
- Operational Scalability: Systems that can be deployed and maintained at scale
- Security by Design: Trust and isolation built into architecture
- Future-Proof Foundation: Coherent scaling path as AI demands grow
8.2 The Shift to Intelligence Production
"AI factories now function as always-on intelligence production systems, where efficiency in reasoning, context handling, and data movement determines performance."
The Rubin platform is purpose-built for this reality, where:
- Reasoning models require sustained multi-step inference
- Long contexts demand massive memory bandwidth
- Agentic workflows interleave compute and communication
- Multi-tenant operation requires strong isolation
- Cost per token determines competitive advantage
8.3 From Components to Coherent Systems
The six-chip architecture works as one:
- Vera CPU orchestrates data movement
- Rubin GPU executes transformer workloads
- NVLink 6 enables rack-scale coherence
- ConnectX-9 controls scale-out endpoints
- BlueField-4 operates the infrastructure
- Spectrum-6 provides efficient scale-out fabric
Together, they transform the AI factory from a collection of servers into a unified intelligence production system.
Conclusion
The NVIDIA Rubin platform represents a fundamental architectural shift in how AI infrastructure is designed, deployed, and operated. By applying extreme co-design across compute, networking, memory, and infrastructure, Rubin establishes a new foundation for producing intelligence efficiently, securely, and predictably at scale.
As AI workloads continue to evolve toward longer contexts, deeper reasoning, and more complex agentic behaviors, the Rubin platform provides the architectural foundation to meet these demands while maintaining the economic efficiency required for widespread AI deployment.
The era of AI factories has arrived, and the Rubin platform is purpose-built to power them.