Computer Architecture · NVIDIA Developer Blog · 2026

Tags: AI supercomputer, rack-scale architecture, GPU architecture, NVLink interconnect, HBM4 memory, AI inference, data center design, co-packaged optics

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

Kyle Aubrey

The NVIDIA Rubin platform introduces a rack-scale AI supercomputer architecture built on six co-designed chips (Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9, BlueField-4 DPU, and Spectrum-6 Ethernet switch) optimized for continuous AI factory operations. The platform delivers extreme co-design across compute, networking, power delivery, and cooling to enable sustained intelligence production at scale, achieving 10x higher inference throughput and 10x lower cost per token compared to previous generations.



1. Introduction: The AI Factory Era

The AI industry has entered a new industrial phase. What began as discrete systems for model training and inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now power applications that generate business plans, analyze markets, conduct research, and reason across vast knowledge bases.

"AI factories now underpin applications that generate business plans, analyze markets, conduct deep research, and reason across vast bodies of knowledge."

1.1 The Challenge

Modern AI workloads demand:

  • Hundreds of thousands of input tokens for long-context reasoning
  • Real-time inference under strict constraints
  • Agentic reasoning with complex multi-step workflows
  • Multimodal pipelines processing diverse data types

All while operating under strict constraints on:

  • Power consumption
  • Reliability and uptime
  • Security and isolation
  • Deployment velocity
  • Cost per token

1.2 Three Scaling Laws Driving AI Progress

The evolution is captured by three fundamental scaling laws:

  1. Pre-training scaling: Models acquire broad knowledge from data
  2. Post-training scaling: Models learn to think through fine-tuning and reinforcement learning
  3. Test-time scaling: Models reason by generating more tokens during inference (a toy sketch follows)
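
To make the third law concrete, here is a toy best-of-N sampling sketch: spending more inference compute (more sampled candidates, hence more generated tokens) buys better answers. `generate` and `score` are hypothetical placeholders for a model's sampler and verifier, not part of any NVIDIA API.

```python
# Toy best-of-N sketch of test-time scaling: more sampled candidates (more
# generated tokens) buys a better selected answer. `generate` and `score`
# are hypothetical placeholders for a model's sampler and verifier.
import random

def generate(prompt: str) -> str:
    # Placeholder: a real system would sample an LLM completion here.
    return f"candidate-{random.randint(0, 9999)}"

def score(prompt: str, answer: str) -> float:
    # Placeholder: a real system would use a verifier or reward model.
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    """Larger n -> more inference compute -> higher-quality selected answer."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

answer = best_of_n("Draft a market-entry plan.", n=16)
```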

2. The NVIDIA Rubin Platform: Extreme Co-Design Philosophy

The Rubin platform represents a fundamental shift: treating the data center, not a single GPU server, as the unit of compute. This approach is built on extreme co-design where GPUs, CPUs, networking, security, software, power delivery, and cooling are architected together as a unified system.

2.1 Five Platform-Level Breakthroughs

| Breakthrough | Description |
| --- | --- |
| Rack-scale coherence | 72 GPUs operate as single coherent machine |
| Sustained intelligence production | Optimized for continuous operation, not peak bursts |
| Predictable performance | Deterministic behavior under real workloads |
| Security-first architecture | Built-in isolation and confidential computing |
| Operational efficiency | Designed for zero-downtime maintenance |

2.2 Vera Rubin NVL72: The Flagship System

The Vera Rubin NVL72 is a rack-scale system where the entire rack operates as a coherent machine within a larger AI factory. It's optimized for:

  • Predictable latency
  • High utilization across heterogeneous execution phases
  • Efficient conversion of power into usable intelligence

3. Six New Chips, One AI Supercomputer

The Rubin platform integrates six purpose-built chips, each engineered for a specific role in the AI factory:


3.1 Vera CPU: Purpose-Built for AI Factories

The Vera CPU is the high-bandwidth, low-latency data movement engine that keeps AI factories operating efficiently at scale.

Key Specifications

| Feature | Grace CPU | Vera CPU |
| --- | --- | --- |
| Cores | 72 Neoverse V2 | 88 NVIDIA Olympus |
| Threads | 72 | 176 (Spatial Multithreading) |
| L2 Cache/core | 1 MB | 2 MB |
| L3 Cache | 114 MB | 162 MB |
| Memory BW | Up to 512 GB/s | Up to 1.2 TB/s |
| Memory Capacity | Up to 480 GB LPDDR5X | Up to 1.5 TB LPDDR5X |
| NVLink-C2C | 900 GB/s | 1.8 TB/s |
| PCIe/CXL | Gen5 | Gen6 / CXL 3.1 |
| Confidential Computing | No | Yes |

NVIDIA Olympus Core Innovation

Spatial Multithreading: A new multithreading approach that runs two hardware threads per core by physically partitioning resources instead of time-slicing. This enables:

  • Runtime tradeoff between performance and efficiency
  • Increased throughput and virtual CPU density
  • Predictable performance with strong isolation
  • Critical for multi-tenant AI factories

Scalable Coherency Fabric (SCF)

The second-generation SCF connects all 88 Olympus cores to shared L3 cache and memory on a single monolithic die (a rough bandwidth-measurement sketch follows the list):

  • Avoids chiplet boundaries for consistent latency
  • Sustains >90% of peak memory bandwidth under load
  • Enables deterministic, high-throughput data movement
  • Ensures linear scaling as core count increases
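
As a rough illustration of how a sustained-bandwidth figure like the >90% claim is framed, the sketch below times a large streaming copy and divides bytes moved by elapsed time. This is plain NumPy on a host CPU, offered only as an illustration of the measurement; it is not NVIDIA's benchmark methodology or the SCF's actual numbers.

```python
# Illustrative host-side bandwidth measurement: time a large streaming copy
# and divide bytes moved by elapsed time. NumPy on the CPU here; not
# NVIDIA's benchmark methodology or the SCF's actual numbers.
import time
import numpy as np

N = 1 << 28  # 256M float32 elements, ~1 GiB per array
src = np.ones(N, dtype=np.float32)
dst = np.empty_like(src)

start = time.perf_counter()
np.copyto(dst, src)  # streams ~1 GiB of reads plus ~1 GiB of writes
elapsed = time.perf_counter() - start

bytes_moved = 2 * src.nbytes  # read src + write dst
print(f"Sustained bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```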

Coherent Memory Architecture

NVLink-C2C provides 1.8 TB/s of coherent bandwidth between Vera CPUs and Rubin GPUs, enabling (a code sketch follows the list):

  • Unified address space across CPU and GPU memory
  • Applications treat LPDDR5X and HBM4 as single coherent pool
  • Reduced data movement overhead
  • Efficient KV-cache offload and multi-model execution
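
The sketch below approximates the KV-cache offload pattern this coherence makes cheap: cold KV blocks live in CPU (LPDDR5X) memory and are pulled into GPU (HBM4) memory on demand. Standard PyTorch pinned-memory transfers stand in for coherent access here; on an NVLink-C2C system the explicit copy can become a direct access. The shapes and the `fetch_kv` helper are illustrative assumptions, not an NVIDIA API.

```python
# Hedged sketch of KV-cache offload: keep cold KV blocks in CPU (LPDDR5X)
# memory and pull them into GPU (HBM4) memory on demand. Pinned-memory
# transfers stand in for coherent access; shapes and fetch_kv are illustrative.
import torch

num_layers, max_tokens, heads, head_dim = 4, 8192, 8, 128

# Cold tier: CPU-resident, pinned so copies can overlap with GPU compute.
kv_cpu = torch.zeros(num_layers, max_tokens, 2, heads, head_dim,
                     dtype=torch.bfloat16, pin_memory=True)

def fetch_kv(layer: int, start: int, end: int) -> torch.Tensor:
    """Bring one layer's KV block into GPU memory without blocking compute."""
    return kv_cpu[layer, start:end].to("cuda", non_blocking=True)

hot_block = fetch_kv(layer=0, start=0, end=4096)  # hot tier: HBM4
```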

3.2 Rubin GPU: Execution Engine for Transformer-Era AI

The Rubin GPU transforms rack-scale capability into sustained intelligence production, designed for continuous training, post-training, and inference.

Core Specifications

| Feature | Blackwell | Rubin |
| --- | --- | --- |
| Transistors | 208B | 336B |
| Compute Dies | 2 | 2 |
| NVFP4 Inference | 10 PFLOPS* | 50 PFLOPS* |
| NVFP4 Training | 10 PFLOPS | 35 PFLOPS |
| Softmax Acceleration | 16 Ops/Clk/SM (FP32), 32 Ops/Clk/SM (FP16) | 64 Ops/Clk/SM (FP16) |

*Transformer Engine compute

Architecture Highlights

  • 224 Streaming Multiprocessors (SMs) with 6th-gen Tensor Cores
  • Optimized for NVFP4 and FP8 low-precision execution
  • Expanded Special Function Units (SFUs) for attention and activation
  • Improved branch prediction, prefetching, and load-store performance

Third-Generation Transformer Engine

New capabilities include (a usage sketch follows the list):

  • Hardware-accelerated adaptive compression for NVFP4
  • Up to 50 PFLOPS NVFP4 for inference
  • Full backward compatibility with Blackwell GPUs
  • Automatic optimization without code changes
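
A minimal usage sketch with the published Transformer Engine PyTorch API is shown below, using today's documented FP8 recipe. Given the stated backward compatibility and automatic optimization, an NVFP4 recipe is assumed (not confirmed here) to slot into the same `fp8_autocast` pattern with no other code changes.

```python
# Minimal Transformer Engine sketch using the published PyTorch API. The FP8
# recipe below is today's documented path; an NVFP4 recipe is assumed (not
# confirmed here) to slot into the same autocast pattern unchanged.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # low-precision GEMM managed by the Transformer Engine
```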

HBM4 Memory Subsystem

Key advances in the memory subsystem (a roofline check follows this list):

  • Up to 288 GB of HBM4 per GPU
  • Up to 22 TB/s aggregate bandwidth (~3x vs Blackwell)
  • Doubled interface width compared to HBM3e
  • Improved decode and front-end efficiency
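
A back-of-envelope roofline check, using only the figures quoted above, shows why the bandwidth jump matters as much as peak FLOPS: any kernel below roughly 2,300 FLOPs per byte of HBM4 traffic is bandwidth-bound on this part.

```python
# Back-of-envelope roofline check using only the figures quoted above; pure
# arithmetic on stated specs, not a measurement.
peak_flops = 50e15  # 50 PFLOPS NVFP4 inference, per GPU
peak_bw = 22e12     # 22 TB/s aggregate HBM4 bandwidth, per GPU

ridge = peak_flops / peak_bw
print(f"Ridge point: {ridge:.0f} FLOPs per byte")  # ~2273 FLOPs/byte
# Kernels below this arithmetic intensity (e.g., memory-bound decode in
# long-context inference) are limited by bandwidth, which is why the ~3x
# HBM4 jump over Blackwell matters as much as peak FLOPS.
```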

Scientific Computing Convergence

| Feature | Hopper GPU | Blackwell GPU | Rubin GPU |
| --- | --- | --- | --- |
| FP32 Vector (TFLOPS) | 67 | 80 | 130 |
| FP32 Matrix (TFLOPS) | 67 | 227* | 400* |
| FP64 Vector (TFLOPS) | 34 | 40 | 33 |
| FP64 Matrix (TFLOPS) | 67 | 150* | 200* |

*Peak performance using Tensor Core-based emulation algorithms

3.3 NVLink 6 Switch: Rack-Scale Coherent Fabric

NVLink 6 eliminates communication bottlenecks by enabling 72 Rubin GPUs to operate as a single, tightly coupled accelerator.

Key Capabilities

  • 3.6 TB/s bidirectional bandwidth per GPU (2x vs previous gen)
  • Full all-to-all topology across the rack
  • Uniform latency - any GPU to any GPU
  • SHARP in-network compute: 14.4 TFLOPS FP8 per switch tray

MoE and Reasoning Optimization

For Mixture-of-Experts (MoE) workloads (a dispatch sketch follows the list):

  • Up to 2x higher throughput for all-to-all operations
  • Dynamic token routing without fabric saturation
  • Efficient expert parallelism across all 72 GPUs
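
The all-to-all dispatch at the heart of expert parallelism can be written with standard torch.distributed (NCCL) calls, as in the sketch below. Token counts and the hidden size are arbitrary illustrative values, and nothing in the code is Rubin-specific; the fabric underneath is what changes between generations.

```python
# Sketch of MoE expert-parallel dispatch: every rank exchanges routed tokens
# with every other rank in one all-to-all. Standard torch.distributed (NCCL)
# code; token counts and hidden size are arbitrary illustrative values.
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
world = dist.get_world_size()
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

tokens_per_peer, hidden = 128, 4096
send = torch.randn(world * tokens_per_peer, hidden, device="cuda")
recv = torch.empty_like(send)

# Chunk i of `send` holds the tokens routed to the experts hosted on rank i.
dist.all_to_all_single(recv, send)
```

Launched with torchrun, the same code exercises NVLink within a rack or the scale-out fabric across racks.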

Operational Features

  • Hot-swappable trays
  • Continued operation with partial population
  • Dynamic traffic rerouting around faults
  • In-service software updates
  • Fine-grained link telemetry

3.4 ConnectX-9: AI Scale-Out Bandwidth

ConnectX-9 serves as the intelligent endpoint of the Spectrum-X Ethernet fabric, delivering 1.6 Tb/s of network bandwidth per Rubin GPU.

Endpoint Intelligence

Programmable congestion control at the endpoint:

  • Smooths traffic injection during all-to-all phases
  • Reduces head-of-line blocking
  • Maintains high effective bandwidth under load
  • Prevents congestion before it forms

Multi-Tenant Isolation

  • Enforces fairness and isolation per job/tenant
  • Predictable network behavior regardless of other workloads
  • Critical for shared AI infrastructure

Security Features

  • Data-in-transit encryption (IPsec, PSP)
  • Data-at-rest encryption for storage platforms
  • Secure boot and firmware authentication
  • Device attestation

3.5 BlueField-4 DPU: Operating System of the AI Factory

BlueField-4 is the processor powering the operating system of the AI factory, handling control, security, data movement, and orchestration independently of AI computation.

Architecture

Dual-die package combining:

  • 64-core NVIDIA Grace CPU for infrastructure offload
  • Integrated ConnectX-9 for tightly coupled data movement
  • Up to 800 Gb/s ultra-low-latency connectivity

Generational Improvements

| Feature | BlueField-3 | BlueField-4 |
| --- | --- | --- |
| Bandwidth | 400 Gb/s | 800 Gb/s |
| Compute | 16 Arm Cortex-A78 cores | 64 Arm Neoverse V2 cores |
| Performance | Baseline | 6x |
| Memory BW | 75 GB/s | 250 GB/s |
| Memory Capacity | 32 GB | 128 GB |
| Cloud Networking | 32K hosts | 128K hosts |
| Data Encryption | 400 Gb/s | 800 Gb/s |
| NVMe Storage | 10M IOPS @ 4K | 20M IOPS @ 4K |

ASTRA: Advanced Secure Trusted Resource Architecture

System-level trust architecture that:

  • Establishes a trust domain within the compute tray
  • Provides a single, trusted control point
  • Enables secure bare-metal operation
  • Enforces strong multi-tenant isolation
  • Keeps infrastructure control trusted and independent of host software

Inference Context Memory Storage

AI-native infrastructure tier powered by BlueField-4 (a reuse sketch follows the list):

  • Pod-level access to shared inference context
  • Efficient reuse of KV caches across requests
  • Up to 5x boost in tokens per second
  • Up to 5x power efficiency vs traditional storage
  • Extends GPU memory capacity for long-context workloads
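
A minimal sketch of the reuse pattern, assuming the storage tier can be treated as a key-value store indexed by a hash of the shared prompt prefix: later requests with the same prefix skip prefill for tokens already processed. The dict, `run_prefill`, and all names here are hypothetical stand-ins, not the product's API.

```python
# Minimal sketch of KV-cache reuse, assuming the storage tier behaves like a
# key-value store indexed by a hash of the shared prompt prefix. The dict,
# run_prefill, and all names are hypothetical stand-ins, not a product API.
import hashlib

context_store: dict[str, bytes] = {}  # stands in for the BlueField-4 tier

def prefix_key(prompt_tokens: list[int]) -> str:
    return hashlib.sha256(str(prompt_tokens).encode("utf-8")).hexdigest()

def run_prefill(prompt_tokens: list[int]) -> bytes:
    # Placeholder for the expensive attention prefill pass.
    return b"serialized-kv-cache"

def get_or_build_kv(prompt_tokens: list[int]) -> bytes:
    key = prefix_key(prompt_tokens)
    if key not in context_store:          # miss: pay the prefill cost once
        context_store[key] = run_prefill(prompt_tokens)
    return context_store[key]             # hit: skip prefill entirely

kv = get_or_build_kv([101, 7592, 2088])  # later identical prefixes reuse kv
```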

3.6 Spectrum-6 Ethernet Switch: Scale-Out and Scale-Across

Spectrum-6 advances NVIDIA's purpose-built Ethernet fabric for accelerated computing with co-packaged optics.

Core Specifications

  • 102.4 Tb/s per switch chip (2x vs Spectrum-4)
  • 200G PAM4 SerDes
  • 128x 800 Gb/s ports
  • Hardware-assisted performance isolation

Spectrum-X Ethernet Photonics

Revolutionary efficiency through co-packaged optics (the dB arithmetic is checked after this list):

  • ~5x better network power efficiency
  • Lower end-to-end latency
  • Dramatically improved signal integrity
  • Optical loss reduced from ~22 dB to ~4 dB
  • Up to 64x better signal integrity
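
The two signal-integrity numbers above are consistent with each other, since dB is a logarithmic power scale: a drop from ~22 dB to ~4 dB of optical loss is an 18 dB improvement, and the "up to 64x" figure follows directly.

```python
# dB is a log10 power scale, so an optical-loss drop from ~22 dB to ~4 dB is
# an 18 dB improvement; the "up to 64x" figure follows directly.
loss_before_db, loss_after_db = 22.0, 4.0
improvement = 10 ** ((loss_before_db - loss_after_db) / 10)
print(f"{improvement:.0f}x")  # ~63x, i.e., roughly the quoted 64x
```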

Platform Evolution

| Feature | Blackwell | Rubin |
| --- | --- | --- |
| Platform | Spectrum-X SN5000 + CX-8 | Spectrum-X SN6000 + CX-9 |
| Chip | Spectrum-4 + CX-8 | Spectrum-6 + CX-9 |
| Switch BW | 51.2 Tb/s (64x 800G) | 102.4 Tb/s (128x 800G) |
| GPU BW | 800 Gb/s (2x 400G) | 1600 Gb/s (2x 800G) |
| SerDes | 100G PAM4 | 200G PAM4 |
| Protocol | Ethernet | Ethernet |
| Connectivity | OSFP | OSFP |

AI Traffic Optimization

Spectrum-X Ethernet handles variable all-to-all communication:

  • Coordinated congestion control across fabric
  • Adaptive routing for bursty MoE traffic
  • Significantly faster job completion times
  • Distance-aware congestion control for geo-distributed deployments

4. From Chips to Systems: Scaling to AI Factory

The progression from silicon to deployable AI factory follows a deliberate path: superchip, then rack, then SuperPOD.


4.1 Vera Rubin Superchip

The foundational compute building block combining:

  • Rubin GPU with HBM4
  • Vera CPU with LPDDR5X
  • NVLink-C2C coherent interconnect
  • Unified memory addressing

4.2 NVL72 Rack Architecture

Rack-scale integration features:

  • 72 Rubin GPUs in single coherent domain
  • NVLink 6 all-to-all fabric
  • Integrated power and cooling
  • BlueField-4 infrastructure control
  • Spectrum-X scale-out connectivity

4.3 DGX SuperPOD: Deployment-Scale Unit

The DGX SuperPOD represents the deployment-scale unit of an AI factory:

  • Multiple NVL72 racks
  • Unified management and orchestration
  • Predictable scaling characteristics
  • Production-ready operations
  • Enterprise support and services

5. Software and Developer Experience

The Rubin platform's software stack makes rack-scale systems programmable and accessible.

5.1 Software Foundation

  • NVIDIA CUDA: Core parallel computing platform
  • NVIDIA CUDA-X: Accelerated libraries for AI, HPC, data science
  • NVIDIA NCCL: Optimized collective communications
  • NVIDIA DOCA: Infrastructure services framework for BlueField DPUs

5.2 Framework Integration

Seamless integration with:

  • PyTorch, TensorFlow, JAX
  • NVIDIA NeMo for LLM training
  • NVIDIA TensorRT for inference optimization
  • NVIDIA Triton Inference Server

5.3 Programming Model

Key characteristics (a DDP sketch follows the list):

  • Transparent scaling: Code written for single GPU scales to 72
  • Coherent memory: Unified addressing across CPU/GPU
  • Automatic optimization: Frameworks leverage hardware features
  • Backward compatibility: Existing CUDA code runs unmodified
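
As an illustration of transparent scaling, the DDP sketch below is ordinary PyTorch: the same model code runs on one GPU, and launched under torchrun it spans every GPU in the rack with no Rubin-specific changes.

```python
# Ordinary PyTorch DDP: the model code is unchanged between one GPU and many;
# launching under torchrun supplies the ranks. Nothing here is Rubin-specific.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(4096, 4096).cuda()
model = DDP(model, device_ids=[local_rank])  # gradient sync over NCCL/NVLink

x = torch.randn(8, 4096, device="cuda")
model(x).sum().backward()  # gradient all-reduce happens here
```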

6. Operating at AI Factory Scale

Production foundations ensure reliability, security, and efficiency.

6.1 Reliability and Uptime

  • Hot-swappable components
  • Dynamic traffic rerouting
  • Continued operation during maintenance
  • Fine-grained telemetry and monitoring
  • Predictive failure detection

6.2 Security Architecture

Multi-layered security:

  • Confidential computing support (Vera CPU)
  • ASTRA trusted control plane (BlueField-4)
  • Data-in-transit encryption (ConnectX-9)
  • Secure boot and attestation
  • Strong multi-tenant isolation

6.3 Energy Efficiency

  • Co-packaged optics: ~5x network power efficiency
  • Optimized power delivery per rack
  • Efficient low-precision compute (NVFP4)
  • Workload-aware power management
  • Coherent memory reduces data movement

6.4 Ecosystem Readiness

  • Support from major cloud providers
  • OEM partnerships for deployment
  • ISV software certification
  • Open networking standards (Ethernet)
  • Arm software ecosystem compatibility

7. Performance and Efficiency at Scale

The Rubin platform delivers measurable gains in real-world AI factory deployments.

7.1 Training Performance

  • One-fourth as many GPUs required for equivalent training throughput
  • Higher sustained utilization across training phases
  • Faster convergence through improved communication
  • Efficient scaling to thousands of GPUs

7.2 Inference Throughput

  • 10x higher inference throughput for long-context workloads
  • 10x lower cost per token through efficiency gains
  • Up to 2x improvement in MoE all-to-all operations
  • 5x tokens-per-second boost with Inference Context Memory Storage (a toy cost calculation follows this list)
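
To see how throughput and efficiency gains compound into cost per token, here is a toy calculation; every input number below is a hypothetical placeholder, not a published Rubin figure.

```python
# Toy cost-per-token calculation; every input below is a hypothetical
# placeholder, not a published Rubin figure.
tokens_per_sec = 1_000_000  # assumed rack-level inference throughput
rack_power_kw = 120.0       # assumed rack power draw
usd_per_kwh = 0.08          # assumed energy price

energy_cost_per_hour = rack_power_kw * usd_per_kwh  # $/hour
tokens_per_hour = tokens_per_sec * 3600
cost_per_million = energy_cost_per_hour / tokens_per_hour * 1e6
print(f"${cost_per_million:.4f} per 1M tokens (energy only)")
# At fixed power, a 10x throughput gain cuts this energy cost per token 10x;
# capex amortization scales the same way with sustained utilization.
```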

7.3 Real-World Impact

Key metrics:

  • Sustained GPU utilization >90% in production
  • Predictable latency under variable load
  • Linear scaling efficiency to rack scale
  • Reduced time-to-deployment for new models

8. Why Rubin is the AI Factory Platform

8.1 Extreme Co-Design Delivers Results

The Rubin platform demonstrates that treating the data center as the unit of compute delivers:

  1. Predictable Performance: Behavior remains consistent under real workloads
  2. Economic Efficiency: Lower cost per token through sustained utilization
  3. Operational Scalability: Systems that can be deployed and maintained at scale
  4. Security by Design: Trust and isolation built into architecture
  5. Future-Proof Foundation: Coherent scaling path as AI demands grow

8.2 The Shift to Intelligence Production

"AI factories now function as always-on intelligence production systems, where efficiency in reasoning, context handling, and data movement determines performance."

The Rubin platform is purpose-built for this reality, where:

  • Reasoning models require sustained multi-step inference
  • Long contexts demand massive memory bandwidth
  • Agentic workflows interleave compute and communication
  • Multi-tenant operation requires strong isolation
  • Cost per token determines competitive advantage

8.3 From Components to Coherent Systems

The six-chip architecture works as one:

  • Vera CPU orchestrates data movement
  • Rubin GPU executes transformer workloads
  • NVLink 6 enables rack-scale coherence
  • ConnectX-9 controls scale-out endpoints
  • BlueField-4 operates the infrastructure
  • Spectrum-6 provides efficient scale-out fabric

Together, they transform the AI factory from a collection of servers into a unified intelligence production system.


Conclusion

The NVIDIA Rubin platform represents a fundamental architectural shift in how AI infrastructure is designed, deployed, and operated. By applying extreme co-design across compute, networking, memory, and infrastructure, Rubin establishes a new foundation for producing intelligence efficiently, securely, and predictably at scale.

As AI workloads continue to evolve toward longer contexts, deeper reasoning, and more complex agentic behaviors, the Rubin platform provides the architectural foundation to meet these demands while maintaining the economic efficiency required for widespread AI deployment.

The era of AI factories has arrived, and the Rubin platform is purpose-built to power them.