Expert Modules

Deep-dive technical modules covering system architecture, performance analysis, and AI infrastructure.

Total Modules: 21
Total Content: 31h
Categories: 6
Expert Level: 17

Advanced GPU Architecture for ML

expert

Deep dive into modern GPU architectures optimized for machine learning, from the latest datacenter GPUs to next-generation designs

AI Hardware
6 min read

AI Hardware Simulation & Modeling

expert

Develop high-fidelity simulators and performance models for evaluating next-generation AI accelerator architectures

Modeling
7 min read

AI Workload Analysis & Benchmarking

advanced

Master the techniques for profiling, characterizing, and optimizing deep learning workloads across different hardware platforms

Performance
5 min read

Benchmarks & Workloads — MLPerf Essentials

intermediate

What MLPerf Inference/Training measure, how to read QPS/latency/accuracy, and pragmatic usage for architecture evaluation

MLSystems
75m
4 exercises
3 tools
4 applications
#MLPerf #benchmarks #inference #training
2 min read

Cluster-Level Thinking — Scheduling, Placement, Isolation

expert

SRE and platform engineering for ML training/serving clusters: resource allocation, gang scheduling, and system-level optimization

DatacenterArch
110m
4 exercises
5 tools
4 applications
#scheduling #placement #isolation #cluster
13 min read

Deep Learning ASIC Architecture

expert

Master the design principles of custom AI accelerators, from tensor processing units to emerging neuromorphic architectures

AI Hardware
4 min read

Interconnect Fabrics for AI Systems

expert

Design and optimization of high-performance interconnects for distributed AI training and inference systems

AI Systems
7 min read

ML Systems in Datacenters — LLM Inference Realities

expert

TTFT vs tokens/s optimization, batching strategies, KV-cache memory management, PagedAttention/vLLM impact, and practical serving tactics

MLSystems
120m
4 exercises
4 tools
4 applications
#LLM #inference #KV-cache #batching
15 min read

Modeling & Simulation

expert

Strategic simulation methodology: choose the right simulation paradigm and fidelity level; ask targeted questions, validate against reality

Performance
220m
9 exercises
23 tools
7 applications
#simulation #modeling #DES #discrete-event
20 min read

Multi-Node AI Training Systems

expert

Master the design and optimization of distributed AI training systems across hundreds of nodes and GPUs

AI Systems
5 min read

Multimodal Foundation Models: Architecture & System Design

expert

Comprehensive analysis of multimodal foundation model architectures, training methodologies, and system engineering challenges for vision-language AI systems

MLSystems
180m
4 exercises
5 tools
4 applications
#multimodal #foundation-models #vision-language #cross-attention
22 min read

Power & Thermal Awareness — From Activity to perf/W

expert

Translate simulated activity into power/thermal behavior and communicate perf/W trade-offs credibly using McPAT and HotSpot

Performance
140m
4 exercises
5 tools
4 applications
#power #thermal #McPAT #HotSpot
2 min read

PPA Analysis Methodologies

expert

Master Performance, Power, and Area analysis techniques for evaluating hardware design trade-offs in AI accelerators

Performance
7 min read

Queueing Theory for Computer Architects — Basics & Advanced

advanced

Master queueing theory fundamentals and advanced techniques for analyzing performance, tail latency, and congestion control in modern computer systems, from CPU pipelines to datacenter-scale networks

Performance
240m
6 exercises
5 tools
6 applications
#queueing-theory #performance-modeling #tail-latency #congestion-control
34 min read

Sampling & Representativeness — SimPoint, SMARTS, ROI Discipline

advanced

Cut simulation time while bounding error and preserving phase behavior through systematic sampling methodologies

Performance
90m
4 exercises
5 tools
4 applications
#SimPoint #SMARTS #sampling #phases
2 min read

System & Microarchitecture Deep Dive

expert

End-to-end reasoning about compute + data pathologies with evidence-based fixes for CPU pipelines, GPU occupancy, and memory hierarchies

MLSystems
180m
4 exercises
5 tools
4 applications
#CPU #GPU #NUMA #occupancy
30 min read

Tail Latency & Scale-Out — p95/p99/p99.9 Engineering

expert

Design for tails, not means: queueing theory, amplification effects, and tail-tolerant distributed system patterns

DatacenterArch
100m
4 exercises
4 tools
4 applications
#tail-latency #p99 #queueing #scale-out
2 min read

Tools & Methods: Top-Down, CDRD, and Roofline

expert

Turn counters and simple models into clear diagnoses and action items using systematic performance analysis methodologies

Performance
150m
4 exercises
5 tools
4 applications
#Top-Down #roofline #performance-analysis #profiling
4 min read

Transformer Hardware Optimization

expert

Deep dive into optimizing hardware architectures for transformer-based models, from attention mechanisms to large language model inference

AI Hardware
7 min read

Validation & Measurement — Trust, But Verify

expert

Cross-validate models with real counters, quantify uncertainty, and communicate limits in performance analysis

Performance
130m
4 exercises
5 tools
4 applications
#validation #measurement #perf #eBPF
2 min read