Benchmarks & Workloads — MLPerf Essentials
What MLPerf Inference/Training measure, how to read QPS/latency/accuracy, and pragmatic usage for architecture evaluation
Level: intermediate · Track: MLSystems · ~75 min
Practical Exercises
- MLPerf result analysis and comparison
- Custom workload integration with MLPerf
- SLO tier mapping for production systems
- Cost-per-token calculation methodology
Tools Required
MLPerf benchmarks, LoadGen, performance analysis tools
Real-World Applications
- Hardware procurement decision support
- Competitive analysis of AI accelerators
- Capacity planning for inference workloads
- Performance regression testing
1) Taxonomy
- Inference: Datacenter (Server, Offline) and Edge (SingleStream, MultiStream). The metric depends on the scenario: Server reports QPS subject to a per-query latency bound, Offline reports raw throughput, and the Edge scenarios report latency; in every case accuracy must stay at or above the target.
- Training: time to target quality, i.e. the minimum wall-clock time to reach a specified validation metric.
Closed vs. Open divisions: Closed fixes the model, accuracy target, and inputs for comparability; Open permits alternative models and optimizations, so results are less directly comparable.
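The scenario-to-metric mapping above can be sketched as a small lookup table. This is an illustrative summary, not an official MLPerf schema; the names and descriptions follow the taxonomy in this section.

```python
# Illustrative sketch of the MLPerf Inference scenario taxonomy:
# each scenario pairs a load pattern with the metric it reports.
SCENARIOS = {
    "Server":       ("Datacenter", "Poisson query arrivals",  "QPS under a per-query latency bound"),
    "Offline":      ("Datacenter", "all queries issued at once", "throughput (samples/s)"),
    "SingleStream": ("Edge",       "one query at a time",     "90th-percentile latency"),
    "MultiStream":  ("Edge",       "fixed-size query bursts", "latency at a fixed number of streams"),
}

def describe(scenario: str) -> str:
    division, load, metric = SCENARIOS[scenario]
    return f"{scenario} ({division}): {load} -> {metric}"

print(describe("Server"))
print(describe("SingleStream"))
```

A table like this is handy when parsing scraped result pages: the scenario name alone tells you which column is the headline metric.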
2) Reading a result like a pro
- Confirm accuracy equivalence (e.g., 99% vs. 99.9% of reference).
- Compare Server (latency‑bounded QPS) vs. Offline (throughput) within the same submission.
- Check power runs (energy per inference) and configuration details: quantization recipe, graph capture, KV-cache reuse, MIG layout.
- Always look for SLO tiers analogous to your production tiers.
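The checks above can be automated against scraped result rows. A hedged sketch, assuming a hand-made `dict` per submission (the field names here are illustrative, not the official MLPerf results schema):

```python
# Sanity checks one might run over a scraped results row.
# Field names are hypothetical; adapt to your scraper's output.
def check_submission(row: dict, ref_accuracy: float) -> list[str]:
    notes = []
    # Accuracy equivalence: was this the 99% or 99.9% target?
    tier = row["accuracy"] / ref_accuracy
    notes.append(f"accuracy tier ~{tier:.1%} of reference")
    # Server QPS is bounded above by Offline throughput; the gap
    # shows the cost of meeting the latency constraint.
    gap = row["server_qps"] / row["offline_qps"]
    notes.append(f"Server achieves {gap:.0%} of Offline throughput")
    if gap < 0.5:
        notes.append("large latency-bound penalty; inspect batching config")
    return notes

row = {"accuracy": 0.792, "server_qps": 9_000, "offline_qps": 12_000}
for note in check_submission(row, ref_accuracy=0.80):
    print(note)
```

Comparing Server to Offline within the same submission, as the list recommends, keeps hardware and software constant so the ratio isolates the latency bound.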
3) Internal usage
- Use as a sanity suite to guard regressions.
- Augment with proprietary workloads (real tokenizer, real pre-/post-processing).
- Track $ per 1k tokens for LLMs; feed into capacity planning.
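The $-per-1k-tokens figure follows directly from instance cost and sustained token throughput. A minimal sketch; the hourly rate and throughput below are made-up numbers for illustration:

```python
# Cost per 1k generated tokens from hourly instance cost and
# sustained decode throughput. All inputs are illustrative.
def cost_per_1k_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1000

# e.g. a $4/hr accelerator sustaining 2,500 tokens/s:
print(f"${cost_per_1k_tokens(4.0, 2500):.5f} per 1k tokens")
```

Feeding this number into capacity planning lets you translate an SLO tier ("QPS at latency X") into a dollar budget per workload.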
4) Communication patterns
Provide a one‑pager per SKU:
- QPS @ latency target, Energy/op, $ per 1k tokens.
- Footnote accuracy, batching mode, precision (FP8/FP16/INT8), and KV-cache policies.
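The one-pager above can be generated mechanically once the headline numbers are in hand. A sketch with a hypothetical SKU name and made-up values:

```python
# Plain-text per-SKU one-pager: headline numbers plus the footnoted
# configuration. SKU name and all values are illustrative.
def one_pager(sku: str, qps_at_slo: float, energy_j_per_inf: float,
              usd_per_1k_tok: float, footnote: str) -> str:
    lines = [
        f"SKU: {sku}",
        f"  QPS @ latency target : {qps_at_slo:,.0f}",
        f"  Energy / inference   : {energy_j_per_inf:.2f} J",
        f"  $ / 1k tokens        : ${usd_per_1k_tok:.4f}",
        f"  Footnote: {footnote}",
    ]
    return "\n".join(lines)

print(one_pager("ACME-X1", 9000, 0.35, 0.0004,
                "99.9% accuracy, FP8, continuous batching, paged KV cache"))
```

Keeping the footnote inseparable from the headline numbers prevents the most common misreading: quoting QPS without its accuracy tier or precision.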
References
- MLPerf Inference and Training documentation portals; latest public result pages.