Benchmarks & Workloads — MLPerf Essentials

What MLPerf Inference/Training measure, how to read QPS/latency/accuracy, and pragmatic usage for architecture evaluation

Intermediate · MLSystems · 75 min

Practical Exercises

  • MLPerf result analysis and comparison
  • Custom workload integration with MLPerf
  • SLO tier mapping for production systems
  • Cost-per-token calculation methodology

Tools Required

  • MLPerf benchmarks
  • LoadGen
  • Performance analysis tools

Real-World Applications

  • Hardware procurement decision support
  • Competitive analysis of AI accelerators
  • Capacity planning for inference workloads
  • Performance regression testing

1) Taxonomy

  • Inference: Datacenter scenarios (Server, Offline) and Edge scenarios (SingleStream, MultiStream). Headline metrics: latency‑bounded QPS for Server, raw throughput for Offline, and per‑query/per‑stream latency for the Edge scenarios, all subject to staying at or above the accuracy target. The LoadGen sketch below shows how a scenario is selected.
  • Training: time‑to‑target‑quality, i.e., the minimum wall‑clock time to reach a specified validation metric.

Closed vs. Open: the Closed division fixes the model, accuracy target, and input pipeline, so results are directly comparable; the Open division permits alternative models and optimizations, which makes cross‑submission comparison much weaker.
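
To make the scenario split concrete, here is a minimal LoadGen sketch, assuming the mlperf_loadgen Python bindings; the SUT is a dummy that completes queries immediately, and exact callback signatures vary slightly across LoadGen versions.

```python
# Minimal sketch: configuring LoadGen for a Server vs. Offline run.
# Assumes the mlperf_loadgen Python bindings; the SUT below is a dummy
# that completes every query immediately instead of running a model.
import mlperf_loadgen as lg

def issue_queries(query_samples):
    # A real SUT would run inference here; we just echo empty responses.
    responses = [lg.QuerySampleResponse(qs.id, 0, 0) for qs in query_samples]
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Server      # latency-bounded QPS
# settings.scenario = lg.TestScenario.Offline   # pure throughput
settings.mode = lg.TestMode.PerformanceOnly
settings.server_target_qps = 100                # offered load for Server
settings.server_target_latency_ns = 130_000_000 # e.g. a 130 ms bound

# Query Sample Library: 1024 samples total, all resident for the run;
# the load/unload callbacks are no-ops in this sketch.
sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(1024, 1024, lambda idxs: None, lambda idxs: None)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```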


2) Reading a result like a pro

  • Confirm accuracy equivalence first (e.g., the 99% and 99.9%-of-reference targets are separate tracks; do not compare across them).
  • Compare Server (latency‑bounded QPS) against Offline (pure throughput) within the same submission, so the hardware and software configuration are held constant (see the sketch after this list).
  • Check power runs (energy per inference) and the configuration details: quantization recipe, graph capture, KV‑cache reuse, MIG layout.
  • Always look for the SLO tier that is analogous to your production tiers.
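
A small analysis pass like the following makes those checks mechanical. It assumes a hypothetical CSV export of results; the column names (system, scenario, accuracy_target, result_qps) are illustrative, not the official MLPerf schema.

```python
# Sketch of a result-comparison pass over a hypothetical results export.
import pandas as pd

df = pd.read_csv("mlperf_inference_results.csv")  # hypothetical export

# 1. Keep only results at the accuracy target you actually need.
df = df[df["accuracy_target"] == "99.9%"]

# 2. Compare Server (latency-bounded QPS) and Offline (throughput)
#    within the same submission so hardware/software configs match.
#    Assumes the scenario column uses the values "Server" and "Offline".
pivot = df.pivot_table(index="system",
                       columns="scenario",
                       values="result_qps",
                       aggfunc="max")

# 3. A large Offline/Server gap often signals batching or scheduling
#    headroom that the latency bound is eating.
pivot["offline_over_server"] = pivot["Offline"] / pivot["Server"]
print(pivot.sort_values("offline_over_server"))
```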

3) Internal usage

  • Use MLPerf as a sanity suite to guard against performance regressions.
  • Augment it with proprietary workloads (real tokenizer, real pre/post‑processing).
  • Track $ per 1k tokens for LLM serving and feed the numbers into capacity planning (see the cost sketch after this list).
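
A back‑of‑the‑envelope version of the cost calculation, with placeholder numbers rather than measured results:

```python
# $ per 1k tokens from measured throughput and instance cost.
# All numbers below are placeholders, not measured results.
def dollars_per_1k_tokens(tokens_per_second: float,
                          instance_cost_per_hour: float,
                          utilization: float = 0.7) -> float:
    """Cost of generating 1,000 tokens at sustained throughput.

    utilization discounts for idle time, failover headroom, etc.
    """
    effective_tps = tokens_per_second * utilization
    tokens_per_hour = effective_tps * 3600
    return instance_cost_per_hour / tokens_per_hour * 1000

# Example: 2,500 tok/s measured offline, $12/hr instance, 70% utilization.
print(f"${dollars_per_1k_tokens(2500, 12.0):.4f} per 1k tokens")
```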

4) Communication patterns

Provide a one‑pager per SKU:

  • QPS @ latency target, energy per operation, $ per 1k tokens.
  • Footnote the accuracy target, batching mode, precision (FP8/FP16/INT8), and KV‑cache policies, since these change comparability (see the sketch below).
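
One way to keep the footnotes attached to the headline numbers is to treat the one‑pager as structured data that can render to markdown, a slide, or a dashboard. Field names and values below are illustrative:

```python
# Sketch of a per-SKU one-pager as a structured record.
from dataclasses import dataclass, asdict

@dataclass
class SkuSummary:
    sku: str
    qps_at_slo: float            # QPS at the stated latency target
    latency_target_ms: float
    energy_per_inference_j: float
    dollars_per_1k_tokens: float
    # Footnote fields: these change comparability, so keep them attached.
    accuracy_target: str
    batching_mode: str
    precision: str               # e.g. "FP8", "FP16", "INT8"
    kv_cache_policy: str

summary = SkuSummary(
    sku="accelerator-x", qps_at_slo=1840.0, latency_target_ms=130.0,
    energy_per_inference_j=0.42, dollars_per_1k_tokens=0.0021,
    accuracy_target="99.9% of reference", batching_mode="continuous",
    precision="FP8", kv_cache_policy="paged, no cross-request reuse",
)
print(asdict(summary))
```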

References

  • MLPerf Inference and Training documentation portals, plus the latest public results pages from MLCommons.
#MLPerf #benchmarks #inference #training #QPS #latency #accuracy