Validation & Measurement — Trust, But Verify
Cross-validate models with real counters, quantify uncertainty, and communicate limits in performance analysis
Practical Exercises
- Hardware counter correlation with simulator output
- eBPF-based system bottleneck identification
- Performance model validation methodology
- Statistical confidence interval calculation
Tools Required
- perf
- eBPF/bpftrace
- VTune
- Hardware counters
- Statistical analysis tools
Real-World Applications
- Performance model validation for new processors
- Production system bottleneck diagnosis
- Benchmark result verification
- Hardware counter-based optimization
Goal: Cross‑validate models with real counters, quantify uncertainty, and communicate limits.
📋 Table of Contents
1) The triangle check
2) Linux perf / eBPF / friends
3) Communicating limits
4) Quick validations
References
1) The triangle check
Simulator ↔ hardware counters/profilers ↔ application‑level KPIs. If two sides disagree, investigate before drawing conclusions.
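As a minimal sketch of one side of that triangle, the commands below pull measured IPC from hardware counters and compare it with a simulator prediction. SIM_IPC and the comparison itself are placeholders for your own model output; event-name spellings and the CSV column layout can shift a little across perf versions.
# measured IPC from counters vs. a (placeholder) simulator prediction
SIM_IPC=1.42    # replace with the IPC your model reports for the same ROI and inputs
perf stat -x, -e cycles,instructions -o ipc.csv -- ./app --args
awk -F, -v sim="$SIM_IPC" '
  $3 == "cycles"       { c = $1 }
  $3 == "instructions" { i = $1 }
  END { ipc = i / c;
        printf "measured IPC %.3f vs simulated %.3f (%+.1f%%)\n",
               ipc, sim, 100 * (ipc - sim) / sim }' ipc.csv
If the gap exceeds your calibration tolerance, or the application-level KPI moves in the opposite direction, investigate before trusting either number.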
2) Linux perf / eBPF / friends
- perf stat for macro counters; perf record/report for hotspots (annotate).
- eBPF with bcc/bpftrace to trace syscalls, the scheduler, and TCP/IO; build USE (Utilization, Saturation, Errors) dashboards.
- Pin ROIs with markers (usdt/user probes) to align with simulator ROIs.
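A minimal sketch of ROI pinning with user-space probes, assuming the binary is built with symbols and exposes (hypothetical) marker functions roi_begin/roi_end:
# add uprobes on the (hypothetical) ROI marker functions
perf probe -x ./app roi_begin
perf probe -x ./app roi_end
# record counters together with the markers; slice by marker timestamps afterwards
perf record -e cycles,probe_app:roi_begin,probe_app:roi_end -- ./app --args
perf script | head
# clean up the probes when done
perf probe -d 'probe_app:*'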
Counter set (starter):
- cycles, instructions, branches, branch-misses, cache-references, cache-misses.
- L1D/L2/LLC misses & refills; dTLB/iTLB loads & misses; page walks.
- Offcore responses: local vs. remote DRAM, prefetch vs. demand.
- Memory BW (uncore counters), throttling/thermal flags.
Examples
# macro view
perf stat -e cycles,instructions,branches,branch-misses,cache-misses ./app --args
# profile FE stalls or branchy code
perf record -e branch-misses -g -- ./app --args
perf report --stdio
# eBPF TCP latency histogram (bpftrace)
bpftrace -e 'kprobe:tcp_recvmsg { @ns[tid] = nsecs; } kretprobe:tcp_recvmsg /@ns[tid]/ { @d = hist(nsecs - @ns[tid]); delete(@ns[tid]); }'
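Two further examples in the same vein; exact TLB/LLC and uncore event names vary by CPU and kernel, so check perf list on the target first (the uncore IMC unit below is an Intel-server example).
# TLB and last-level cache detail (generic event names; availability varies)
perf stat -e dTLB-load-misses,iTLB-load-misses,LLC-loads,LLC-load-misses ./app --args
# memory bandwidth from uncore IMC counters (system-wide; unit names differ per platform)
perf stat -a -e 'uncore_imc_0/cas_count_read/,uncore_imc_0/cas_count_write/' -- sleep 10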
Correlate with MPKI, miss latency, measured BW (e.g., pcm-memory, perf uncore_imc/*/), and Top‑Down percentages.
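MPKI itself is quick to derive from raw counts; a small sketch (CSV columns as emitted by perf stat -x,, which may drift across perf versions):
# MPKI = misses per thousand retired instructions
perf stat -x, -e instructions,cache-misses -o mpki.csv -- ./app --args
awk -F, '$3=="instructions"{i=$1} $3=="cache-misses"{m=$1} END{printf "cache-miss MPKI: %.2f\n", 1000*m/i}' mpki.csv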
3) Communicating limits
- State what the model cannot see (firmware throttling, PCIe back‑pressure, OS jitter).
- Separate calibrated from extrapolated claims.
- Provide error bars and rerun counts; show before/after with the same inputs and placement.
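One low-effort way to get both reruns and spread, since perf stat prints the mean and relative stddev across repeats:
# 10 repeats of the same binary and inputs; perf reports "( +- x% )" per counter
perf stat -r 10 -e cycles,instructions,cache-misses -- ./app --args
Report the repeat count and the spread alongside the headline number, not just the mean.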
4) Quick validations
- Memory‑bound? MPKI↑ + miss latency↑ + BW near roofline ⇒ yes.
- Compute‑bound? Port pressure/issue utilization high + close to compute roof.
- Storage‑bound? fio with same queue depths & sizes; check p99, not just the mean.
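A sketch of a matched fio run; the device path, block size, and queue depth here are placeholders to mirror from the modeled configuration, and fio's default output already includes clat percentiles (p99 among them).
# 4 KiB random reads at a fixed queue depth against a placeholder device
fio --name=randread --filename=/dev/nvme0n1 --direct=1 --rw=randread \
    --bs=4k --iodepth=32 --numjobs=1 --runtime=60 --time_based --group_reporting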
References
- Brendan Gregg's USE method; Linux perf manpages; eBPF tutorials.
#validation #measurement #perf #eBPF #hardware-counters #verification