
Validation & Measurement — Trust, But Verify

Cross-validate models with real counters, quantify uncertainty, and communicate limits in performance analysis

Expert · Performance · 130 min

Practical Exercises

  • Hardware counter correlation with simulator output
  • eBPF-based system bottleneck identification
  • Performance model validation methodology
  • Statistical confidence interval calculation

Tools Required

  • perf
  • eBPF/bpftrace
  • VTune
  • Hardware counters
  • Statistical analysis tools

Real-World Applications

  • Performance model validation for new processors
  • Production system bottleneck diagnosis
  • Benchmark result verification
  • Hardware counter-based optimization

Validation & Measurement — Trust, But Verify

Goal: Cross‑validate models with real counters, quantify uncertainty, and communicate limits.




1) The triangle check

Cross-check three views of the same run: simulator output ↔ hardware counters/profilers ↔ application‑level KPIs. If any two sides disagree, investigate before drawing conclusions.
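
A minimal sketch of one side of this check, assuming a hypothetical ./app binary and a simulator-reported IPC for the same region of interest and inputs: derive IPC from counters and compare it to the model's prediction.

# measured IPC from counters; -x, gives machine-readable CSV (./app and --args are placeholders)
perf stat -x, -e cycles,instructions -o perf.csv -- ./app --args
awk -F, '/,cycles/ {c=$1} /,instructions/ {i=$1} END {printf "measured IPC = %.3f\n", i/c}' perf.csv
# compare against the IPC the simulator reports for the same ROI; a gap of more than a few
# percent on any side of the triangle is worth investigating before drawing conclusions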


2) Linux perf / eBPF / friends

  • perf stat for macro counters; perf record/report for hotspots (annotate).
  • eBPF with bcc/bpftrace to trace syscalls, scheduler, TCP/IO; build USE (Utilization, Saturation, Errors) dashboards.
  • Pin ROIs with markers (USDT/user probes) to align with simulator ROIs; see the probe sketch below.
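
One way to pin an ROI without modifying the build is perf's user-space probes. A hedged sketch, assuming the binary is called app and exports hypothetical roi_begin/roi_end marker functions (the probe_app:* event names follow perf's default naming for that binary):

# attach user-space probes to the (hypothetical) ROI marker functions
perf probe -x ./app roi_begin
perf probe -x ./app roi_end
# record the markers alongside counters so hardware and simulator ROIs line up
perf record -e probe_app:roi_begin -e probe_app:roi_end -e cycles -- ./app --args
perf report --stdio
# remove the probes when done
perf probe -d 'probe_app:*'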

Counter set (starter):

  • cycles, instructions, branches, branch-misses, cache-references, cache-misses.
  • L1D/L2/LLC misses & refills; dTLB/iTLB loads & misses; page walks.
  • Offcore responses: local vs. remote DRAM, prefetch vs. demand.
  • Memory BW (uncore counters), throttling/thermal flags.

Examples

# macro view
perf stat -e cycles,instructions,branches,branch-misses,cache-references,cache-misses -- ./app --args
 
# profile branchy code and front-end resteers (precise branch-miss sampling where the PMU supports it)
perf record -e branch-misses:pp -g -- ./app --args
perf report --stdio
 
# eBPF TCP latency histogram (bpftrace)
bpftrace -e 'kprobe:tcp_recvmsg { @ns[tid] = nsecs; } kretprobe:tcp_recvmsg /@ns[tid]/ { @d = hist(nsecs - @ns[tid]); delete(@ns[tid]); }'

Correlate with MPKI, miss latency, measured bandwidth (e.g., pcm-memory or perf's uncore_imc/* events), and Top‑Down percentages.
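
For instance, a rough MPKI derivation from one counter run; a sketch only, since the generic LLC-load-misses/dTLB-load-misses event names and their availability vary by CPU and kernel, and ./app is a placeholder:

# misses per kilo-instruction from a single run (CSV output for easy post-processing)
perf stat -x, -e instructions,LLC-load-misses,dTLB-load-misses -o counters.csv -- ./app --args
awk -F, '/,instructions/ {i=$1} /,LLC-load-misses/ {llc=$1} /,dTLB-load-misses/ {tlb=$1}
  END {printf "LLC MPKI = %.2f  dTLB MPKI = %.2f\n", llc*1000/i, tlb*1000/i}' counters.csv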


3) Communicating limits

  • State what the model cannot see (firmware throttling, PCIe back‑pressure, OS jitter).
  • Separate calibrated from extrapolated claims.
  • Provide error bars and rerun counts; show before/after with the same inputs and placement (see the rerun example below).
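
A small sketch for the error-bar point: let perf repeat the run and report the spread itself. The per-run --table output needs a reasonably recent perf, and ./app is again a placeholder.

# repeat the measurement and report spread, not a single number
perf stat -r 10 --table -e cycles,instructions -- ./app --args
# quote the printed mean, the +- variation, and the run count alongside any simulator comparison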

4) Quick validations

  • Memory‑bound? MPKI↑ + miss latency↑ + BW near roofline ⇒ yes.
  • Compute‑bound? Port pressure/issue utilization high + close to compute roof.
  • Storage‑bound? Rerun with fio at the same queue depths & sizes; check p99, not just the mean (see the sketch below).
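
Two quick checks, sketched under assumptions: perf's Top-Down support depends on the CPU and may need system-wide mode, and the fio queue depth, block size, and test file below are placeholders to be matched to the real workload.

# compute vs. memory bound at a glance (Top-Down level 1)
perf stat --topdown -a -- ./app --args

# storage check: mirror the app's queue depth and block size, then read p99/p99.9 from the clat percentiles
fio --name=randread --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --direct=1 \
    --filename=fio-testfile --size=4g --runtime=30 --time_based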

References

  • Brendan Gregg's USE method; Linux perf manpages; eBPF tutorials.
#validation #measurement #perf #eBPF #hardware-counters #verification