Questions
Master computer architecture concepts with interactive flashcards. Click any card to reveal the detailed answer and learning materials.
Banked L1 Design
What are the design tradeoffs for banked L1 caches in multi-core processors?
bfloat16 vs float16 Precision Formats
What are the key differences between bfloat16 and float16, and when would you choose each format?
Coalescing vs Sparsity Interaction
How do memory coalescing and sparsity interact?
Convolution Types: Regular vs Depthwise vs Pointwise
Compare regular, depthwise, and pointwise convolutions. What are their computational complexities and use cases?
RAII (Resource Acquisition Is Initialization)
What is RAII and why is it valuable in simulators and systems programming?
C++20 Coroutines: Promise Type and Awaitable Interface
Outline the key parts of a C++20 coroutine: promise_type, initial_suspend, final_suspend, and an awaitable's interface.
Order-Preserving Distributed Rasterizer
ELI5: What are an 'order-preserving distributed rasterizer' and a 'primitive distribution fabric'?
GPU paging vs CPU paging
How does GPU paging differ from CPU paging?
GPU vs CPU Architecture Differences
Why are GPUs so much faster than CPUs for parallel workloads? What are the fundamental architectural differences?
Why GPUs Use Warps Instead of Single-Thread Execution
Why do GPUs use warps/wavefronts instead of single-thread execution like CPUs?
Hardware-Managed Virtual Buffer Design
Design a hardware‑managed virtual buffer for per‑primitive attributes. How do you allocate/free entries, and keep it correct when multiple raster units consume the same primitive?
Independent Thread Scheduling and Divergence
How does independent thread scheduling improve divergence handling and balance utilization?
GPU Memory Coalescing Explanation
Explain memory coalescing in GPUs and why it matters.
Explain MESI Cache Coherence States
Can you walk me through the four states in the MESI cache coherence protocol and when transitions occur between them?
Modeling Accuracy vs Speed Trade-off
What is the trade-off between modeling accuracy and speed?
MXFP Format Benefits
What's the benefit of MXFP format?
NPU Architecture: Compute-Dense vs Memory-Bound Operations
How do modern NPU architectures handle the different computational characteristics of neural network operations?
NPU Convolution Performance Analysis with Worked Example
Walk through a detailed performance analysis comparing regular vs depthwise separable convolution on NPU hardware.
Data Layout Conversion Algorithm (NCHW ↔ NHWC)
How do you implement conversion between NCHW and NHWC data layouts? Show the algorithm and index mapping.
NCHW vs NHWC Data Layouts in Neural Processing
What are NCHW and NHWC data layouts, and why do both exist in neural processing?
NPU Performance Analysis Formulas and Quick Reference
What are the essential formulas for NPU performance analysis covering TOPS/TMAC conversions, convolution complexities, and data layouts?
TOPS to TMAC/s Conversion in NPUs
How do you convert between TOPS and TMAC/s for NPU performance metrics?
How does paging affect performance modeling?
How does paging affect performance modeling?
GPU Shared Memory Bank Conflicts
How do you avoid bank conflicts in shared memory?
How does the SystemC kernel work?
How does the SystemC kernel work?
SystemC Scheduling: Why Stale Values?
SystemC scheduling confusion: why did my reader see a stale value?
SystemC Process Types: SC_THREAD vs SC_METHOD
What is the key difference between SC_THREAD and SC_METHOD regarding wait() and execution model?
Tensor Parallelism in Large Language Models
Explain tensor parallelism and how it enables scaling large language models across multiple devices. How does it work in attention and MLP layers?
TLB Role in High-Performance Systems
What's the role of TLB in high-performance systems?
TLM Temporal Decoupling and Quantum Keeper
Explain temporal decoupling in TLM and how tlm_quantumkeeper reduces scheduler overhead.
Virtual Channels in Interconnect Fabric
What are virtual channels in an interconnect fabric, and why do they help avoid deadlock?
Round-Robin vs Greedy-Then-Oldest Warp Scheduling
Compare Round-Robin vs Greedy-Then-Oldest (GTO) warp scheduling policies.
Work Distribution Crossbar: Fairness vs Throughput
ELI5: How does a work-distribution crossbar trade off fairness against throughput?
Study Tips
Click any card to open the full-screen learning mode. Use keyboard shortcuts: Space/Enter to reveal answers, arrow keys to navigate, and Escape to close. Focus on understanding the concepts rather than memorizing.