Interactive Learning

Questions

Master computer architecture concepts with interactive flashcards. Click any card to reveal the detailed answer and learning materials.

Banked L1 Design

senior · CPU

What are the design tradeoffs for banked L1 caches in multi-core processors?

12-15 minutes

cache, banking, memory-hierarchy, +1

bfloat16 vs float16 Precision Formats

mid · AI Accelerators

What are the key differences between bfloat16 and float16, and when would you choose each format?

8-12 minutes

bfloat16, float16, FP16, +5

Coalescing vs Sparsity Interaction

senior · GPU · Asked

Coalescing vs sparsity: what's the interaction?

12-15 minutes

GPU, sparsity, coalescing, +3

Convolution Types: Regular vs Depthwise vs Pointwise

senior · AI Accelerators

Compare regular, depthwise, and pointwise convolutions. What are their computational complexities and use cases?

10-15 minutes

convolution, depthwise, pointwise, +2

RAII (Resource Acquisition Is Initialization)

mid · CPU

What is RAII and why is it valuable in simulators and systems programming?

8-12 minutes

cpp, raii, memory-management, +2

C++20 Coroutines: Promise Type and Awaitable Interface

senior · Parallel

Outline the key parts of a C++20 coroutine: promise_type, initial_suspend, final_suspend, and an awaitable's interface.

15-20 minutes

cpp20, coroutines, async, +2

Order-Preserving Distributed Rasterizer

mid · GPU · Asked

ELI5: 'order-preserving distributed rasterizer' and 'primitive distribution fabric'

8-10 minutes

GPU, rasterization, order-preserving, +2

GPU paging vs CPU paging

mid · GPU · Asked

Explain GPU paging vs CPU paging.

8-12 minutes

GPU, CPU, paging, +2

GPU vs CPU Architecture Differences

mid · GPU · Asked

Why are GPUs so much faster than CPUs for parallel workloads? What are the fundamental architectural differences?

10-15 minutes

GPU, CPU, parallel, +3

Why GPUs Use Warps Instead of Single-Thread Execution

mid · GPU · Asked

Why do GPUs use warps/wavefronts instead of single-thread execution like CPUs?

8-10 minutes

GPU, warps, SIMT, +3

Hardware-Managed Virtual Buffer Design

senior · GPU · Asked

Design a hardware-managed virtual buffer for per-primitive attributes. How do you allocate/free entries, and keep it correct when multiple raster units consume the same primitive?

15-20 minutes

GPU, graphics, hardware-design, +2

Independent Thread Scheduling and Divergence

senior · GPU · Asked

How does independent thread scheduling improve divergence handling and balance utilization?

12-15 minutes

GPU, thread-scheduling, divergence, +2

GPU Memory Coalescing Explanation

mid · GPU · Asked

Explain memory coalescing in GPUs and why it matters.

10-12 minutes

GPU, memory-coalescing, bandwidth, +2

Explain MESI Cache Coherence States

mid · CPU · Asked

Can you walk me through the four states in the MESI cache coherence protocol and when transitions occur between them?

10-12 minutes

MESI, cache, coherence, +1

Modeling Accuracy vs Speed Trade-off

senior · Performance · Asked

Modeling accuracy vs speed trade-off

10-15 minutes

simulation, modeling, performance, +2

MXFP Format Benefits

mid · AI Accelerators · Asked

What's the benefit of MXFP format?

8-10 minutes

MXFP, floating-point, ML, +3

NPU Architecture: Compute-Dense vs Memory-Bound Operations

senior · AI Accelerators

How do modern NPU architectures handle the different computational characteristics of neural network operations?

12-15 minutes

npu-architecture, compute-density, memory-bandwidth, +2

NPU Convolution Performance Analysis with Worked Example

senior · AI Accelerators

Walk through a detailed performance analysis comparing regular vs depthwise separable convolution on NPU hardware.

15-20 minutes

npu, performance-analysis, convolution, +2

Data Layout Conversion Algorithm (NCHW ↔ NHWC)

mid · AI Accelerators

How do you implement conversion between NCHW and NHWC data layouts? Show the algorithm and index mapping.

8-10 minutes

npu, data-layout, algorithm, +2
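
As a quick illustration of the index mapping this card asks about, here is a minimal sketch in plain Python. It assumes contiguous, row-major flat buffers; the function names (`nchw_offset`, `nhwc_offset`, `nchw_to_nhwc`, `nhwc_to_nchw`) are hypothetical and chosen for this example only. It is not the card's own answer, and a real implementation would normally use a tensor library's transpose.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in a contiguous NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, h, w, c, C, H, W):
    """Flat offset of the same logical element in a contiguous NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(src, N, C, H, W):
    """Copy a flat NCHW list into a new flat NHWC list."""
    dst = [None] * (N * C * H * W)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    dst[nhwc_offset(n, h, w, c, C, H, W)] = \
                        src[nchw_offset(n, c, h, w, C, H, W)]
    return dst


def nhwc_to_nchw(src, N, C, H, W):
    """Copy a flat NHWC list into a new flat NCHW list (inverse mapping)."""
    dst = [None] * (N * C * H * W)
    for n in range(N):
        for h in range(H):
            for w in range(W):
                for c in range(C):
                    dst[nchw_offset(n, c, h, w, C, H, W)] = \
                        src[nhwc_offset(n, h, w, c, C, H, W)]
    return dst


# Example: one image, two channels, one row, three columns.
# In NCHW, channel 0 holds [0, 1, 2] and channel 1 holds [3, 4, 5].
nchw = [0, 1, 2, 3, 4, 5]
nhwc = nchw_to_nhwc(nchw, N=1, C=2, H=1, W=3)
```

In NHWC the channel index varies fastest, so the two channel values of each pixel end up adjacent in memory; in NCHW each channel plane is stored contiguously.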

NCHW vs NHWC Data Layouts in Neural Processing

senior · AI Accelerators

What are NCHW and NHWC data layouts, and why do both exist in neural processing?

8-12 minutes

npu, data-layout, nchw, +2

NPU Performance Analysis Formulas and Quick Reference

mid · AI Accelerators

What are the essential formulas for NPU performance analysis covering TOPS/TMAC conversions, convolution complexities, and data layouts?

5-8 minutes

npu, formulas, reference, +2

TOPS to TMAC/s Conversion in NPUs

mid · AI Accelerators

How do you convert between TOPS and TMAC/s for NPU performance metrics?

5-8 minutes

npu, performance-metrics, tops, +2
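
A small sketch of the conversion this card refers to, assuming the common convention that one multiply-accumulate counts as two operations (one multiply plus one add). The constant `OPS_PER_MAC` and the helper names are hypothetical, introduced here for illustration; vendors sometimes count operations differently, so check the datasheet before relying on the factor of two.

```python
# Assumption: 1 MAC = 1 multiply + 1 add = 2 operations.
OPS_PER_MAC = 2


def tmacs_to_tops(tmacs):
    """Convert tera-MACs per second to tera-operations per second."""
    return tmacs * OPS_PER_MAC


def tops_to_tmacs(tops):
    """Convert tera-operations per second to tera-MACs per second."""
    return tops / OPS_PER_MAC


def peak_tops(num_mac_units, clock_hz):
    """Peak TOPS of an array that issues one MAC per unit per cycle."""
    return num_mac_units * OPS_PER_MAC * clock_hz / 1e12


# Example: a hypothetical 4096-MAC array clocked at 1 GHz.
example_peak = peak_tops(4096, 1e9)
```

Under this convention a part quoted at 8 TOPS sustains at most 4 TMAC/s.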

How does paging affect performance modeling?

mid · CPU · Asked

How does paging affect performance modeling?

10-12 minutes

paging, TLB, performance, +2

GPU Shared Memory Bank Conflicts

mid · GPU · Asked

How do you avoid bank conflicts in shared memory?

8-10 minutes

GPU, shared-memory, bank-conflicts, +2

How does the SystemC kernel work?

senior · CPU · Asked

How does the SystemC kernel work?

12-15 minutes

systemc, kernel, scheduling, +2

SystemC Scheduling: Why Stale Values?

mid · CPU · Asked

SystemC scheduling confusion: why did my reader see a stale value?

8-10 minutes

systemc, debugging, signals, +2

SystemC Process Types: SC_THREAD vs SC_METHOD

mid · CPU

What is the key difference between SC_THREAD and SC_METHOD regarding wait() and execution model?

8-10 minutes

systemc, processes, simulation, +1

Tensor Parallelism in Large Language Models

advanced · AI Systems · Asked

Explain tensor parallelism and how it enables scaling large language models across multiple devices. How does it work in attention and MLP layers?

15-20 minutes

tensor-parallelism, LLM, distributed, +3

TLB Role in High-Performance Systems

mid · CPU · Asked

What's the role of the TLB in high-performance systems?

8-12 minutes

TLB, address-translation, performance, +1

TLM Temporal Decoupling and Quantum Keeper

senior · CPU

Explain temporal decoupling in TLM and how tlm_quantumkeeper reduces scheduler overhead.

12-15 minutes

tlm, performance, temporal-decoupling, +2

Virtual Channels in Interconnect Fabric

senior · NoC · Asked

What are virtual channels in an interconnect fabric, and why do they help avoid deadlock?

12-15 minutes

virtual-channels, NoC, deadlock, +2

Round-Robin vs Greedy-Then-Oldest Warp Scheduling

senior · GPU · Asked

Compare Round-Robin vs Greedy-Then-Oldest (GTO) warp scheduling policies.

12-15 minutes

GPU, warp-scheduling, round-robin, +3

Work Distribution Crossbar: Fairness vs Throughput

mid · GPU · Asked

ELI5: work-distribution crossbar, fairness vs. throughput

10-12 minutes

GPU, crossbar, scheduling, +3

Study Tips

Click any card to open the full-screen learning mode. Use keyboard shortcuts: Space/Enter to reveal answers, arrow keys to navigate, and Escape to close. Focus on understanding the concepts rather than memorizing.