Computer Architecture Fundamentals: ISA to Pipeline
Essential foundations of computer architecture from instruction sets to pipelined execution, covering ISA design, the classic 5-stage pipeline, and hazard management.
Welcome to the foundation of modern computing! Understanding how processors execute instructions is crucial for anyone working in computer architecture, systems programming, or hardware design. Let's build your mental model from the ground up!
1. What is an Instruction Set Architecture (ISA)?
The ISA is the contract between software and hardware - it defines:
1.1 Core ISA Components
1. Instructions: Operations the processor can perform
ADD r1, r2, r3 # r1 = r2 + r3
LOAD r1, 0(r2) # r1 = Memory[r2]
BEQ r1, r2, label # if r1 == r2, goto label
2. Registers: Fast storage directly accessible by instructions
- General-purpose registers (x0-x31 in RISC-V)
- Special registers (PC, stack pointer, status flags)
3. Memory Model: How instructions access data (the two styles are contrasted in the sketch after this list)
- Load/Store architecture (RISC)
- Memory-to-memory operations (CISC)
4. Data Types: What kinds of data are supported
- Integers (8, 16, 32, 64-bit)
- Floating-point numbers
- SIMD vector types
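To make the load/store vs. memory-to-memory distinction concrete, here is a small illustrative sketch. The instruction sequences in the comments are representative RISC-style code, not the output of any particular compiler:

// How a load/store (RISC) ISA typically handles "*x = *x + *y":
//
//   LOAD  r1, 0(r10)    # r1 = *x      (memory read)
//   LOAD  r2, 0(r11)    # r2 = *y      (memory read)
//   ADD   r1, r1, r2    # r1 = r1 + r2 (ALU works only on registers)
//   STORE r1, 0(r10)    # *x = r1      (memory write)
//
// A memory-to-memory (CISC) ISA can fold the loads and the add into a
// single instruction that takes memory operands directly.
void add_in_place(int *x, const int *y) {
    *x = *x + *y;    // one C statement, several RISC instructions
}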
1.2 Why ISA Matters
Portability: Software compiled for an ISA runs on any compatible processor
Performance: ISA design affects:
- Code density (instructions per program)
- Decode complexity
- Pipeline efficiency
- Power consumption
Examples of Popular ISAs:
- x86-64: Desktop/server (Intel, AMD)
- ARM: Mobile, embedded, Apple Silicon
- RISC-V: Open-source, research, emerging products
- MIPS: Education, embedded systems
2. The Classic 5-Stage Pipeline
Pipelining is like an assembly line for instructions - while one instruction is being decoded, another is being fetched, and yet another is executing!
2.1 The Five Stages
┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐
│   FETCH   │ →  │  DECODE   │ →  │  EXECUTE  │ →  │  MEMORY   │ →  │ WRITEBACK │
│   (IF)    │    │   (ID)    │    │   (EX)    │    │   (MEM)   │    │   (WB)    │
└───────────┘    └───────────┘    └───────────┘    └───────────┘    └───────────┘
1. Instruction Fetch (IF)
- Read next instruction from memory using Program Counter (PC)
- Increment PC (PC = PC + 4 for 32-bit instructions)
- Place instruction in IF/ID pipeline register
PC → I-Cache → Instruction → IF/ID Register
2. Instruction Decode (ID)
- Decode the instruction opcode
- Read source registers from register file
- Sign-extend immediate values
- Pass control signals to next stages
// Example: ADD r1, r2, r3
Opcode: ADD
Source1: Read r2
Source2: Read r3
Dest: r1
3. Execute (EX)
- Perform ALU operation
- Calculate memory address (for loads/stores)
- Evaluate branch condition
- Forward results if needed
// For ADD: result = rs1_value + rs2_value
// For LOAD: address = rs1_value + immediate
// For BEQ: compare rs1_value with rs2_value
4. Memory Access (MEM)
- Load data from memory (if LOAD instruction)
- Store data to memory (if STORE instruction)
- Pass through ALU result for other instructions
// LOAD: data = Memory[address]
// STORE: Memory[address] = data
// Other: pass-through from EX stage
5. Write Back (WB)
- Write result to destination register
- Update register file
- Instruction completes!
// Write result to rd (destination register)
if (instruction_writes_register) {
RegisterFile[rd] = result;
}
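One way to picture how the stages hand work to each other is to model the pipeline registers between them as plain structs. This is a simplified sketch; field names such as pc_plus_4 or alu_result are illustrative, not taken from any specific textbook or hardware description:

#include <stdbool.h>
#include <stdint.h>

// Simplified pipeline registers between stages (illustrative fields only).
typedef struct { uint32_t instruction, pc_plus_4; } IF_ID;
typedef struct { uint32_t rs1_value, rs2_value, immediate;
                 uint8_t  rd;
                 bool     reg_write, mem_read, mem_write; } ID_EX;
typedef struct { uint32_t alu_result, store_value;
                 uint8_t  rd;
                 bool     reg_write, mem_read, mem_write; } EX_MEM;
typedef struct { uint32_t result;
                 uint8_t  rd;
                 bool     reg_write; } MEM_WB;

// On every clock edge, each stage reads the register to its left and writes
// the register to its right, so up to five instructions are in flight at once.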
2.2 Pipeline in Action: Example Execution
Let's trace 5 instructions through the pipeline:
I1: LOAD r1, 0(r2) # r1 = Memory[r2]
I2: ADD r3, r1, r4 # r3 = r1 + r4
I3: SUB r5, r3, r6 # r5 = r3 - r6
I4: AND r7, r5, r8 # r7 = r5 & r8
I5: STORE r7, 0(r9) # Memory[r9] = r7
Pipeline Diagram:
Cycle: 1 2 3 4 5 6 7 8 9
I1: IF ID EX MEM WB
I2: IF ID EX MEM WB
I3: IF ID EX MEM WB
I4: IF ID EX MEM WB
I5: IF ID EX MEM WB
Throughput: One instruction completes per cycle (after initial fill)!
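This chart can be generated mechanically: with S stages and no hazards, instruction i (counting from 1) occupies stage s in cycle i + s - 1, so N instructions finish in S + N - 1 cycles. A tiny illustrative program that prints the chart above:

#include <stdio.h>

int main(void) {
    const char *stage[] = {"IF", "ID", "EX", "MEM", "WB"};
    const int S = 5, N = 5;                    // 5 stages, 5 instructions
    for (int i = 0; i < N; i++) {              // one row per instruction
        printf("I%d:", i + 1);
        for (int c = 1; c <= S + N - 1; c++) { // cycles 1 .. S+N-1
            int s = c - 1 - i;                 // which stage in this cycle?
            printf("%5s", (s >= 0 && s < S) ? stage[s] : "");
        }
        printf("\n");
    }
    return 0;                                  // 5 + 5 - 1 = 9 cycles total
}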
3. Pipeline Hazards: When Things Go Wrong
Hazards are situations that prevent the next instruction from executing in its designated clock cycle.
3.1 Structural Hazards
Cause: Hardware resource conflict - two instructions need the same resource simultaneously.
Example: Single memory port
I1: LOAD r1, 0(r2) # Needs memory in MEM stage
I4: LOAD r5, 0(r6) # Needs memory in IF stage ← CONFLICT!
Solution:
- Separate instruction and data caches (Harvard architecture)
- Duplicate resources (multiple ALUs)
- Stall one instruction until resource is free
3.2 Data Hazards
Cause: Instruction depends on result of previous instruction still in pipeline.
Read-After-Write (RAW) - True Dependency
The most common hazard!
I1: ADD r1, r2, r3 # r1 = r2 + r3 (writes r1)
I2: SUB r4, r1, r5 # r4 = r1 - r5 (reads r1) ← NEEDS r1!
Problem: I2 reads r1 in its ID stage (cycle 3), but I1 doesn't write r1 until its WB stage (cycle 5)!
Cycle:   1    2    3    4    5
I1:      IF   ID   EX   MEM  WB    ← r1 written here (cycle 5)
I2:           IF   ID   ...        ← r1 needed here (cycle 3): TOO EARLY!
                   ↑
                Hazard!
Solutions for RAW Hazards
A. Forwarding (Bypassing)
Send result directly from EX or MEM stage to the next instruction's EX stage.
Cycle:   1    2    3    4    5    6
I1:      IF   ID   EX   MEM  WB
                    ↓  Forward!
I2:           IF   ID   EX   MEM  WB
                        ↑  Use forwarded value
Forwarding Paths:
// EX/MEM → EX forwarding (the most recent producer wins)
if (EX_MEM.RegWrite && EX_MEM.rd != 0 && EX_MEM.rd == ID_EX.rs1) {
    ALU_input1 = EX_MEM.ALUResult;
}
// MEM/WB → EX forwarding (only if EX/MEM didn't already match)
else if (MEM_WB.RegWrite && MEM_WB.rd != 0 && MEM_WB.rd == ID_EX.rs1) {
    ALU_input1 = MEM_WB.Result;
}
// The same comparisons are repeated for rs2 / ALU_input2.
// (The rd != 0 check matters on ISAs like RISC-V, where x0 is hardwired to zero.)
B. Stalling (Pipeline Bubble)
When forwarding isn't enough (e.g., load-use hazard):
I1: LOAD r1, 0(r2) # r1 available after MEM stage
I2: ADD r3, r1, r4 # Needs r1 immediately ← STALL!
Cycle:   1    2    3    4    5    6    7
I1:      IF   ID   EX   MEM  WB
I2:           IF   ID   stall EX  MEM  WB
                          ↑
                 Insert bubble (NOP)
Cost: One lost cycle per stall
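The stall itself is triggered by a hazard detection unit in the ID stage. Here is a minimal sketch of the check, with illustrative field names (mem_read marks a load in EX; rs1/rs2 are the source registers of the instruction being decoded):

#include <stdbool.h>
#include <stdint.h>

// Just the fields needed for load-use detection (illustrative).
typedef struct { uint8_t rs1, rs2; } DecodedInstr;     // instruction in ID
typedef struct { uint8_t rd; bool mem_read; } ExInstr; // instruction in EX

// Stall one cycle when the instruction in EX is a load whose destination
// register is a source of the instruction currently being decoded.
// (Register 0 is hardwired to zero on RISC-V/MIPS, so it is never a dependency.)
bool must_stall(DecodedInstr id, ExInstr ex) {
    return ex.mem_read && ex.rd != 0 &&
           (ex.rd == id.rs1 || ex.rd == id.rs2);
}

When the check fires, the control logic typically holds the PC and IF/ID register steady for one cycle and injects a NOP into ID/EX.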
3.3 Control Hazards
Cause: Branch instruction changes program flow - we don't know which instruction to fetch next!
BEQ r1, r2, target # if r1 == r2, goto target
ADD r3, r4, r5 # Next sequential instruction
SUB r6, r7, r8
target: MUL r9, r10, r11 # Branch target
Problem: The branch outcome isn't known until the EX stage (cycle 3), but the next instruction must be fetched starting in cycle 2!
Solutions for Control Hazards
A. Stall Until Branch Resolves
Cycle:   1    2    3    4    5    6
BEQ:     IF   ID   EX   MEM  WB
Next:         stall stall IF   ID   EX   ← fetch waits for the branch decision
Cost: 2 stall cycles per branch
B. Branch Prediction
- Predict branch will be taken/not taken
- Speculatively fetch from predicted path
- If wrong, flush pipeline and restart
Simple Static Prediction:
- Always predict "not taken" (continue sequential)
- Always predict "taken" for backwards branches (loops)
Dynamic Prediction (more advanced):
- Branch history table (BHT)
- Two-bit saturating counters (sketched after this list)
- 90%+ accuracy for many workloads!
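As a concrete example, here is a minimal sketch of one two-bit saturating counter, the building block behind many branch history tables; the state encoding below is one common convention rather than a fixed standard:

#include <stdbool.h>

// States: 0 = strongly not-taken, 1 = weakly not-taken,
//         2 = weakly taken,       3 = strongly taken
static int counter = 1;                 // arbitrary starting state

bool predict_taken(void) {
    return counter >= 2;                // predict taken in states 2 and 3
}

void train(bool actually_taken) {
    if (actually_taken  && counter < 3) counter++;   // saturate at 3
    if (!actually_taken && counter > 0) counter--;   // saturate at 0
}

Because two wrong outcomes are needed to flip the prediction, a single anomalous branch (such as a loop exit) does not destroy a well-established prediction.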
C. Delayed Branch Slots
- MIPS/SPARC approach: instruction after branch always executes
- Compiler fills slot with useful work or NOP
BEQ r1, r2, target
ADD r3, r4, r5 # Delay slot - always executes!
4. Why Pipelining Improves Throughput
4.1 Without Pipelining (Sequential Execution)
Time: 0 5 10 15 20 25 30 35 40
I1: [IF|ID|EX|MEM|WB]
I2: [IF|ID|EX|MEM|WB]
I3: [IF|ID|EX|MEM|WB]
Throughput: 1 instruction per 5 cycles = 0.2 IPC
4.2 With Pipelining (Overlapped Execution)
Time: 0 1 2 3 4 5 6 7 8
I1: IF ID EX MEM WB
I2: IF ID EX MEM WB
I3: IF ID EX MEM WB
I4: IF ID EX MEM WB
I5: IF ID EX MEM WB
Throughput: 1 instruction per cycle = 1.0 IPC (5x speedup!)
4.3 Speedup Formula
Ideal case (no hazards):
Speedup = Number of pipeline stages
= 5 (for 5-stage pipeline)
Real-world (with hazards):
CPI = 1 + (stall cycles per instruction)
If 20% instructions are loads with 1 stall each:
CPI = 1 + 0.2 × 1 = 1.2
Speedup = 5 / 1.2 = 4.17x
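The same bookkeeping is easy to script. A small sketch using the example numbers above (the 20% load mix and 1-cycle penalty are illustrative, not measurements):

#include <stdio.h>

int main(void) {
    const double stages        = 5.0;   // pipeline depth
    const double load_fraction = 0.20;  // 20% of instructions are loads
    const double load_stalls   = 1.0;   // 1 stall cycle per load-use hazard

    double cpi     = 1.0 + load_fraction * load_stalls;  // effective CPI
    double speedup = stages / cpi;                        // vs. unpipelined

    printf("CPI = %.2f, speedup = %.2fx\n", cpi, speedup); // CPI = 1.20, speedup = 4.17x
    return 0;
}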
5. Key Takeaways
✅ ISA defines the hardware-software contract - instructions, registers, memory model
✅ 5-stage pipeline overlaps instruction execution for higher throughput
- IF → ID → EX → MEM → WB
✅ Three types of hazards:
- Structural: Resource conflicts
- Data: Dependencies between instructions
- Control: Branches change flow
✅ Hazard solutions:
- Forwarding: Send results early
- Stalling: Insert pipeline bubbles
- Prediction: Guess branch outcomes
✅ Pipelining improves throughput (instructions completed per cycle) but does not reduce the latency of an individual instruction, which still passes through all five stages
6. Practice Problems
- Calculate CPI: A 5-stage pipeline runs a workload in which 30% of instructions are branches (2 stall cycles each) and 25% are loads (1 stall cycle each). What is the effective CPI?
- Identify Hazards:
ADD r1, r2, r3
SUB r4, r1, r5
AND r6, r4, r7
OR r8, r6, r9
Where do you need forwarding?
- Forwarding Logic: When can you forward from EX/MEM vs MEM/WB stage? When must you stall?
7. Next Steps
Now that you understand pipeline fundamentals:
→ Cache Coherence: How do multiple cores keep their caches consistent?
→ Advanced Pipelines: Superscalar, out-of-order execution, branch prediction
→ GPU Architecture: Massive parallelism through thousands of simple pipelines
→ Performance Analysis: Measure and optimize IPC in real systems
Remember: Every modern processor - from your smartphone to datacenter servers - builds on these fundamental concepts!