Tags: CPU, beginner, ISA, pipeline, hazards, forwarding, CPU-fundamentals

Computer Architecture Fundamentals: ISA to Pipeline

Essential foundations of computer architecture from instruction sets to pipelined execution, covering ISA design, the classic 5-stage pipeline, and hazard management.

30 min read
Updated 10/1/2025
3 prerequisites

Prerequisites

Make sure you're familiar with these concepts before diving in:

Basic programming knowledge
Understanding of binary numbers
Familiarity with computer organization concepts

Learning Objectives

By the end of this topic, you will be able to:

Understand what an Instruction Set Architecture (ISA) is and why it matters
Master the classic 5-stage pipeline and its operation
Identify and resolve structural, data, and control hazards
Apply forwarding and stall mechanisms to handle dependencies
Explain why pipelining improves processor throughput


Computer Architecture Fundamentals: ISA to Pipeline

Welcome to the foundation of modern computing! Understanding how processors execute instructions is crucial for anyone working in computer architecture, systems programming, or hardware design. Let's build your mental model from the ground up! 🏗️

1. What is an Instruction Set Architecture (ISA)?

The ISA is the contract between software and hardware: it defines everything a compiler (or assembly programmer) needs to know to produce correct machine code.

1.1 Core ISA Components

1. Instructions: Operations the processor can perform

ADD  r1, r2, r3    # r1 = r2 + r3
LOAD r1, 0(r2)     # r1 = Memory[r2]
BEQ  r1, r2, label # if r1 == r2, goto label

2. Registers: Fast storage directly accessible by instructions

  • General-purpose registers (x0-x31 in RISC-V)
  • Special registers (PC, stack pointer, status flags)

3. Memory Model: How instructions access data

  • Load/Store architecture (RISC)
  • Memory-to-memory operations (CISC)

4. Data Types: What kinds of data are supported

  • Integers (8, 16, 32, 64-bit)
  • Floating-point numbers
  • SIMD vector types
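
To make the instruction and register fields above concrete, here is a minimal C sketch that pulls the fields out of one 32-bit RISC-V R-type instruction word. The encoding 0x003100B3 corresponds to ADD x1, x2, x3; the bit layout is the standard R-type format, and this is only an illustration, not a full decoder:

#include <stdint.h>
#include <stdio.h>

/* Decode the fields of a RISC-V R-type instruction.
   Layout: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
int main(void) {
    uint32_t inst   = 0x003100B3;          /* ADD x1, x2, x3 */
    unsigned opcode =  inst        & 0x7F;
    unsigned rd     = (inst >> 7)  & 0x1F;
    unsigned funct3 = (inst >> 12) & 0x07;
    unsigned rs1    = (inst >> 15) & 0x1F;
    unsigned rs2    = (inst >> 20) & 0x1F;
    unsigned funct7 = (inst >> 25) & 0x7F;

    printf("opcode=0x%02X rd=x%u rs1=x%u rs2=x%u funct3=%u funct7=%u\n",
           opcode, rd, rs1, rs2, funct3, funct7);
    return 0;
}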

1.2 Why ISA Matters

Portability: Software compiled for an ISA runs on any compatible processor

Performance: ISA design affects:

  • Code density (instructions per program)
  • Decode complexity
  • Pipeline efficiency
  • Power consumption

Examples of Popular ISAs:

  • x86-64: Desktop/server (Intel, AMD)
  • ARM: Mobile, embedded, Apple Silicon
  • RISC-V: Open-source, research, emerging products
  • MIPS: Education, embedded systems

2. The Classic 5-Stage Pipeline

Pipelining is like an assembly line for instructions - while one instruction is being decoded, another is being fetched, and yet another is executing!

2.1 The Five Stages

┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌──────────┐
│  FETCH  │ → │ DECODE  │ → │ EXECUTE │ → │ MEMORY  │ → │ WRITEBACK│
│   (IF)  │   │  (ID)   │   │  (EX)   │   │  (MEM)  │   │   (WB)   │
└─────────┘   └─────────┘   └─────────┘   └─────────┘   └──────────┘

1. Instruction Fetch (IF)

  • Read next instruction from memory using Program Counter (PC)
  • Increment PC (PC = PC + 4 for 32-bit instructions)
  • Place instruction in IF/ID pipeline register
PC → I-Cache → Instruction → IF/ID Register

2. Instruction Decode (ID)

  • Decode the instruction opcode
  • Read source registers from register file
  • Sign-extend immediate values
  • Pass control signals to next stages
// Example: ADD r1, r2, r3
Opcode:    ADD
Source1:   Read r2
Source2:   Read r3  
Dest:      r1

3. Execute (EX)

  • Perform ALU operation
  • Calculate memory address (for loads/stores)
  • Evaluate branch condition
  • Forward results if needed
// For ADD: result = rs1_value + rs2_value
// For LOAD: address = rs1_value + immediate
// For BEQ: compare rs1_value with rs2_value

4. Memory Access (MEM)

  • Load data from memory (if LOAD instruction)
  • Store data to memory (if STORE instruction)
  • Pass through ALU result for other instructions
// LOAD:  data = Memory[address]
// STORE: Memory[address] = data
// Other: pass-through from EX stage

5. Write Back (WB)

  • Write result to destination register
  • Update register file
  • Instruction completes!
// Write result to rd (destination register)
if (instruction_writes_register) {
    RegisterFile[rd] = result;
}
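
Everything a later stage needs is latched between stages in a pipeline register (IF/ID, ID/EX, EX/MEM, MEM/WB). Here is a minimal C sketch of what those latches might hold; the struct and field names are illustrative, not from any particular textbook design:

#include <stdint.h>
#include <stdio.h>

/* Illustrative pipeline registers between the five stages.
   Real designs carry more control signals than shown here. */
typedef struct { uint32_t pc, instruction; }                            IF_ID_t;
typedef struct { uint32_t rs1_value, rs2_value, immediate;
                 uint8_t  rd, alu_op, mem_read, mem_write, reg_write; } ID_EX_t;
typedef struct { uint32_t alu_result, store_value;
                 uint8_t  rd, mem_read, mem_write, reg_write; }         EX_MEM_t;
typedef struct { uint32_t result;   /* loaded data or ALU result */
                 uint8_t  rd, reg_write; }                              MEM_WB_t;

int main(void) {
    EX_MEM_t ex_mem = { .alu_result = 42, .rd = 1, .reg_write = 1 };
    /* MEM stage for a non-load: pass the ALU result straight through. */
    MEM_WB_t mem_wb = { .result    = ex_mem.alu_result,
                        .rd        = ex_mem.rd,
                        .reg_write = ex_mem.reg_write };
    printf("WB will write %u into r%u\n",
           (unsigned)mem_wb.result, (unsigned)mem_wb.rd);
    return 0;
}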

2.2 Pipeline in Action: Example Execution

Let's trace 5 instructions through the pipeline:

I1: LOAD r1, 0(r2)     # r1 = Memory[r2]
I2: ADD  r3, r1, r4    # r3 = r1 + r4
I3: SUB  r5, r3, r6    # r5 = r3 - r6  
I4: AND  r7, r5, r8    # r7 = r5 & r8
I5: STORE r7, 0(r9)    # Memory[r9] = r7

Pipeline Diagram:

Cycle: 1    2    3    4    5    6    7    8    9
I1:    IF   ID   EX   MEM  WB
I2:         IF   ID   EX   MEM  WB
I3:              IF   ID   EX   MEM  WB
I4:                   IF   ID   EX   MEM  WB
I5:                        IF   ID   EX   MEM  WB

Throughput: One instruction completes per cycle (after initial fill)!
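
The overlap shown in the diagram is easy to reproduce programmatically. Here is a small C sketch (assuming an ideal pipeline with no hazards) that prints which stage each instruction occupies in every cycle:

#include <stdio.h>

/* Print an ideal 5-stage pipeline diagram for n_inst independent
   instructions: instruction i enters IF in cycle i+1, so in cycle c
   it occupies stage (c - i - 1), if that index is in range. */
int main(void) {
    const char *stage[] = { "IF ", "ID ", "EX ", "MEM", "WB " };
    const int n_inst = 5, n_stage = 5;

    printf("Cycle:");
    for (int c = 1; c <= n_inst + n_stage - 1; c++) printf(" %3d ", c);
    printf("\n");

    for (int i = 0; i < n_inst; i++) {
        printf("I%d:   ", i + 1);
        for (int c = 1; c <= n_inst + n_stage - 1; c++) {
            int s = c - i - 1;                 /* stage index this cycle */
            printf(" %s ", (s >= 0 && s < n_stage) ? stage[s] : "   ");
        }
        printf("\n");
    }
    return 0;
}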

3. Pipeline Hazards: When Things Go Wrong

Hazards are situations that prevent the next instruction from executing in its designated clock cycle.

3.1 Structural Hazards ⚙️

Cause: Hardware resource conflict - two instructions need the same resource simultaneously.

Example: Single memory port

I1: LOAD r1, 0(r2)  # Needs memory in MEM stage (cycle 4)
I4: LOAD r5, 0(r6)  # Needs memory in IF stage (cycle 4) → CONFLICT!

Solution:

  • Separate instruction and data caches (Harvard architecture)
  • Duplicate resources (multiple ALUs)
  • Stall one instruction until resource is free
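
For the third option, the check itself is simple when there is only one shared memory port. A hedged sketch in the same pseudocode style used elsewhere on this page (the signal names are illustrative):

// Structural hazard check for a single shared memory port (sketch).
// If the instruction in the MEM stage needs the port this cycle,
// instruction fetch must wait and retry on the next cycle.
if (EX_MEM.mem_read || EX_MEM.mem_write) {
    fetch_stall = 1;   // hold the PC; IF reuses the port next cycle
}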

3.2 Data Hazards 📊

Cause: Instruction depends on result of previous instruction still in pipeline.

Read-After-Write (RAW) - True Dependency

The most common hazard!

I1: ADD r1, r2, r3    # r1 = r2 + r3  (writes r1)
I2: SUB r4, r1, r5    # r4 = r1 - r5  (reads r1) β†’ NEEDS r1!

Problem: I2 wants to read r1 in its ID stage (cycle 3), but I1 doesn't write r1 until its WB stage (cycle 5)!

Cycle: 1    2    3    4    5
I1:    IF   ID   EX   MEM  WB  ← r1 written here
I2:         IF   ID   ...       ← r1 needed here → TOO EARLY!
                ↑
              Hazard!

Solutions for RAW Hazards

A. Forwarding (Bypassing) 🔄

Send the result directly from the EX/MEM or MEM/WB pipeline register to the EX stage of the dependent instruction, instead of waiting for write-back.

Cycle: 1    2    3    4    5
I1:    IF   ID   EX   MEM  WB
                  ↓ Forward!
I2:         IF   ID   EX   MEM  WB
                      ↑ Use forwarded value

Forwarding Paths:

// Forwarding unit (sketch): pick the newest in-flight value of rs1.
// EX/MEM → EX forwarding has priority because it is the more recent result.
if (EX_MEM.reg_write && EX_MEM.rd != 0 && EX_MEM.rd == ID_EX.rs1) {
    ALU_input1 = EX_MEM.ALUResult;
}
// MEM/WB → EX forwarding (used only when EX/MEM did not match)
else if (MEM_WB.reg_write && MEM_WB.rd != 0 && MEM_WB.rd == ID_EX.rs1) {
    ALU_input1 = MEM_WB.Result;
}
// The same comparisons are repeated for rs2 / ALU_input2.

B. Stalling (Pipeline Bubble) 💭

When forwarding isn't enough (e.g., load-use hazard):

I1: LOAD r1, 0(r2)    # r1 available after MEM stage
I2: ADD  r3, r1, r4   # Needs r1 immediately β†’ STALL!

Cycle: 1    2    3    4    5    6    7
I1:    IF   ID   EX   MEM  WB
I2:         IF   ID   stall  EX  MEM  WB
                      ↑
                  Insert bubble (NOP)

Cost: One lost cycle per stall ⏱️
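
The stall decision itself is a small piece of hazard-detection logic. A sketch in the same pseudocode style as the forwarding checks above (names are illustrative; rs1/rs2 are the source fields decoded from the instruction sitting in IF/ID):

// Load-use hazard detection (sketch)
// The instruction in EX is a load, and its destination register is a
// source of the instruction currently being decoded: stall one cycle.
if (ID_EX.mem_read &&
    (ID_EX.rd == IF_ID.rs1 || ID_EX.rd == IF_ID.rs2)) {
    stall = 1;   // freeze PC and IF/ID; insert a bubble (NOP) into ID/EX
}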

3.3 Control Hazards 🔀

Cause: Branch instruction changes program flow - we don't know which instruction to fetch next!

       BEQ r1, r2, target  # if r1 == r2, goto target
       ADD r3, r4, r5      # Next sequential instruction
       SUB r6, r7, r8
target: MUL r9, r10, r11   # Branch target

Problem: The branch outcome isn't known until the EX stage (cycle 3), but we have to fetch something in cycles 2 and 3!

Solutions for Control Hazards

A. Stall Until Branch Resolves

Cycle: 1    2      3      4    5
BEQ:   IF   ID     EX     MEM  WB
Next:       stall  stall  IF   ID   ← fetch waits for the branch decision in EX

Cost: 2 cycles per branch 😱

B. Branch Prediction 🎯

  • Predict branch will be taken/not taken
  • Speculatively fetch from predicted path
  • If wrong, flush pipeline and restart

Simple Static Prediction:

  • Always predict "not taken" (continue sequential)
  • Always predict "taken" for backwards branches (loops)

Dynamic Prediction (more advanced):

  • Branch history table (BHT)
  • Two-bit saturating counter (sketched below)
  • 90%+ accuracy for many workloads!
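
Here is a minimal, runnable C sketch of one two-bit saturating counter. A real branch history table keeps one such counter per entry, indexed by branch address; the outcome sequence below is just an example loop-like pattern:

#include <stdio.h>

/* One two-bit saturating counter:
   0 = strongly not-taken, 1 = weakly not-taken,
   2 = weakly taken,       3 = strongly taken.      */
static int counter = 1;                    /* start weakly not-taken */

static int predict_taken(void) { return counter >= 2; }
static void train(int taken) {
    if (taken  && counter < 3) counter++;
    if (!taken && counter > 0) counter--;
}

int main(void) {
    int outcomes[] = { 1, 1, 1, 0, 1, 1, 1, 0 };   /* taken=1, not-taken=0 */
    int n = 8, correct = 0;
    for (int i = 0; i < n; i++) {
        if (predict_taken() == outcomes[i]) correct++;
        train(outcomes[i]);
    }
    printf("correct predictions: %d / %d\n", correct, n);
    return 0;
}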

C. Delayed Branch Slots

  • MIPS/SPARC approach: instruction after branch always executes
  • Compiler fills slot with useful work or NOP
BEQ  r1, r2, target
ADD  r3, r4, r5      # Delay slot - always executes!

4. Why Pipelining Improves Throughput

4.1 Without Pipelining (Sequential Execution)

Time:  0    5   10   15   20   25   30   35   40
I1:    [IF|ID|EX|MEM|WB]
I2:                      [IF|ID|EX|MEM|WB]
I3:                                        [IF|ID|EX|MEM|WB]
 
Throughput: 1 instruction per 5 cycles = 0.2 IPC

4.2 With Pipelining (Overlapped Execution)

Time:  0    1    2    3    4    5    6    7    8
I1:    IF   ID   EX   MEM  WB
I2:         IF   ID   EX   MEM  WB
I3:              IF   ID   EX   MEM  WB
I4:                   IF   ID   EX   MEM  WB
I5:                        IF   ID   EX   MEM  WB
 
Throughput: 1 instruction per cycle = 1.0 IPC (5x speedup!)

4.3 Speedup Formula

Ideal case (no hazards):

Speedup = Number of pipeline stages
        = 5 (for 5-stage pipeline)

Real-world (with hazards):

CPI = 1 + (stall cycles per instruction)
 
If 20% instructions are loads with 1 stall each:
CPI = 1 + 0.2 × 1 = 1.2
Speedup = 5 / 1.2 = 4.17x
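
The same arithmetic is easy to script. A tiny C sketch that reproduces the numbers above (the 20% load fraction and one stall per load are just the assumptions from the example):

#include <stdio.h>

int main(void) {
    const int    stages        = 5;
    const double load_fraction = 0.20;   /* 20% of instructions are loads */
    const double load_stalls   = 1.0;    /* 1 stall cycle per load        */

    double cpi     = 1.0 + load_fraction * load_stalls;
    double speedup = stages / cpi;

    printf("CPI = %.2f, speedup = %.2fx\n", cpi, speedup);  /* 1.20, 4.17x */
    return 0;
}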

5. Key Takeaways

✅ ISA defines the hardware-software contract - instructions, registers, memory model

✅ 5-stage pipeline overlaps instruction execution for higher throughput

  • IF → ID → EX → MEM → WB

✅ Three types of hazards:

  • Structural: Resource conflicts
  • Data: Dependencies between instructions
  • Control: Branches change flow

✅ Hazard solutions:

  • Forwarding: Send results early
  • Stalling: Insert pipeline bubbles
  • Prediction: Guess branch outcomes

✅ Pipelining improves throughput (more instructions completed per unit time) but does not reduce the latency of an individual instruction (each one still passes through all five stages)

6. Practice Problems

  1. Calculate CPI: A 5-stage pipeline with 30% branch instructions (2 stall cycles each) and 25% loads (1 stall cycle each). What is the effective CPI?

  2. Identify Hazards:

ADD r1, r2, r3
SUB r4, r1, r5
AND r6, r4, r7
OR  r8, r6, r9

Where do you need forwarding?

  3. Forwarding Logic: When can you forward from EX/MEM vs MEM/WB stage? When must you stall?

7. Next Steps

Now that you understand pipeline fundamentals:

→ Cache Coherence: How do multiple cores keep their caches consistent?

→ Advanced Pipelines: Superscalar, out-of-order execution, branch prediction

→ GPU Architecture: Massive parallelism through thousands of simple pipelines

→ Performance Analysis: Measure and optimize IPC in real systems

Remember: Every modern processor - from your smartphone to datacenter servers - builds on these fundamental concepts! 🚀