Tags: CPU, beginner, ISA, pipeline, hazards, forwarding, CPU-fundamentals

Computer Architecture Fundamentals: ISA to Pipeline

Essential foundations of computer architecture from instruction sets to pipelined execution, covering ISA design, the classic 5-stage pipeline, and hazard management.

30 min read
Updated 10/1/2025
3 prerequisites

Prerequisites

Make sure you're familiar with these concepts before diving in:

Basic programming knowledge
Understanding of binary numbers
Familiarity with computer organization concepts

Learning Objectives

By the end of this topic, you will be able to:

Understand what an Instruction Set Architecture (ISA) is and why it matters
Master the classic 5-stage pipeline and its operation
Identify and resolve structural, data, and control hazards
Apply forwarding and stall mechanisms to handle dependencies
Explain why pipelining improves processor throughput


Computer Architecture Fundamentals: ISA to Pipeline

Welcome to the foundation of modern computing! Understanding how processors execute instructions is crucial for anyone working in computer architecture, systems programming, or hardware design. Let's build your mental model from the ground up! 🏗️

1. What is an Instruction Set Architecture (ISA)?

The ISA is the contract between software and hardware: it defines everything a compiler (or assembly programmer) needs to know to produce correct machine code.

1.1 Core ISA Components

1. Instructions: Operations the processor can perform

ADD  r1, r2, r3    # r1 = r2 + r3
LOAD r1, 0(r2)     # r1 = Memory[r2]
BEQ  r1, r2, label # if r1 == r2, goto label

2. Registers: Fast storage directly accessible by instructions

  • General-purpose registers (x0-x31 in RISC-V)
  • Special registers (PC, stack pointer, status flags)

3. Memory Model: How instructions access data

  • Load/Store architecture (RISC)
  • Memory-to-memory operations (CISC)

4. Data Types: What kinds of data are supported

  • Integers (8, 16, 32, 64-bit)
  • Floating-point numbers
  • SIMD vector types
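
To make the instruction and register fields above concrete, here is a minimal C sketch that pulls the fields out of one 32-bit RISC-V R-type instruction word. The encoding 0x003100B3 corresponds to ADD x1, x2, x3; the bit layout is the standard R-type format, and this is only an illustration, not a full decoder:

#include <stdint.h>
#include <stdio.h>

/* Decode the fields of a RISC-V R-type instruction.
   Layout: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
int main(void) {
    uint32_t inst   = 0x003100B3;          /* ADD x1, x2, x3 */
    unsigned opcode =  inst        & 0x7F;
    unsigned rd     = (inst >> 7)  & 0x1F;
    unsigned funct3 = (inst >> 12) & 0x07;
    unsigned rs1    = (inst >> 15) & 0x1F;
    unsigned rs2    = (inst >> 20) & 0x1F;
    unsigned funct7 = (inst >> 25) & 0x7F;

    printf("opcode=0x%02X rd=x%u rs1=x%u rs2=x%u funct3=%u funct7=%u\n",
           opcode, rd, rs1, rs2, funct3, funct7);
    return 0;
}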

1.2 Why ISA Matters

Portability: Software compiled for an ISA runs on any compatible processor

Performance: ISA design affects:

  • Code density (instructions per program)
  • Decode complexity
  • Pipeline efficiency
  • Power consumption

Examples of Popular ISAs:

  • x86-64: Desktop/server (Intel, AMD)
  • ARM: Mobile, embedded, Apple Silicon
  • RISC-V: Open-source, research, emerging products
  • MIPS: Education, embedded systems

2. The Classic 5-Stage Pipeline

Pipelining is like an assembly line for instructions - while one instruction is being decoded, another is being fetched, and yet another is executing!

2.1 The Five Stages

┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌──────────┐
│  FETCH  │ → │ DECODE  │ → │ EXECUTE │ → │ MEMORY  │ → │ WRITEBACK│
│   (IF)  │   │  (ID)   │   │  (EX)   │   │  (MEM)  │   │   (WB)   │
└─────────┘   └─────────┘   └─────────┘   └─────────┘   └──────────┘

1. Instruction Fetch (IF)

  • Read next instruction from memory using Program Counter (PC)
  • Increment PC (PC = PC + 4 for 32-bit instructions)
  • Place instruction in IF/ID pipeline register
PC → I-Cache → Instruction → IF/ID Register

2. Instruction Decode (ID)

  • Decode the instruction opcode
  • Read source registers from register file
  • Sign-extend immediate values
  • Pass control signals to next stages
// Example: ADD r1, r2, r3
Opcode:    ADD
Source1:   Read r2
Source2:   Read r3  
Dest:      r1

3. Execute (EX)

  • Perform ALU operation
  • Calculate memory address (for loads/stores)
  • Evaluate branch condition
  • Forward results if needed
// For ADD: result = rs1_value + rs2_value
// For LOAD: address = rs1_value + immediate
// For BEQ: compare rs1_value with rs2_value

4. Memory Access (MEM)

  • Load data from memory (if LOAD instruction)
  • Store data to memory (if STORE instruction)
  • Pass through ALU result for other instructions
// LOAD:  data = Memory[address]
// STORE: Memory[address] = data
// Other: pass-through from EX stage

5. Write Back (WB)

  • Write result to destination register
  • Update register file
  • Instruction completes!
// Write result to rd (destination register)
if (instruction_writes_register) {
    RegisterFile[rd] = result;
}
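
Everything a later stage needs is latched between stages in a pipeline register (IF/ID, ID/EX, EX/MEM, MEM/WB). Here is a minimal C sketch of what those latches might hold; the struct and field names are illustrative, not from any particular textbook design:

#include <stdint.h>
#include <stdio.h>

/* Illustrative pipeline registers between the five stages.
   Real designs carry more control signals than shown here. */
typedef struct { uint32_t pc, instruction; }                            IF_ID_t;
typedef struct { uint32_t rs1_value, rs2_value, immediate;
                 uint8_t  rd, alu_op, mem_read, mem_write, reg_write; } ID_EX_t;
typedef struct { uint32_t alu_result, store_value;
                 uint8_t  rd, mem_read, mem_write, reg_write; }         EX_MEM_t;
typedef struct { uint32_t result;   /* loaded data or ALU result */
                 uint8_t  rd, reg_write; }                              MEM_WB_t;

int main(void) {
    EX_MEM_t ex_mem = { .alu_result = 42, .rd = 1, .reg_write = 1 };
    /* MEM stage for a non-load: pass the ALU result straight through. */
    MEM_WB_t mem_wb = { .result    = ex_mem.alu_result,
                        .rd        = ex_mem.rd,
                        .reg_write = ex_mem.reg_write };
    printf("WB will write %u into r%u\n",
           (unsigned)mem_wb.result, (unsigned)mem_wb.rd);
    return 0;
}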

2.2 Pipeline in Action: Example Execution

Let's trace 5 instructions through the pipeline:

I1: LOAD r1, 0(r2)     # r1 = Memory[r2]
I2: ADD  r3, r1, r4    # r3 = r1 + r4
I3: SUB  r5, r3, r6    # r5 = r3 - r6  
I4: AND  r7, r5, r8    # r7 = r5 & r8
I5: STORE r7, 0(r9)    # Memory[r9] = r7

Pipeline Diagram:

Cycle: 1    2    3    4    5    6    7    8    9
I1:    IF   ID   EX   MEM  WB
I2:         IF   ID   EX   MEM  WB
I3:              IF   ID   EX   MEM  WB
I4:                   IF   ID   EX   MEM  WB
I5:                        IF   ID   EX   MEM  WB

Throughput: One instruction completes per cycle (after initial fill)!
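
The overlap shown in the diagram is easy to reproduce programmatically. Here is a small C sketch (assuming an ideal pipeline with no hazards) that prints which stage each instruction occupies in every cycle:

#include <stdio.h>

/* Print an ideal 5-stage pipeline diagram for n_inst independent
   instructions: instruction i enters IF in cycle i+1, so in cycle c
   it occupies stage (c - i - 1), if that index is in range. */
int main(void) {
    const char *stage[] = { "IF ", "ID ", "EX ", "MEM", "WB " };
    const int n_inst = 5, n_stage = 5;

    printf("Cycle:");
    for (int c = 1; c <= n_inst + n_stage - 1; c++) printf(" %3d ", c);
    printf("\n");

    for (int i = 0; i < n_inst; i++) {
        printf("I%d:   ", i + 1);
        for (int c = 1; c <= n_inst + n_stage - 1; c++) {
            int s = c - i - 1;                 /* stage index this cycle */
            printf(" %s ", (s >= 0 && s < n_stage) ? stage[s] : "   ");
        }
        printf("\n");
    }
    return 0;
}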

3. Pipeline Hazards: When Things Go Wrong

Hazards are situations that prevent the next instruction from executing in its designated clock cycle.

3.1 Structural Hazards ⚙️

Cause: Hardware resource conflict - two instructions need the same resource simultaneously.

Example: Single memory port

I1: LOAD r1, 0(r2)  # Needs memory in MEM stage (cycle 4)
I4: LOAD r5, 0(r6)  # Needs memory in IF stage (cycle 4) → CONFLICT!

Solution:

  • Separate instruction and data caches (Harvard architecture)
  • Duplicate resources (multiple ALUs)
  • Stall one instruction until resource is free
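
For the third option, the check itself is simple when there is only one shared memory port. A hedged sketch in the same pseudocode style used elsewhere on this page (the signal names are illustrative):

// Structural hazard check for a single shared memory port (sketch).
// If the instruction in the MEM stage needs the port this cycle,
// instruction fetch must wait and retry on the next cycle.
if (EX_MEM.mem_read || EX_MEM.mem_write) {
    fetch_stall = 1;   // hold the PC; IF reuses the port next cycle
}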

3.2 Data Hazards 📊

Cause: Instruction depends on result of previous instruction still in pipeline.

Read-After-Write (RAW) - True Dependency

The most common hazard!

I1: ADD r1, r2, r3    # r1 = r2 + r3  (writes r1)
I2: SUB r4, r1, r5    # r4 = r1 - r5  (reads r1) β†’ NEEDS r1!

Problem: I2 wants to read r1 in its ID stage (cycle 3), but I1 doesn't write r1 until its WB stage (cycle 5)!

Cycle: 1    2    3    4    5
I1:    IF   ID   EX   MEM  WB  ← r1 written here
I2:         IF   ID   ...       ← r1 needed here → TOO EARLY!
                ↑
              Hazard!

Solutions for RAW Hazards

A. Forwarding (Bypassing) 🔄

Send the result directly from the EX/MEM or MEM/WB pipeline register to the EX stage of the dependent instruction, instead of waiting for write-back.

Cycle: 1    2    3    4    5
I1:    IF   ID   EX   MEM  WB
                  ↓ Forward!
I2:         IF   ID   EX   MEM  WB
                      ↑ Use forwarded value

Forwarding Paths:

// Forwarding unit (sketch): pick the newest in-flight value of rs1.
// EX/MEM → EX forwarding has priority because it is the more recent result.
if (EX_MEM.reg_write && EX_MEM.rd != 0 && EX_MEM.rd == ID_EX.rs1) {
    ALU_input1 = EX_MEM.ALUResult;
}
// MEM/WB → EX forwarding (used only when EX/MEM did not match)
else if (MEM_WB.reg_write && MEM_WB.rd != 0 && MEM_WB.rd == ID_EX.rs1) {
    ALU_input1 = MEM_WB.Result;
}
// The same comparisons are repeated for rs2 / ALU_input2.

B. Stalling (Pipeline Bubble) 💭

When forwarding isn't enough (e.g., load-use hazard):

I1: LOAD r1, 0(r2)    # r1 available after MEM stage
I2: ADD  r3, r1, r4   # Needs r1 immediately β†’ STALL!

Cycle: 1    2    3    4    5    6    7
I1:    IF   ID   EX   MEM  WB
I2:         IF   ID   stall  EX  MEM  WB
                      ↑
                  Insert bubble (NOP)

Cost: One lost cycle per stall ⏱️
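
The stall decision itself is a small piece of hazard-detection logic. A sketch in the same pseudocode style as the forwarding checks above (names are illustrative; rs1/rs2 are the source fields decoded from the instruction sitting in IF/ID):

// Load-use hazard detection (sketch)
// The instruction in EX is a load, and its destination register is a
// source of the instruction currently being decoded: stall one cycle.
if (ID_EX.mem_read &&
    (ID_EX.rd == IF_ID.rs1 || ID_EX.rd == IF_ID.rs2)) {
    stall = 1;   // freeze PC and IF/ID; insert a bubble (NOP) into ID/EX
}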

3.3 Control Hazards 🔀

Cause: Branch instruction changes program flow - we don't know which instruction to fetch next!

       BEQ r1, r2, target  # if r1 == r2, goto target
       ADD r3, r4, r5      # Next sequential instruction
       SUB r6, r7, r8
target: MUL r9, r10, r11   # Branch target

Problem: The branch outcome isn't known until the EX stage (cycle 3), but we have to fetch something in cycles 2 and 3!

Solutions for Control Hazards

A. Stall Until Branch Resolves

Cycle: 1    2      3      4    5
BEQ:   IF   ID     EX     MEM  WB
Next:       stall  stall  IF   ID   ← fetch waits for the branch decision in EX

Cost: 2 cycles per branch 😱

B. Branch Prediction 🎯

  • Predict branch will be taken/not taken
  • Speculatively fetch from predicted path
  • If wrong, flush pipeline and restart

Simple Static Prediction:

  • Always predict "not taken" (continue sequential)
  • Always predict "taken" for backwards branches (loops)

Dynamic Prediction (more advanced):

  • Branch history table (BHT)
  • Two-bit saturating counter (sketched below)
  • 90%+ accuracy for many workloads!
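
Here is a minimal, runnable C sketch of one two-bit saturating counter. A real branch history table keeps one such counter per entry, indexed by branch address; the outcome sequence below is just an example loop-like pattern:

#include <stdio.h>

/* One two-bit saturating counter:
   0 = strongly not-taken, 1 = weakly not-taken,
   2 = weakly taken,       3 = strongly taken.      */
static int counter = 1;                    /* start weakly not-taken */

static int predict_taken(void) { return counter >= 2; }
static void train(int taken) {
    if (taken  && counter < 3) counter++;
    if (!taken && counter > 0) counter--;
}

int main(void) {
    int outcomes[] = { 1, 1, 1, 0, 1, 1, 1, 0 };   /* taken=1, not-taken=0 */
    int n = 8, correct = 0;
    for (int i = 0; i < n; i++) {
        if (predict_taken() == outcomes[i]) correct++;
        train(outcomes[i]);
    }
    printf("correct predictions: %d / %d\n", correct, n);
    return 0;
}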

C. Delayed Branch Slots

  • MIPS/SPARC approach: instruction after branch always executes
  • Compiler fills slot with useful work or NOP
BEQ  r1, r2, target
ADD  r3, r4, r5      # Delay slot - always executes!

4. Why Pipelining Improves Throughput

4.1 Without Pipelining (Sequential Execution)

Time:  0    5   10   15   20   25   30   35   40
I1:    [IF|ID|EX|MEM|WB]
I2:                      [IF|ID|EX|MEM|WB]
I3:                                        [IF|ID|EX|MEM|WB]
 
Throughput: 1 instruction per 5 cycles = 0.2 IPC

4.2 With Pipelining (Overlapped Execution)

Time:  0    1    2    3    4    5    6    7    8
I1:    IF   ID   EX   MEM  WB
I2:         IF   ID   EX   MEM  WB
I3:              IF   ID   EX   MEM  WB
I4:                   IF   ID   EX   MEM  WB
I5:                        IF   ID   EX   MEM  WB
 
Throughput: 1 instruction per cycle = 1.0 IPC (5x speedup!)

4.3 Speedup Formula

Ideal case (no hazards):

Speedup = Number of pipeline stages
        = 5 (for 5-stage pipeline)

Real-world (with hazards):

CPI = 1 + (stall cycles per instruction)
 
If 20% instructions are loads with 1 stall each:
CPI = 1 + 0.2 × 1 = 1.2
Speedup = 5 / 1.2 = 4.17x
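
The same arithmetic is easy to script. A tiny C sketch that reproduces the numbers above (the 20% load fraction and one stall per load are just the assumptions from the example):

#include <stdio.h>

int main(void) {
    const int    stages        = 5;
    const double load_fraction = 0.20;   /* 20% of instructions are loads */
    const double load_stalls   = 1.0;    /* 1 stall cycle per load        */

    double cpi     = 1.0 + load_fraction * load_stalls;
    double speedup = stages / cpi;

    printf("CPI = %.2f, speedup = %.2fx\n", cpi, speedup);  /* 1.20, 4.17x */
    return 0;
}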

5. Key Takeaways

✅ ISA defines the hardware-software contract - instructions, registers, memory model

✅ 5-stage pipeline overlaps instruction execution for higher throughput

  • IF → ID → EX → MEM → WB

✅ Three types of hazards:

  • Structural: Resource conflicts
  • Data: Dependencies between instructions
  • Control: Branches change flow

✅ Hazard solutions:

  • Forwarding: Send results early
  • Stalling: Insert pipeline bubbles
  • Prediction: Guess branch outcomes

✅ Pipelining improves throughput (more instructions completed per unit time) but does not reduce the latency of an individual instruction (each one still passes through all five stages)

6. Practice Problems

  1. Calculate CPI: A 5-stage pipeline with 30% branch instructions (2 stall cycles each) and 25% loads (1 stall cycle each). What is the effective CPI?

  2. Identify Hazards:

ADD r1, r2, r3
SUB r4, r1, r5
AND r6, r4, r7
OR  r8, r6, r9

Where do you need forwarding?

  3. Forwarding Logic: When can you forward from EX/MEM vs MEM/WB stage? When must you stall?

7. Next Steps

Now that you understand pipeline fundamentals:

→ Cache Coherence: How do multiple cores keep their caches consistent?

→ Advanced Pipelines: Superscalar, out-of-order execution, branch prediction

→ GPU Architecture: Massive parallelism through thousands of simple pipelines

→ Performance Analysis: Measure and optimize IPC in real systems

Remember: Every modern processor - from your smartphone to datacenter servers - builds on these fundamental concepts! 🚀