Skip to main content
Softwareadvancedcppstldata-structurescontainershash-tablestreesperformance

Building STL-Like Containers from First Principles

Deep dive into implementing STL containers: linked lists, hash tables, red-black trees, and their performance characteristics.

60 min read
Updated 9/11/2024
2 prerequisites

Prerequisites

Make sure you're familiar with these concepts before diving in:

C++ Fundamentals and Performance
Algorithm Patterns for Technical Interviews

Learning Objectives

By the end of this topic, you will be able to:

Understand internal structure of linked lists (singly and doubly with sentinel)
Implement hash tables with separate chaining and rehashing policies
Master red-black tree invariants, rotations, and fix-up algorithms
Apply iterator invalidation rules and exception safety patterns
Choose appropriate containers based on access patterns and performance needs

Table of Contents

Building STL-Like Containers from First Principles


1. Table of Contents

  1. Design Goals & Vocabulary
  2. Linked Lists (foundation for chains & iterators)
  3. std::unordered_map (hash table with buckets)
  4. std::map (balanced BST: Red-Black Tree)
  5. Other STL Containers You'll Encounter
    • [vector, deque, set, unordered_set, priority_queue, stack, queue, string]
  6. Iterator Invalidation Cheat Sheet
  7. Allocators & Exception Safety (brief)
  8. Testing Your Implementations
  9. Appendix: Minimal Code Skeletons
  10. Mermaid Notes

2. Linked Lists

Linked lists are fundamental. You'll use them directly (std::list, std::forward_list) and indirectly (as chains inside a hash table).

2.1 Singly Linked List (forward_list-style)

A minimal node:

template <class T>
struct SNode {
    T value;
    SNode* next = nullptr;
};

Push front is O(1), search is O(n).

Visual

graph LR
  H((head)) --> A([A])
  A --> B([B])
  B --> C([C])
  C --> X((nullptr))

Why singly?

  • Lower memory per node (one pointer).
  • Great for simple chains (e.g., hash buckets).
  • But no O(1) erase before a node without tracking the previous.

Typical operations:

  • push_front(v): new node → head, O(1).
  • find(pred): walk nodes, O(n).
  • erase_after(prev): bypass node after prev, O(1).

2.2 Doubly Linked List (list-style) with a sentinel

C++ std::list is typically a circular doubly linked list with a sentinel (a node whose next/prev point to itself when empty). This erases most special cases.

template <class T>
struct DNode {
    T value;
    DNode* prev = nullptr;
    DNode* next = nullptr;
};

Visual (circular with sentinel)

graph LR
  S((sentinel))
  S --- A([A])
  A --- B([B])
  B --- C([C])
  C --- S
  S --- C
  C --- B
  B --- A
  A --- S
  • push_back: insert just before sentinel, O(1).
  • erase(it): relink prev/next, O(1).
  • splice: constant-time re-link possible.

Trade-offs: O(1) insert/erase with iterator, but higher per-node overhead vs dynamic arrays.


3. std::unordered_map (hash table with buckets)

A hash table stores (key, value) and uses a hash function to pick a bucket (an index in a bucket array). Each bucket holds zero or more entries.

3.1 Separate chaining

Use an array (std::vector<Node*>) of pointers to singly linked nodes.

flowchart LR
  subgraph Buckets
  B0((0)):::b
  B1((1)):::b
  B2((2)):::b
  B3((3)):::b
  end
  B0 --> N00["(k0,v0)"] --> N01["(k8,v8)"]
  B2 --> N20["(k2,v2)"]
  classDef b fill:#eee,stroke:#888;

Why chains (vs open addressing):

  • Simple to implement.
  • Stable references within a bucket (until rehash).
  • Performance is good with a decent hash and a reasonable load factor.

Node type:

template <class Key, class T>
struct HNode {
    std::pair<Key, T> kv;
    HNode* next = nullptr;
};

Insert / Find / Erase (sketch):

  • insert: compute i = hash(k) % bucket_count, push_front into buckets[i].
  • find: walk the chain at buckets[i], compare keys with KeyEq.
  • erase: unlink from chain (track prev pointer), delete node.

3.2 Rehashing & load factor

  • Load factor α = size / bucket_count.
  • As α grows, buckets get longer → lookups slow.
  • Policy: When α > max_load_factor (e.g., 0.75), allocate a bigger bucket array (often ~2×), then reinsert all existing nodes into new buckets.

Average complexity: insert/find/erase are O(1) average; O(n) worst-case (all keys collide).

3.3 Iterators & invalidation (unordered)

  • Iterating: walk each bucket, then its chain.
  • Rehashing invalidates all iterators and references.
  • Inserting without rehash: usually keeps existing iterators valid (except end).
  • erase invalidates iterators to erased elements; others remain valid unless rehash occurs.

3.4 Alternative: open addressing (brief)

Instead of chaining, store elements directly in the bucket array and resolve collisions by probing (linear, quadratic, or double hashing).

  • Pros: Better cache locality (no pointers), often faster at low load factors.
  • Cons: Requires tombstones for deletions; clustering degrades performance near high α; iterator stability is poorer; rehash may move many elements.

4. std::map (balanced BST: Red-Black Tree)

std::map is an ordered associative container: keys are kept sorted according to a comparator. The standard doesn't mandate a specific tree, but red-black trees (RBTs) are the common choice (libstdc++, libc++, MSVC).

4.1 RB invariants

  1. Every node is red or black.
  2. Root is black.
  3. Null leaves (conceptual) are black.
  4. Red node has no red child (no red-red parent-child).
  5. Every path from a node to its descendant null leaves has the same number of black nodes (black height).

These ensure height = O(log n), giving O(log n) for find/insert/erase.

Visual (colors as labels)

graph TD
  R((10 [Black])) --> L((5 [Red]))
  R --> X((15 [Red]))
  L --> LL((2 [Black]))
  L --> LR((7 [Black]))
  X --> XL((12 [Black]))
  X --> XR((20 [Black]))

4.2 Rotations & fix-ups

Insertion process:

  1. Do a normal BST insert (node starts red).
  2. If a red-red violation occurs, fix-up using rotations and recoloring until invariants hold.

Left rotation (before → after)

graph LR
  A((x)) --- B((y))
  B --- BR((y.R))
  subgraph Before
  A --> AL((x.L))
  A --> B
  B --> BL((y.L))
  B --> BR
  end
graph LR
  A((x)) --- BL((y.L))
  subgraph After
  B((y)) --> A
  B --> BR((y.R))
  A --> AL((x.L))
  A --> BL
  end

(Analogous right rotation exists.)

Deletion is more intricate (handling "double black"), but follows similar recolor/rotate patterns to restore invariants.

4.3 Iterators & invalidation (map)

  • Insert/erase do not invalidate iterators to other elements (only the erased one becomes invalid).
  • Iteration order is in-order according to the comparator.

5. Other STL Containers You'll Encounter

5.1 vector (dynamic array)

  • Layout: contiguous memory, great cache locality.
  • Growth: when capacity is full, allocate a larger block (often 1.5–2×), move elements, free old block.
  • Complexity:
    • push_back: amortized O(1); reallocation makes a single op O(n).
    • operator[]: O(1).
    • Insert/erase in the middle: O(n) (shifts).
  • Invalidation: reallocation invalidates all iterators & references; push_back can trigger it.

5.2 deque (segmented array)

  • A sequence of fixed-size blocks + an index structure.
  • Fast push/pop at both ends; stable pointers to other blocks.
  • Insert/erase in middle: O(n).
  • Better than vector for frequent front operations.

5.3 set / unordered_set

  • set: like map<Key, mapped> but stores keys only in an ordered tree.
  • unordered_set: like unordered_map buckets but stores keys only.

5.4 priority_queue (binary heap)

  • Backed by a vector + heap property (parent ≥ children for max-heap).
  • push: sift-up; pop: replace root with last element, sift-down.
  • Complexity: push/pop O(log n), top O(1).

5.5 stack / queue (adaptors)

  • Thin wrappers adapting an underlying container (deque by default) to a restricted interface.
  • stack: LIFO (push, pop, top).
  • queue: FIFO (push, pop, front, back).

5.6 string (specialized dynamic array of char)

  • Small String Optimization (SSO): short strings kept inline in the object (no heap).
  • Otherwise similar rules to vector for growth & invalidation.

6. Iterator Invalidation Cheat Sheet

ContainerOn insertOn eraseOn rehash
vectorReallocating insert invalidates allErase at/after position invalidates from thereN/A
dequeInsert may invalidate all (impl-dependent)Erase may invalidate all (impl-dependent)N/A
listNo (iterators stable)Only erased elementN/A
unordered_mapNo unless rehash occursOnly erased elementYes
mapNo (stable)Only erased elementN/A

The table reflects standardized behavior for C++ containers as summarized above.


7. Allocators & Exception Safety (brief)

  • Allocators abstract memory acquisition; your containers should be allocator-aware to be STL-friendly.
  • Exception safety:
    • Basic: no leaks (RAII), invariants preserved.
    • Strong: operations either succeed or have no effect (commit/rollback patterns).
    • No-throw: for operations like swap where possible.

8. Testing Your Implementations

  • Deterministic unit tests: insert/find/erase, edge cases (empty, single element, duplicates).
  • Property tests: e.g., for map, in-order traversal must be sorted; for unordered_map, all elements findable, counts match size.
  • Adversarial inputs: worst-case hashes (force collisions), ascending inserts in trees (ensure balancing works).
  • Iterator law tests: increment, range-for, and invalidation behavior.

9. Appendix: Minimal Code Skeletons

These are compact sketches to show structure and key ideas. For a production-quality container you'll add: allocator support, full iterator types, const-correctness, exception safety, heterogeneous lookup, erase variants, node handle APIs, etc.

9.1 1) forward_list-style singly linked list

template <class T>
class forward_list {
    struct node {
        T value;
        node* next = nullptr;
        node(const T& v, node* n=nullptr) : value(v), next(n) {}
    };
    node* head_ = nullptr;
 
public:
    ~forward_list() { clear(); }
 
    void push_front(const T& v) { head_ = new node(v, head_); }
 
    // erase after 'prev' (assumes prev != nullptr and prev->next != nullptr)
    void erase_after(node* prev) {
        node* victim = prev->next;
        prev->next = victim->next;
        delete victim;
    }
 
    node* find(const T& key) const {
        for (node* p = head_; p; p = p->next)
            if (p->value == key) return p;
        return nullptr;
    }
 
    void clear() {
        node* p = head_;
        while (p) { node* n = p->next; delete p; p = n; }
        head_ = nullptr;
    }
 
    node* head() const { return head_; } // for demo
};

9.2 2) list-style doubly linked list with sentinel

template <class T>
class list {
    struct node {
        T value;
        node* prev = nullptr;
        node* next = nullptr;
        node(const T& v) : value(v) {}
    };
 
    node* s_; // sentinel
 
public:
    list() {
        s_ = new node(T{}); // unused payload
        s_->next = s_;
        s_->prev = s_;
    }
    ~list() { clear(); delete s_; }
 
    bool empty() const { return s_->next == s_; }
 
    void push_back(const T& v) {
        node* n = new node(v);
        n->prev = s_->prev;
        n->next = s_;
        s_->prev->next = n;
        s_->prev = n;
    }
 
    void erase(node* it) {
        it->prev->next = it->next;
        it->next->prev = it->prev;
        delete it;
    }
 
    void clear() {
        node* p = s_->next;
        while (p != s_) { node* nx = p->next; delete p; p = nx; }
        s_->next = s_->prev = s_;
    }
 
    node* begin_node() const { return s_->next; }
    node* end_node()   const { return s_; } // sentinel acts as end
};

9.3 3) unordered_map-style hash table (separate chaining)

#include <vector>
#include <functional>
#include <utility>
#include <stdexcept>
 
template <class Key, class T,
          class Hash  = std::hash<Key>,
          class KeyEq = std::equal_to<Key>>
class unordered_map_like {
    struct node {
        std::pair<Key, T> kv;
        node* next = nullptr;
        node(const Key& k, const T& v, node* n=nullptr) : kv(k,v), next(n) {}
    };
 
    std::vector<node*> buckets_;
    size_t size_ = 0;
    float  max_load_ = 0.75f;
    Hash   hasher_;
    KeyEq  eq_;
 
    size_t idx_for(const Key& k) const {
        return hasher_(k) % buckets_.size();
    }
 
    void rehash_if_needed() {
        if (load_factor() <= max_load_) return;
        rehash(std::max<size_t>(2*buckets_.size(), 16));
    }
 
public:
    explicit unordered_map_like(size_t n = 16)
    : buckets_(std::max<size_t>(n, 1), nullptr) {}
 
    ~unordered_map_like() { clear(); }
 
    float load_factor() const {
        return buckets_.empty() ? 0.0f : float(size_) / float(buckets_.size());
    }
 
    void rehash(size_t new_n) {
        std::vector<node*> nb(new_n, nullptr);
        for (auto& head : buckets_) {
            node* p = head;
            while (p) {
                node* nxt = p->next;
                size_t i = hasher_(p->kv.first) % new_n;
                p->next = nb[i];
                nb[i] = p;
                p = nxt;
            }
            head = nullptr;
        }
        buckets_.swap(nb);
    }
 
    T& operator[](const Key& k) {
        size_t i = idx_for(k);
        for (node* p = buckets_[i]; p; p = p->next)
            if (eq_(p->kv.first, k)) return p->kv.second;
        buckets_[i] = new node(k, T{}, buckets_[i]); // insert default
        ++size_;
        rehash_if_needed();
        // If rehash happened, index may have changed—so search again:
        i = idx_for(k);
        for (node* p = buckets_[i]; p; p = p->next)
            if (eq_(p->kv.first, k)) return p->kv.second;
        throw std::logic_error("unreachable");
    }
 
    bool erase(const Key& k) {
        size_t i = idx_for(k);
        node* p = buckets_[i];
        node* prev = nullptr;
        while (p) {
            if (eq_(p->kv.first, k)) {
                if (prev) prev->next = p->next;
                else      buckets_[i] = p->next;
                delete p; --size_;
                return true;
            }
            prev = p; p = p->next;
        }
        return false;
    }
 
    T* find_ptr(const Key& k) {
        size_t i = idx_for(k);
        for (node* p = buckets_[i]; p; p = p->next)
            if (eq_(p->kv.first, k)) return &p->kv.second;
        return nullptr;
    }
 
    void clear() {
        for (auto& head : buckets_) {
            node* p = head;
            while (p) { node* nxt = p->next; delete p; p = nxt; }
            head = nullptr;
        }
        size_ = 0;
    }
 
    size_t size() const { return size_; }
};

Notes:
• The iterator type is omitted for brevity; implement it as (bucket_index, node*) and ++ skips within a bucket then advances to the next non-empty bucket.
• To keep operator[] simple across rehash, we re-search after a possible resize.

9.4 4) map-style red-black tree (sketch)

enum color { RED, BLACK };
 
template <class Key, class T, class Cmp = std::less<Key>>
class rb_map_like {
    struct node {
        std::pair<Key,T> kv;
        node *parent = nullptr, *left = nullptr, *right = nullptr;
        color c = RED;
        explicit node(const Key& k, const T& v) : kv(k,v) {}
    };
 
    node* root_ = nullptr;
    size_t size_ = 0;
    Cmp cmp_;
 
    void rotate_left(node* x) {
        node* y = x->right;
        x->right = y->left;
        if (y->left) y->left->parent = x;
        y->parent = x->parent;
        if (!x->parent) root_ = y;
        else if (x == x->parent->left) x->parent->left = y;
        else                           x->parent->right = y;
        y->left = x; x->parent = y;
    }
 
    void rotate_right(node* y) {
        node* x = y->left;
        y->left = x->right;
        if (x->right) x->right->parent = y;
        x->parent = y->parent;
        if (!y->parent) root_ = x;
        else if (y == y->parent->left) y->parent->left = x;
        else                           y->parent->right = x;
        x->right = y; y->parent = x;
    }
 
    void insert_fix(node* z) {
        while (z->parent && z->parent->c == RED) {
            node* gp = z->parent->parent;
            if (z->parent == gp->left) {
                node* y = gp->right; // uncle
                if (y && y->c == RED) { // recolor
                    z->parent->c = BLACK; y->c = BLACK; gp->c = RED; z = gp;
                } else {
                    if (z == z->parent->right) { z = z->parent; rotate_left(z); }
                    z->parent->c = BLACK; gp->c = RED; rotate_right(gp);
                }
            } else {
                node* y = gp->left;
                if (y && y->c == RED) {
                    z->parent->c = BLACK; y->c = BLACK; gp->c = RED; z = gp;
                } else {
                    if (z == z->parent->left) { z = z->parent; rotate_right(z); }
                    z->parent->c = BLACK; gp->c = RED; rotate_left(gp);
                }
            }
        }
        root_->c = BLACK;
    }
 
public:
    ~rb_map_like() { clear(root_); }
 
    void clear(node* n) {
        if (!n) return;
        clear(n->left); clear(n->right); delete n;
        if (n == root_) root_ = nullptr, size_ = 0;
    }
 
    T& insert_or_assign(const Key& k, const T& v) {
        if (!root_) {
            root_ = new node(k,v); root_->c = BLACK; ++size_; return root_->kv.second;
        }
        node* p = root_; node* parent = nullptr;
        bool left = false;
        while (p) {
            parent = p;
            if (cmp_(k, p->kv.first)) { p = p->left; left = true; }
            else if (cmp_(p->kv.first, k)) { p = p->right; left = false; }
            else { p->kv.second = v; return p->kv.second; } // assign
        }
        node* z = new node(k,v);
        z->parent = parent;
        if (left) parent->left = z; else parent->right = z;
        ++size_;
        insert_fix(z);
        return z->kv.second;
    }
 
    T* find_ptr(const Key& k) const {
        node* p = root_;
        while (p) {
            if (cmp_(k, p->kv.first)) p = p->left;
            else if (cmp_(p->kv.first, k)) p = p->right;
            else return &p->kv.second;
        }
        return nullptr;
    }
 
    size_t size() const { return size_; }
};

Notes:
• Deletion fix-up is non-trivial; start with CLRS or a reliable reference and test thoroughly.
• Iterators perform in-order traversal; keep parent pointers to implement ++/--.


10. Key Takeaways

  • Linked lists are simple and powerful for constant-time splicing and bucket chains.
  • Hash tables (unordered_map) give average O(1) ops; correctness hinges on rehash policy and hash/equality.
  • Balanced BSTs (map) provide sorted order with O(log n) ops; correctness hinges on RB invariants and rotations.
  • Pick the container based on access pattern, order requirements, memory locality, and iterator stability needs.

11. Mermaid Notes

If your Markdown renderer does not natively support Mermaid, you can:

  • View this in GitHub (Mermaid supported).
  • Or add a Mermaid plugin/extension for your Markdown app.
  • Or replace Mermaid blocks with embedded PNG/SVG generated by the Mermaid CLI.

Prepared with ❤️ for a Data Structures 101 deep-dive.