Building STL-Like Containers from First Principles
Deep dive into implementing STL containers: linked lists, hash tables, red-black trees, and their performance characteristics.
Prerequisites
Make sure you're familiar with these concepts before diving in:
Learning Objectives
By the end of this topic, you will be able to:
Table of Contents
Building STL-Like Containers from First Principles
1. Table of Contents
- Design Goals & Vocabulary
- Linked Lists (foundation for chains & iterators)
std::unordered_map
(hash table with buckets)std::map
(balanced BST: Red-Black Tree)- Other STL Containers You'll Encounter
- [
vector
,deque
,set
,unordered_set
,priority_queue
,stack
,queue
,string
]
- [
- Iterator Invalidation Cheat Sheet
- Allocators & Exception Safety (brief)
- Testing Your Implementations
- Appendix: Minimal Code Skeletons
- Mermaid Notes
2. Linked Lists
Linked lists are fundamental. You'll use them directly (std::list
, std::forward_list
) and indirectly (as chains inside a hash table).
2.1 Singly Linked List (forward_list
-style)
A minimal node:
template <class T>
struct SNode {
T value;
SNode* next = nullptr;
};
Push front is O(1), search is O(n).
Visual
graph LR
H((head)) --> A([A])
A --> B([B])
B --> C([C])
C --> X((nullptr))
Why singly?
- Lower memory per node (one pointer).
- Great for simple chains (e.g., hash buckets).
- But no O(1) erase before a node without tracking the previous.
Typical operations:
push_front(v)
: new node →head
, O(1).find(pred)
: walk nodes, O(n).erase_after(prev)
: bypass node afterprev
, O(1).
2.2 Doubly Linked List (list
-style) with a sentinel
C++ std::list
is typically a circular doubly linked list with a sentinel (a node whose next
/prev
point to itself when empty). This erases most special cases.
template <class T>
struct DNode {
T value;
DNode* prev = nullptr;
DNode* next = nullptr;
};
Visual (circular with sentinel)
graph LR
S((sentinel))
S --- A([A])
A --- B([B])
B --- C([C])
C --- S
S --- C
C --- B
B --- A
A --- S
push_back
: insert just before sentinel, O(1).erase(it)
: relinkprev
/next
, O(1).splice
: constant-time re-link possible.
Trade-offs: O(1) insert/erase with iterator, but higher per-node overhead vs dynamic arrays.
3. std::unordered_map
(hash table with buckets)
A hash table stores (key, value)
and uses a hash function to pick a bucket (an index in a bucket array). Each bucket holds zero or more entries.
3.1 Separate chaining
Use an array (std::vector<Node*>
) of pointers to singly linked nodes.
flowchart LR
subgraph Buckets
B0((0)):::b
B1((1)):::b
B2((2)):::b
B3((3)):::b
end
B0 --> N00["(k0,v0)"] --> N01["(k8,v8)"]
B2 --> N20["(k2,v2)"]
classDef b fill:#eee,stroke:#888;
Why chains (vs open addressing):
- Simple to implement.
- Stable references within a bucket (until rehash).
- Performance is good with a decent hash and a reasonable load factor.
Node type:
template <class Key, class T>
struct HNode {
std::pair<Key, T> kv;
HNode* next = nullptr;
};
Insert / Find / Erase (sketch):
insert
: computei = hash(k) % bucket_count
, push_front intobuckets[i]
.find
: walk the chain atbuckets[i]
, compare keys withKeyEq
.erase
: unlink from chain (trackprev
pointer), delete node.
3.2 Rehashing & load factor
- Load factor
α = size / bucket_count
. - As
α
grows, buckets get longer → lookups slow. - Policy: When
α > max_load_factor
(e.g., 0.75), allocate a bigger bucket array (often ~2×), then reinsert all existing nodes into new buckets.
Average complexity: insert/find/erase are O(1) average; O(n) worst-case (all keys collide).
3.3 Iterators & invalidation (unordered)
- Iterating: walk each bucket, then its chain.
- Rehashing invalidates all iterators and references.
- Inserting without rehash: usually keeps existing iterators valid (except end).
erase
invalidates iterators to erased elements; others remain valid unless rehash occurs.
3.4 Alternative: open addressing (brief)
Instead of chaining, store elements directly in the bucket array and resolve collisions by probing (linear, quadratic, or double hashing).
- Pros: Better cache locality (no pointers), often faster at low load factors.
- Cons: Requires tombstones for deletions; clustering degrades performance near high α; iterator stability is poorer; rehash may move many elements.
4. std::map
(balanced BST: Red-Black Tree)
std::map
is an ordered associative container: keys are kept sorted according to a comparator. The standard doesn't mandate a specific tree, but red-black trees (RBTs) are the common choice (libstdc++, libc++, MSVC).
4.1 RB invariants
- Every node is red or black.
- Root is black.
- Null leaves (conceptual) are black.
- Red node has no red child (no red-red parent-child).
- Every path from a node to its descendant null leaves has the same number of black nodes (black height).
These ensure height = O(log n), giving O(log n) for find/insert/erase.
Visual (colors as labels)
graph TD
R((10 [Black])) --> L((5 [Red]))
R --> X((15 [Red]))
L --> LL((2 [Black]))
L --> LR((7 [Black]))
X --> XL((12 [Black]))
X --> XR((20 [Black]))
4.2 Rotations & fix-ups
Insertion process:
- Do a normal BST insert (node starts red).
- If a red-red violation occurs, fix-up using rotations and recoloring until invariants hold.
Left rotation (before → after)
graph LR
A((x)) --- B((y))
B --- BR((y.R))
subgraph Before
A --> AL((x.L))
A --> B
B --> BL((y.L))
B --> BR
end
graph LR
A((x)) --- BL((y.L))
subgraph After
B((y)) --> A
B --> BR((y.R))
A --> AL((x.L))
A --> BL
end
(Analogous right rotation exists.)
Deletion is more intricate (handling "double black"), but follows similar recolor/rotate patterns to restore invariants.
4.3 Iterators & invalidation (map)
- Insert/erase do not invalidate iterators to other elements (only the erased one becomes invalid).
- Iteration order is in-order according to the comparator.
5. Other STL Containers You'll Encounter
5.1 vector
(dynamic array)
- Layout: contiguous memory, great cache locality.
- Growth: when capacity is full, allocate a larger block (often 1.5–2×), move elements, free old block.
- Complexity:
push_back
: amortized O(1); reallocation makes a single op O(n).operator[]
: O(1).- Insert/erase in the middle: O(n) (shifts).
- Invalidation: reallocation invalidates all iterators & references; push_back can trigger it.
5.2 deque
(segmented array)
- A sequence of fixed-size blocks + an index structure.
- Fast push/pop at both ends; stable pointers to other blocks.
- Insert/erase in middle: O(n).
- Better than
vector
for frequent front operations.
5.3 set
/ unordered_set
set
: likemap<Key, mapped>
but stores keys only in an ordered tree.unordered_set
: likeunordered_map
buckets but stores keys only.
5.4 priority_queue
(binary heap)
- Backed by a
vector
+ heap property (parent ≥ children for max-heap). push
: sift-up;pop
: replace root with last element, sift-down.- Complexity:
push
/pop
O(log n),top
O(1).
5.5 stack
/ queue
(adaptors)
- Thin wrappers adapting an underlying container (
deque
by default) to a restricted interface. stack
: LIFO (push
,pop
,top
).queue
: FIFO (push
,pop
,front
,back
).
5.6 string
(specialized dynamic array of char
)
- Small String Optimization (SSO): short strings kept inline in the object (no heap).
- Otherwise similar rules to
vector
for growth & invalidation.
6. Iterator Invalidation Cheat Sheet
Container | On insert | On erase | On rehash |
---|---|---|---|
vector | Reallocating insert invalidates all | Erase at/after position invalidates from there | N/A |
deque | Insert may invalidate all (impl-dependent) | Erase may invalidate all (impl-dependent) | N/A |
list | No (iterators stable) | Only erased element | N/A |
unordered_map | No unless rehash occurs | Only erased element | Yes |
map | No (stable) | Only erased element | N/A |
The table reflects standardized behavior for C++ containers as summarized above.
7. Allocators & Exception Safety (brief)
- Allocators abstract memory acquisition; your containers should be allocator-aware to be STL-friendly.
- Exception safety:
- Basic: no leaks (RAII), invariants preserved.
- Strong: operations either succeed or have no effect (commit/rollback patterns).
- No-throw: for operations like
swap
where possible.
8. Testing Your Implementations
- Deterministic unit tests: insert/find/erase, edge cases (empty, single element, duplicates).
- Property tests: e.g., for
map
, in-order traversal must be sorted; forunordered_map
, all elements findable, counts match size. - Adversarial inputs: worst-case hashes (force collisions), ascending inserts in trees (ensure balancing works).
- Iterator law tests: increment, range-for, and invalidation behavior.
9. Appendix: Minimal Code Skeletons
These are compact sketches to show structure and key ideas. For a production-quality container you'll add: allocator support, full iterator types, const-correctness, exception safety, heterogeneous lookup, erase variants, node handle APIs, etc.
9.1 1) forward_list
-style singly linked list
template <class T>
class forward_list {
struct node {
T value;
node* next = nullptr;
node(const T& v, node* n=nullptr) : value(v), next(n) {}
};
node* head_ = nullptr;
public:
~forward_list() { clear(); }
void push_front(const T& v) { head_ = new node(v, head_); }
// erase after 'prev' (assumes prev != nullptr and prev->next != nullptr)
void erase_after(node* prev) {
node* victim = prev->next;
prev->next = victim->next;
delete victim;
}
node* find(const T& key) const {
for (node* p = head_; p; p = p->next)
if (p->value == key) return p;
return nullptr;
}
void clear() {
node* p = head_;
while (p) { node* n = p->next; delete p; p = n; }
head_ = nullptr;
}
node* head() const { return head_; } // for demo
};
9.2 2) list
-style doubly linked list with sentinel
template <class T>
class list {
struct node {
T value;
node* prev = nullptr;
node* next = nullptr;
node(const T& v) : value(v) {}
};
node* s_; // sentinel
public:
list() {
s_ = new node(T{}); // unused payload
s_->next = s_;
s_->prev = s_;
}
~list() { clear(); delete s_; }
bool empty() const { return s_->next == s_; }
void push_back(const T& v) {
node* n = new node(v);
n->prev = s_->prev;
n->next = s_;
s_->prev->next = n;
s_->prev = n;
}
void erase(node* it) {
it->prev->next = it->next;
it->next->prev = it->prev;
delete it;
}
void clear() {
node* p = s_->next;
while (p != s_) { node* nx = p->next; delete p; p = nx; }
s_->next = s_->prev = s_;
}
node* begin_node() const { return s_->next; }
node* end_node() const { return s_; } // sentinel acts as end
};
9.3 3) unordered_map
-style hash table (separate chaining)
#include <vector>
#include <functional>
#include <utility>
#include <stdexcept>
template <class Key, class T,
class Hash = std::hash<Key>,
class KeyEq = std::equal_to<Key>>
class unordered_map_like {
struct node {
std::pair<Key, T> kv;
node* next = nullptr;
node(const Key& k, const T& v, node* n=nullptr) : kv(k,v), next(n) {}
};
std::vector<node*> buckets_;
size_t size_ = 0;
float max_load_ = 0.75f;
Hash hasher_;
KeyEq eq_;
size_t idx_for(const Key& k) const {
return hasher_(k) % buckets_.size();
}
void rehash_if_needed() {
if (load_factor() <= max_load_) return;
rehash(std::max<size_t>(2*buckets_.size(), 16));
}
public:
explicit unordered_map_like(size_t n = 16)
: buckets_(std::max<size_t>(n, 1), nullptr) {}
~unordered_map_like() { clear(); }
float load_factor() const {
return buckets_.empty() ? 0.0f : float(size_) / float(buckets_.size());
}
void rehash(size_t new_n) {
std::vector<node*> nb(new_n, nullptr);
for (auto& head : buckets_) {
node* p = head;
while (p) {
node* nxt = p->next;
size_t i = hasher_(p->kv.first) % new_n;
p->next = nb[i];
nb[i] = p;
p = nxt;
}
head = nullptr;
}
buckets_.swap(nb);
}
T& operator[](const Key& k) {
size_t i = idx_for(k);
for (node* p = buckets_[i]; p; p = p->next)
if (eq_(p->kv.first, k)) return p->kv.second;
buckets_[i] = new node(k, T{}, buckets_[i]); // insert default
++size_;
rehash_if_needed();
// If rehash happened, index may have changed—so search again:
i = idx_for(k);
for (node* p = buckets_[i]; p; p = p->next)
if (eq_(p->kv.first, k)) return p->kv.second;
throw std::logic_error("unreachable");
}
bool erase(const Key& k) {
size_t i = idx_for(k);
node* p = buckets_[i];
node* prev = nullptr;
while (p) {
if (eq_(p->kv.first, k)) {
if (prev) prev->next = p->next;
else buckets_[i] = p->next;
delete p; --size_;
return true;
}
prev = p; p = p->next;
}
return false;
}
T* find_ptr(const Key& k) {
size_t i = idx_for(k);
for (node* p = buckets_[i]; p; p = p->next)
if (eq_(p->kv.first, k)) return &p->kv.second;
return nullptr;
}
void clear() {
for (auto& head : buckets_) {
node* p = head;
while (p) { node* nxt = p->next; delete p; p = nxt; }
head = nullptr;
}
size_ = 0;
}
size_t size() const { return size_; }
};
Notes:
• The iterator type is omitted for brevity; implement it as(bucket_index, node*)
and++
skips within a bucket then advances to the next non-empty bucket.
• To keepoperator[]
simple across rehash, we re-search after a possible resize.
9.4 4) map
-style red-black tree (sketch)
enum color { RED, BLACK };
template <class Key, class T, class Cmp = std::less<Key>>
class rb_map_like {
struct node {
std::pair<Key,T> kv;
node *parent = nullptr, *left = nullptr, *right = nullptr;
color c = RED;
explicit node(const Key& k, const T& v) : kv(k,v) {}
};
node* root_ = nullptr;
size_t size_ = 0;
Cmp cmp_;
void rotate_left(node* x) {
node* y = x->right;
x->right = y->left;
if (y->left) y->left->parent = x;
y->parent = x->parent;
if (!x->parent) root_ = y;
else if (x == x->parent->left) x->parent->left = y;
else x->parent->right = y;
y->left = x; x->parent = y;
}
void rotate_right(node* y) {
node* x = y->left;
y->left = x->right;
if (x->right) x->right->parent = y;
x->parent = y->parent;
if (!y->parent) root_ = x;
else if (y == y->parent->left) y->parent->left = x;
else y->parent->right = x;
x->right = y; y->parent = x;
}
void insert_fix(node* z) {
while (z->parent && z->parent->c == RED) {
node* gp = z->parent->parent;
if (z->parent == gp->left) {
node* y = gp->right; // uncle
if (y && y->c == RED) { // recolor
z->parent->c = BLACK; y->c = BLACK; gp->c = RED; z = gp;
} else {
if (z == z->parent->right) { z = z->parent; rotate_left(z); }
z->parent->c = BLACK; gp->c = RED; rotate_right(gp);
}
} else {
node* y = gp->left;
if (y && y->c == RED) {
z->parent->c = BLACK; y->c = BLACK; gp->c = RED; z = gp;
} else {
if (z == z->parent->left) { z = z->parent; rotate_right(z); }
z->parent->c = BLACK; gp->c = RED; rotate_left(gp);
}
}
}
root_->c = BLACK;
}
public:
~rb_map_like() { clear(root_); }
void clear(node* n) {
if (!n) return;
clear(n->left); clear(n->right); delete n;
if (n == root_) root_ = nullptr, size_ = 0;
}
T& insert_or_assign(const Key& k, const T& v) {
if (!root_) {
root_ = new node(k,v); root_->c = BLACK; ++size_; return root_->kv.second;
}
node* p = root_; node* parent = nullptr;
bool left = false;
while (p) {
parent = p;
if (cmp_(k, p->kv.first)) { p = p->left; left = true; }
else if (cmp_(p->kv.first, k)) { p = p->right; left = false; }
else { p->kv.second = v; return p->kv.second; } // assign
}
node* z = new node(k,v);
z->parent = parent;
if (left) parent->left = z; else parent->right = z;
++size_;
insert_fix(z);
return z->kv.second;
}
T* find_ptr(const Key& k) const {
node* p = root_;
while (p) {
if (cmp_(k, p->kv.first)) p = p->left;
else if (cmp_(p->kv.first, k)) p = p->right;
else return &p->kv.second;
}
return nullptr;
}
size_t size() const { return size_; }
};
Notes:
• Deletion fix-up is non-trivial; start with CLRS or a reliable reference and test thoroughly.
• Iterators perform in-order traversal; keep parent pointers to implement++
/--
.
10. Key Takeaways
- Linked lists are simple and powerful for constant-time splicing and bucket chains.
- Hash tables (
unordered_map
) give average O(1) ops; correctness hinges on rehash policy and hash/equality. - Balanced BSTs (
map
) provide sorted order with O(log n) ops; correctness hinges on RB invariants and rotations. - Pick the container based on access pattern, order requirements, memory locality, and iterator stability needs.
11. Mermaid Notes
If your Markdown renderer does not natively support Mermaid, you can:
- View this in GitHub (Mermaid supported).
- Or add a Mermaid plugin/extension for your Markdown app.
- Or replace Mermaid blocks with embedded PNG/SVG generated by the Mermaid CLI.
Prepared with ❤️ for a Data Structures 101 deep-dive.