| name | data-structures |
| description | Use when selecting, implementing, or reasoning about data structures. Covers arrays, linked lists, stacks, queues, hash tables, trees (BST, AVL, Red-Black, B-Tree, Trie), heaps, and graphs (adjacency list, adjacency matrix). Based on Knuth's TAOCP Vol. 1.
USE FOR: choosing data structures by access pattern, understanding operation complexities, implementing fundamental data structures, comparing data structure tradeoffs
DO NOT USE FOR: graph algorithms on data structures (use graph-algorithms), sorting data (use sorting-searching)
|
| license | MIT |
| metadata | {"displayName":"Data Structures","author":"Tyler-R-Kendrick"} |
| compatibility | claude, copilot, cursor |
| references | [{"title":"The Art of Computer Programming, Vol. 1: Fundamental Algorithms — Donald Knuth","url":"https://www-cs-faculty.stanford.edu/~knuth/taocp.html"},{"title":"Data Structure — Wikipedia","url":"https://en.wikipedia.org/wiki/Data_structure"}] |
Data Structures
Overview
Data structures are the foundation of efficient software. The choice of data structure determines the complexity of every operation your program performs. Knuth's The Art of Computer Programming, Volume 1: Fundamental Algorithms provides the definitive treatment of information structures -- from linear lists through trees and multilinked structures. This skill covers the major data structures, their operation complexities, and guidance on when to use each.
Arrays
A contiguous block of memory storing elements of the same type, accessed by index.
| Operation | Time |
|---|
| Access by index | O(1) |
| Search (unsorted) | O(n) |
| Search (sorted) | O(log n) |
| Insert at end | O(1) amortized (dynamic array) |
| Insert at position | O(n) |
| Delete at position | O(n) |
Use when: Random access is frequent, data size is known or grows by appending, cache locality matters.
Linked Lists
Elements (nodes) are stored non-contiguously; each node contains data and a pointer to the next (and optionally previous) node.
Variants
- Singly linked: Each node points to the next. Traversal is forward only.
- Doubly linked: Each node points to both next and previous. Traversal in both directions.
- Circular: The last node points back to the first, forming a cycle.
| Operation | Singly | Doubly |
|---|
| Access by index | O(n) | O(n) |
| Search | O(n) | O(n) |
| Insert at head | O(1) | O(1) |
| Insert at tail (with tail pointer) | O(1) | O(1) |
| Insert at position (given pointer) | O(1) | O(1) |
| Delete at head | O(1) | O(1) |
| Delete at position (given pointer) | O(n) for singly, O(1) for doubly | O(1) |
Use when: Frequent insertions/deletions at arbitrary positions, no need for random access, implementing stacks/queues.
Stacks
Last-In, First-Out (LIFO) structure.
| Operation | Time |
|---|
| Push | O(1) |
| Pop | O(1) |
| Peek/Top | O(1) |
| Search | O(n) |
Implementations: Array-based (dynamic array) or linked-list-based.
Use when: Undo operations, expression evaluation/parsing, DFS traversal, call stack simulation, balanced parentheses checking.
Queues
Standard Queue (FIFO)
First-In, First-Out structure.
| Operation | Time |
|---|
| Enqueue | O(1) |
| Dequeue | O(1) |
| Peek/Front | O(1) |
| Search | O(n) |
Deque (Double-Ended Queue)
Insert and remove from both ends.
| Operation | Time |
|---|
| Insert front/back | O(1) |
| Remove front/back | O(1) |
| Peek front/back | O(1) |
Priority Queue
Elements are dequeued by priority, not insertion order. Typically implemented with a heap.
| Operation | Time (binary heap) |
|---|
| Insert | O(log n) |
| Extract-min/max | O(log n) |
| Peek min/max | O(1) |
| Decrease key | O(log n) |
Use when: BFS traversal (standard queue), scheduling (priority queue), sliding window problems (deque).
Hash Tables
Store key-value pairs with near-constant-time access by hashing keys to array indices.
Collision Handling
- Chaining: Each bucket holds a linked list (or other collection) of entries that hash to the same index.
- Simple to implement. Performance degrades to O(n/k) with poor hash distribution.
- Open Addressing: All entries stored in the array itself. On collision, probe for the next open slot.
- Linear probing: Check the next slot sequentially. Simple but suffers from clustering.
- Quadratic probing: Check slots at quadratic intervals. Reduces clustering.
- Double hashing: Use a second hash function to determine probe step. Best distribution.
| Operation | Average | Worst |
|---|
| Insert | O(1) | O(n) |
| Search | O(1) | O(n) |
| Delete | O(1) | O(n) |
Load factor: Ratio of stored elements to table size. Keep below 0.7-0.75 for good performance; resize (rehash) when exceeded.
Use when: Fast key-based lookup is critical, keys are hashable, order does not matter.
Trees
Binary Search Tree (BST)
Each node has at most two children; left child < parent < right child.
| Operation | Average | Worst (degenerate) |
|---|
| Search | O(log n) | O(n) |
| Insert | O(log n) | O(n) |
| Delete | O(log n) | O(n) |
AVL Tree
Self-balancing BST where the height difference between left and right subtrees of any node is at most 1.
| Operation | Time |
|---|
| Search | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
Tradeoff: Strictly balanced, so faster lookups than Red-Black trees, but more rotations on insert/delete.
Red-Black Tree
Self-balancing BST with color properties guaranteeing that the longest path is at most twice the shortest.
| Operation | Time |
|---|
| Search | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
Tradeoff: Less strictly balanced than AVL, so fewer rotations on insert/delete but slightly slower lookups. Used in most standard library map/set implementations (C++ std::map, Java TreeMap).
B-Tree
Generalized self-balancing tree where each node can have many children. Designed for systems that read/write large blocks of data.
| Operation | Time |
|---|
| Search | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
Use when: Databases and file systems -- minimizes disk I/O by maximizing keys per node.
Trie (Prefix Tree)
Tree where each node represents a character; paths from root to leaves represent strings.
| Operation | Time |
|---|
| Search | O(m) where m = key length |
| Insert | O(m) |
| Delete | O(m) |
| Prefix search | O(m) |
Use when: Autocomplete, spell checking, IP routing tables, prefix-based searching.
Heaps
A complete binary tree satisfying the heap property.
Min-Heap
Parent <= children. Root is the minimum element.
Max-Heap
Parent >= children. Root is the maximum element.
| Operation | Time |
|---|
| Insert | O(log n) |
| Extract min/max | O(log n) |
| Peek min/max | O(1) |
| Build heap | O(n) |
| Decrease/increase key | O(log n) |
Implementations: Typically a binary heap backed by an array. For better amortized performance, consider Fibonacci heaps (O(1) amortized insert and decrease-key).
Use when: Priority queues, heap sort, finding k-th largest/smallest, median maintenance.
Graphs
A graph G = (V, E) consists of vertices V and edges E connecting pairs of vertices.
Representations
Adjacency List
Each vertex stores a list of its neighbors.
| Operation | Time |
|---|
| Add vertex | O(1) |
| Add edge | O(1) |
| Remove edge | O(degree) |
| Check edge exists | O(degree) |
| Space | O(V + E) |
Best for: Sparse graphs (E << V^2). Most graph algorithms prefer this representation.
Adjacency Matrix
A V x V matrix where entry (i, j) indicates whether an edge exists between vertices i and j.
| Operation | Time |
|---|
| Add vertex | O(V^2) -- resize |
| Add edge | O(1) |
| Remove edge | O(1) |
| Check edge exists | O(1) |
| Space | O(V^2) |
Best for: Dense graphs (E close to V^2), when fast edge existence checks are needed, small graphs.
Choosing the Right Data Structure
| Need | Data Structure | Why |
|---|
| Fast index-based access | Array | O(1) random access |
| Fast insertions/deletions anywhere | Linked List | O(1) with pointer |
| LIFO behavior | Stack | Push/pop O(1) |
| FIFO behavior | Queue | Enqueue/dequeue O(1) |
| Fast key-value lookup | Hash Table | O(1) average |
| Ordered key-value storage | BST / Red-Black Tree | O(log n) with ordering |
| Fast lookup with frequent reads | AVL Tree | Strict balance |
| Fast lookup with frequent writes | Red-Black Tree | Fewer rotations |
| Disk-based storage / databases | B-Tree | Minimizes I/O |
| String prefix operations | Trie | O(m) prefix search |
| Priority-based processing | Heap / Priority Queue | O(log n) extract-min/max |
| Modeling relationships | Graph (adjacency list) | Flexible structure |
Best Practices
- Choose the data structure based on the dominant operation pattern: read-heavy, write-heavy, or balanced.
- Prefer standard library implementations -- they are well-tested and optimized for real-world usage.
- Consider cache locality: arrays and array-backed structures (heaps, hash tables with open addressing) are more cache-friendly than pointer-based structures.
- For concurrent access, consider concurrent variants (ConcurrentHashMap, lock-free queues).
- Remember that theoretical complexity is not the full story -- constant factors, memory allocation patterns, and cache behavior matter in practice.
- Reference Knuth's TAOCP Vol. 1, Chapter 2 (Information Structures) for rigorous treatment of linked structures, trees, and multilinked representations.