Data Structures and Algorithms Guide
This guide provides a structured overview of common data structures and algorithms, their performance characteristics, and primary use cases.
Table of Contents
- Algorithmic Analysis (Big O)
- Linear Data Structures
- Non-Linear Data Structures
- Sorting Algorithms
- Searching Algorithms
- Graph Algorithms
- Algorithmic Techniques
Algorithmic Analysis (Big O Notation)
Big O notation is used to describe the performance or complexity of an algorithm. It characterizes the worst-case scenario, focusing on how the execution time or space requirements grow as the input size (n) increases.
| Notation | Name | Description & Example |
|---|---|---|
O(1) | Constant | Always takes the same amount of time, regardless of input size. (e.g., Array lookup by index) |
O(log n) | Logarithmic | Time increases logarithmically as input size grows. Efficient. (e.g., Binary Search) |
O(n) | Linear | Time is directly proportional to the input size. (e.g., Iterating through a list) |
O(n log n) | Log-Linear | A very common complexity for efficient sorting algorithms. (e.g., Merge Sort, Heap Sort) |
O(n^2) | Quadratic | Time is proportional to the square of the input size. (e.g., Nested loops, Bubble Sort) |
O(2^n) | Exponential | Time doubles with each addition to the input. Very slow. (e.g., Recursive Fibonacci) |
O(n!) | Factorial | Extremely slow; grows factorially. (e.g., Traveling Salesman Problem via brute force) |
Linear Data Structures
Structures that arrange data in a sequential manner.
Array
- Description: A collection of items stored in a contiguous block of memory, accessible by an index.
- Characteristics: Fixed size, elements are stored sequentially.
- Performance:
- Access:
O(1) - Search:
O(n) - Insertion:
O(n)(requires shifting elements) - Deletion:
O(n)(requires shifting elements)
- Access:
- Pros: Fast, predictable random access.
- Cons: Inefficient insertion/deletion, fixed size.
Dynamic Array (e.g., ArrayList in Java, list in Python)
- Description: An array that automatically resizes to accommodate new elements.
- Characteristics: When the array becomes full, it creates a new, larger array (often double the size) and copies all elements over.
- Performance:
- Access:
O(1) - Search:
O(n) - Insertion (at end): Amortized
O(1). Worst-caseO(n)if a resize is triggered. - Deletion (at end):
O(1)
- Access:
Note: The
O(n)copy operation during a resize is expensive, but because it happens infrequently, the average (amortized) cost of an append is stillO(1).
Linked List
- Description: A linear collection of nodes, where each node contains data and a pointer to the next node.
- Characteristics: Sequential access only; the head is the entry point.
- Performance:
- Access:
O(n) - Search:
O(n) - Insertion (at ends):
O(1) - Deletion (at ends):
O(1)
- Access:
- Pros: Efficient insertions/deletions at ends, flexible size.
- Cons: Slow lookups, requires more memory than an array due to pointers.
Stack
- Description: A Last-In, First-Out (LIFO) data structure.
- Core Operations: All
O(1)push: Add an element to the top.pop: Remove and return the top element.peek: View the top element without removing it.
- Use Cases: Function call stack, undo mechanisms, Depth-First Search (DFS).
Queue
- Description: A First-In, First-Out (FIFO) data structure.
- Core Operations: All
O(1)enqueue: Add an element to the back.dequeue: Remove and return the front element.peek: View the front element without removing it.
- Use Cases: Task scheduling, message processing, Breadth-First Search (BFS).
Non-Linear Data Structures
Structures where data is not arranged sequentially.
Hash Table (or Hash Map)
- Description: A structure that maps keys to values for highly efficient lookups.
- Implementation: Uses an array and a hashing function to compute an index for each key.
- Collision Handling: When two keys hash to the same index, a strategy like chaining (using a linked list at the index) is used.
- Performance:
- Access/Search/Insert/Delete (Average):
O(1) - Access/Search/Insert/Delete (Worst):
O(n)(if all keys cause collisions)
- Access/Search/Insert/Delete (Average):
- Pros: Extremely fast lookups.
- Cons: Unordered, not cache-friendly.
Tree
- Description: A hierarchical data structure with a root node and child nodes connected by edges.
- Terminology:
- Leaf: A node with no children.
- Depth: Length of the path from the root to a node.
- Height: Length of the longest path from the root to a leaf.
Common Tree Types
- Binary Tree: Each node has at most two children (left and right).
- Full: Each node has exactly zero or two children.
- Complete: All levels are filled except possibly the last, which is filled from left to right. (Heaps are complete trees).
- Balanced: The heights of the left and right subtrees of any node differ by at most one.
- Binary Search Tree (BST): A binary tree with a specific ordering: for any node, all values in its left subtree are smaller, and all values in its right subtree are larger.
- Performance:
O(log n)for all operations if balanced, butO(n)if unbalanced (degenerates into a linked list).
- Performance:
- Self-Balancing BSTs (AVL Tree, Red-Black Tree): These trees automatically perform rotations during insertions and deletions to maintain balance, guaranteeing
O(log n)performance. - Trie (Prefix Tree): A specialized tree for storing strings, enabling fast prefix-based searches.
Heap
- Description: A specialized complete binary tree where parent nodes have a specific relationship to their children.
- Types:
- Min-Heap: The value of each parent node is less than or equal to the value of its children. The root is the minimum element.
- Max-Heap: The value of each parent node is greater than or equal to the value of its children. The root is the maximum element.
- Performance:
- Insertion:
O(log n) - Deletion (of root):
O(log n) - Peek (at root):
O(1)
- Insertion:
- Use Case: Implementing Priority Queues.
Graph
- Description: A collection of nodes (vertices) connected by edges.
- Representations:
- Adjacency List: A map from each vertex to a list of its neighbors. Most common.
- Adjacency Matrix: A V x V matrix where
M[i][j] = 1if an edge exists.
- Types: Directed/Undirected, Weighted/Unweighted, Cyclic/Acyclic.
Sorting Algorithms
| Algorithm | Time (Best) | Time (Avg) | Time (Worst) | Space | Stable | Notes |
|---|---|---|---|---|---|---|
| Bubble Sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes | Simple but very inefficient. |
| Selection Sort | O(n^2) | O(n^2) | O(n^2) | O(1) | No | Simple, but inefficient. |
| Insertion Sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes | Efficient for small or nearly sorted lists. |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes | Reliable, efficient, but requires extra space. |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No | In-place version of selection sort using a heap. |
| Quick Sort | O(n log n) | O(n log n) | O(n^2) | O(log n) | No | Very fast on average, but has a bad worst case. |
Searching Algorithms
Linear Search
- Description: Checks each element in a list sequentially until the target is found.
- Performance:
O(n) - Requirement: None.
Binary Search
- Description: Finds an element by repeatedly dividing the search interval in half.
- Performance:
O(log n) - Requirement: The array must be sorted.
Graph Algorithms
Breadth-First Search (BFS)
- Description: Explores all neighbors of a node at the present depth before moving on to nodes at the next depth level.
- Data Structure: Uses a Queue.
- Use Case: Finding the shortest path in an unweighted graph.
Depth-First Search (DFS)
- Description: Explores as far as possible down one branch before backtracking.
- Data Structure: Uses a Stack (often the implicit call stack via recursion).
- Use Case: Detecting cycles, topological sorting, finding connected components.
Dijkstra’s Algorithm
- Description: Finds the shortest path from a single source node to all other nodes in a weighted graph with non-negative edge weights.
Minimum Spanning Tree (MST)
- Description: A subset of the edges of a connected, weighted graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight.
- Common Algorithms: Prim’s, Kruskal’s.
Topological Sort
- Description: A linear ordering of vertices in a Directed Acyclic Graph (DAG) such that for every directed edge from vertex
uto vertexv,ucomes beforevin the ordering.
Algorithmic Techniques
Dynamic Programming (DP)
- Description: A method for solving complex problems by breaking them down into simpler, overlapping subproblems. It solves each subproblem only once and stores the results to avoid redundant computation.
- Key Idea: Overlapping subproblems and optimal substructure.
Greedy Algorithms
- Description: An approach that makes the locally optimal choice at each stage with the hope of finding a global optimum.
Memoization
- Description: An optimization technique where the results of expensive function calls are cached and returned when the same inputs occur again. It is the top-down approach to implementing Dynamic Programming.
Bitwise Operations
- Description: Operations that manipulate individual bits of a number. They are extremely fast and can be used for clever and efficient solutions to certain problems (e.g., checking for odd/even, swapping numbers, managing sets with bitmasks).