System Design
A guide to the principles, patterns, and components of building scalable, reliable, and efficient software systems.
Table of Contents
- Core Concepts
- Fundamental Trade-offs
- Architectural Patterns
- Core Components
- Design Patterns
- Additional Topics
- Resources
Core Concepts
Scalability
Scalability is a system’s ability to handle a growing amount of work or traffic.
- Vertical Scaling (Scaling Up): Adding more resources (e.g., CPU, RAM) to a single server. Simpler to implement but has physical limits.
- Horizontal Scaling (Scaling Out): Adding more servers to distribute the load. Offers greater scalability but increases complexity. Requires stateless servers for effective implementation, with session management handled by a centralized store like Redis or Memcached.
Performance
Performance involves optimizing response times, data throughput, and processing speeds.
- CPU Bound: Processing time is limited by CPU power.
- I/O Bound: Processing time is limited by input/output operations (e.g., disk or network access).
Availability
Availability is the degree to which a system remains operational and accessible, often measured in “nines.”
| Availability % | Downtime per year |
|---|---|
| 90% (one nine) | 36.53 days |
| 99% (two nines) | 3.65 days |
| 99.9% (three nines) | 8.77 hours |
| 99.99% (four nines) | 52.60 minutes |
| 99.999% (five nines) | 5.26 minutes |
- In Sequence:
Availability (Total) = Availability (Foo) * Availability (Bar) - In Parallel:
Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))
Reliability
Reliability ensures a system operates consistently and correctly, even under stress or in the presence of failures. It often involves:
- Redundancy: Duplicating critical components.
- Failover: Automatically switching to a standby system in case of failure.
- Active-Passive: A passive node takes over when the active node fails.
- Active-Active: Multiple active servers share the load.
Consistency
Consistency ensures that all users see the same data at the same time.
- Strong Consistency: Every read receives the most recent write or an error. Data is replicated synchronously.
- Use Cases: Financial systems, file systems.
- Eventual Consistency: Reads will eventually see the latest write. Data is replicated asynchronously.
- Use Cases: DNS, email, social media feeds.
- Weak Consistency: After a write, a read may or may not see it.
- Use Cases: VoIP, video chat, real-time games.
Maintainability
Maintainability ensures a system is easy to understand, modify, and scale over time. Key practices include:
- Clean, modular code
- Comprehensive documentation
- Automated testing and deployment
- Code reviews
Security
Security encompasses measures to protect a system from unauthorized access and attacks, including:
- Data encryption
- Access control and authentication
- Infrastructure integrity
Fundamental Trade-offs
CAP Theorem
In a distributed system, you can only choose two of the following three guarantees at any given time:
- Consistency: Every read receives the most recent write or an error.
- Availability: Every request receives a non-error response, without the guarantee that it contains the most recent write.
- Partition Tolerance: The system continues to operate despite network failures that split the system into multiple partitions.
Note: In modern distributed systems, partition tolerance is a necessity. Therefore, the trade-off is almost always between Consistency and Availability.
- CP (Consistency & Partition Tolerance): The system prioritizes consistency over availability. If a partition occurs, the system may return an error or timeout to avoid returning stale data.
- Choose if: The business requires atomic reads and writes.
- AP (Availability & Partition Tolerance): The system prioritizes availability over consistency. It returns the most readily available version of the data, which might not be the latest.
- Choose if: The business can tolerate eventual consistency and requires the system to remain operational despite external errors.
Latency vs. Throughput
- Latency: The time it takes to perform an action or produce a result. (e.g., how long a user waits for a page to load).
- Throughput: The number of actions or results per unit of time. (e.g., requests per second).
Goal: Strive for maximal throughput with acceptable latency.
Performance vs. Scalability
- Performance Problem: The system is slow for a single user.
- Scalability Problem: The system is fast for a single user but slow under heavy load.
A service is scalable if adding resources results in a proportional increase in performance.
Architectural Patterns
- Monolithic Architecture: A single, unified codebase where all components are interdependent. Easier to manage at a small scale but lacks flexibility and scalability.
- Microservices Architecture: An application is broken down into a collection of loosely coupled, independent services. This enables independent development, scaling, and deployment.
- Event-Driven Architecture: A model where services or functions respond to events asynchronously. Ideal for applications requiring real-time updates or complex workflows.
Core Components
Databases
Databases store, manage, and retrieve data.
- RDBMS (Relational): Use structured schemas and are optimized for consistency (e.g., SQL).
- NoSQL (Non-Relational): Optimized for flexibility, scalability, and unstructured data.
Caches
Caches provide fast, temporary storage for frequently accessed data, reducing load on primary data stores and speeding up response times.
Load Balancers
Load balancers distribute incoming requests across multiple servers to prevent any single server from becoming a bottleneck, improving reliability and scalability.
Message Queues
Message queues enable asynchronous communication, decoupling services and allowing them to communicate without direct interaction. They are useful for handling high volumes of data and managing background tasks.
Content Delivery Networks (CDNs)
CDNs are distributed networks of servers that cache content closer to end-users, reducing latency and improving load times for static assets.
Design Patterns
- Load Balancing: Distributes workload across servers to enhance availability and prevent bottlenecks.
- Caching: Stores frequently accessed data in memory to speed up retrieval and reduce database load.
- Sharding: Splits a database into smaller, more manageable pieces (shards) to improve scalability and performance.
- Circuit Breaker: Protects services from cascading failures by halting requests to a service that is unresponsive.
- Replication: Creating copies of data or components to improve availability and fault tolerance.
- Master-Slave: The master serves reads and writes, replicating to slaves which only serve reads.
- Master-Master: Both masters serve reads and writes and coordinate with each other. This requires a conflict resolution strategy.
Additional Topics
- Rate Limiting: Limiting the number of requests a client can make to prevent system overload.
- DNS (Domain Name System): Translates human-readable domain names into IP addresses.
- Monitoring: Observing and recording system behavior to ensure optimal performance and detect issues.