System Design

A guide to the principles, patterns, and components of building scalable, reliable, and efficient software systems.

Core Concepts
Fundamental Trade-offs
Architectural Patterns
Core Components
Design Patterns
Additional Topics
Resources

Core Concepts

Scalability

Scalability is a system’s ability to handle a growing amount of work or traffic.

Vertical Scaling (Scaling Up): Adding more resources (e.g., CPU, RAM) to a single server. Simpler to implement but has physical limits.
Horizontal Scaling (Scaling Out): Adding more servers to distribute the load. Offers greater scalability but increases complexity. Works best when servers are stateless, with session state kept in a centralized store such as Redis or Memcached, so that any server can handle any request.
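
The stateless-server idea can be sketched as follows. This is a minimal illustration, not a production design: `SessionStore` is an in-memory stand-in for a centralized store such as Redis or Memcached, and `AppServer`, `handle`, and the server names are hypothetical.

```python
class SessionStore:
    """In-memory stand-in (assumption) for a centralized store such as Redis."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def set(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    """Keeps no session state of its own, so any instance can serve any request."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def handle(self, session_id):
        session = dict(self._store.get(session_id))  # copy, then update
        session["visits"] = session.get("visits", 0) + 1
        self._store.set(session_id, session)
        return f"{self.name}: visit {session['visits']}"


store = SessionStore()
server_a = AppServer("server-a", store)
server_b = AppServer("server-b", store)

print(server_a.handle("session-123"))  # first request lands on server-a
print(server_b.handle("session-123"))  # next request lands on server-b, same session
```

Because the visit count lives in the shared store, the second request sees the first one's state even though a different server handled it.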

Performance

Performance describes how quickly a system does its work, measured by response time, data throughput, and processing speed.

CPU Bound: Processing time is limited by CPU power.
I/O Bound: Processing time is limited by input/output operations (e.g., disk or network access).

Availability

Availability is the degree to which a system remains operational and accessible, often measured in “nines.”

Availability %	Downtime per year
90% (one nine)	36.53 days
99% (two nines)	3.65 days
99.9% (three nines)	8.77 hours
99.99% (four nines)	52.60 minutes
99.999% (five nines)	5.26 minutes

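In Sequence (every component must be up, so total availability drops): Availability (Total) = Availability (Foo) * Availability (Bar)
In Parallel (the system is down only if all components are down, so total availability rises): Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))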
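
As a worked example of the two formulas above, the following Python sketch computes the combined availability and the yearly downtime it implies. The function names and the 99.9% component figures are illustrative assumptions; a year is taken as 365.25 days, which is consistent with the table above.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length


def availability_in_sequence(*components: float) -> float:
    """Every component must be up, so the availabilities multiply."""
    return math.prod(components)


def availability_in_parallel(*components: float) -> float:
    """The system is down only if every component is down at the same time."""
    all_down = math.prod(1.0 - a for a in components)
    return 1.0 - all_down


def downtime_hours_per_year(availability: float) -> float:
    return (1.0 - availability) * HOURS_PER_YEAR


foo, bar = 0.999, 0.999  # hypothetical components at three nines each
print(f"in sequence: {availability_in_sequence(foo, bar):.6f}")
print(f"in parallel: {availability_in_parallel(foo, bar):.6f}")
print(f"downtime at 99.9%: {downtime_hours_per_year(0.999):.2f} hours/year")
```

Two three-nines components in sequence fall just below three nines, while the same two in parallel reach roughly six nines.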

Reliability

Reliability ensures a system operates consistently and correctly, even under stress or in the presence of failures. It often involves:

Redundancy: Duplicating critical components.
Failover: Automatically switching to a standby system in case of failure.
- Active-Passive: A passive node takes over when the active node fails.
- Active-Active: Multiple active servers share the load.

Consistency

Consistency ensures that all users see the same data at the same time.

Strong Consistency: Every read receives the most recent write or an error. Data is replicated synchronously.
- Use Cases: Financial systems, file systems.
Eventual Consistency: Reads will eventually see the latest write. Data is replicated asynchronously.
- Use Cases: DNS, email, social media feeds.
Weak Consistency: After a write, a read may or may not see it.
- Use Cases: VoIP, video chat, real-time games.

Maintainability

Maintainability is the ease with which a system can be understood, modified, and extended over time. Key practices include:

Clean, modular code
Comprehensive documentation
Automated testing and deployment
Code reviews

Security

Security encompasses measures to protect a system from unauthorized access and attacks, including:

Data encryption
Access control and authentication
Infrastructure integrity

Fundamental Trade-offs

CAP Theorem

A distributed system can provide at most two of the following three guarantees at the same time:

Consistency: Every read receives the most recent write or an error.
Availability: Every request receives a non-error response, without the guarantee that it contains the most recent write.
Partition Tolerance: The system continues to operate despite network failures that split the system into multiple partitions.

Note: In modern distributed systems, partition tolerance is a necessity. Therefore, the trade-off is almost always between Consistency and Availability.

CP (Consistency & Partition Tolerance): The system prioritizes consistency over availability. If a partition occurs, the system may return an error or timeout to avoid returning stale data.
- Choose if: The business requires atomic reads and writes.
AP (Availability & Partition Tolerance): The system prioritizes availability over consistency. It returns the most readily available version of the data, which might not be the latest.
- Choose if: The business can tolerate eventual consistency and requires the system to remain operational despite external errors.
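
The difference between CP and AP behavior during a partition can be sketched with a toy replica. This is a deliberately simplified model: the `Replica` class, the `Unavailable` exception, and the `partitioned` flag are hypothetical names, and a real system would detect partitions through timeouts or quorum checks rather than a boolean.

```python
class Unavailable(Exception):
    """Raised by a CP replica that cannot confirm it holds the latest write."""


class Replica:
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True while cut off from the primary

    def replicate(self, value) -> bool:
        """Called by the primary; the update does not arrive during a partition."""
        if self.partitioned:
            return False
        self.value = value
        return True

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: prefer an error over a possibly stale answer.
            raise Unavailable("cannot reach primary; refusing a possibly stale read")
        # AP (or no partition): answer with whatever is stored locally.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
for replica in (cp, ap):
    replica.replicate("v1")
    replica.partitioned = True
    replica.replicate("v2")  # lost: the replica is unreachable

print("AP read:", ap.read())  # still answers, but with the older value
try:
    cp.read()
except Unavailable as exc:
    print("CP read failed:", exc)
```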

Latency vs. Throughput

Latency: The time it takes to perform an action or produce a result (e.g., how long a user waits for a page to load).
Throughput: The number of actions or results per unit of time (e.g., requests per second).

Goal: Strive for maximal throughput with acceptable latency.

Performance vs. Scalability

Performance Problem: The system is slow for a single user.
Scalability Problem: The system is fast for a single user but slow under heavy load.

A service is scalable if adding resources results in a proportional increase in performance.
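
One rough way to check this definition against measurements is to compare the achieved speedup with the increase in resources. The function name and the throughput numbers below are made up for illustration.

```python
def scaling_efficiency(baseline_rps: float, scaled_rps: float, resource_factor: float) -> float:
    """Achieved speedup divided by the resource increase.

    1.0 means throughput grew in proportion to the added resources;
    lower values mean part of the added capacity is lost to overhead.
    """
    speedup = scaled_rps / baseline_rps
    return speedup / resource_factor


# Hypothetical measurements: 1 server handles 1,000 req/s, 4 servers handle 3,200 req/s.
efficiency = scaling_efficiency(baseline_rps=1000, scaled_rps=3200, resource_factor=4)
print(f"scaling efficiency: {efficiency:.0%}")
```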

Architectural Patterns

Monolithic Architecture: A single, unified codebase where all components are interdependent and deployed together. Easier to manage at a small scale, but individual parts cannot be scaled or changed independently as the system grows.
Microservices Architecture: An application is broken down into a collection of loosely coupled, independent services. This enables independent development, scaling, and deployment.
Event-Driven Architecture: A model where services or functions respond to events asynchronously. Ideal for applications requiring real-time updates or complex workflows.

Core Components

Databases

Databases store, manage, and retrieve data.

RDBMS (Relational): Use structured schemas, are queried with SQL, and are optimized for consistency (e.g., PostgreSQL, MySQL).
NoSQL (Non-Relational): Optimized for flexibility, scalability, and unstructured data.

Caches

Caches provide fast, temporary storage for frequently accessed data, reducing load on primary data stores and speeding up response times.
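
A minimal sketch of how a cache sits in front of a primary data store, assuming a read-through approach with a time-to-live (one of several possible caching strategies). `TTLCache`, `load_user`, and the `db_calls` counter are hypothetical names; the loader stands in for a real database query.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0):
        self._loader = loader  # reads from the primary data store
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry time, value)

    def get(self, key: str) -> Any:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the primary store is not touched
        value = self._loader(key)  # miss or expired entry: go to the primary store
        self._entries[key] = (now + self._ttl, value)
        return value


db_calls = {"count": 0}


def load_user(user_id: str) -> dict:
    """Hypothetical stand-in for a database query."""
    db_calls["count"] += 1
    return {"id": user_id}


cache = TTLCache(load_user, ttl_seconds=60.0)
cache.get("42")
cache.get("42")  # served from the cache
print("database reads:", db_calls["count"])
```

The second lookup never reaches the loader, which is the reduction in load on the primary store described above.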

Load Balancers

Load balancers distribute incoming requests across multiple servers to prevent any single server from becoming a bottleneck, improving reliability and scalability.
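
As an illustration, here is a small round-robin balancer that skips servers marked unhealthy. Round robin is only one possible distribution strategy and is assumed here for simplicity; the class, method, and server names are hypothetical.

```python
import itertools
from typing import Iterable, Set


class RoundRobinBalancer:
    def __init__(self, servers: Iterable[str]):
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(self._servers)
        self._unhealthy: Set[str] = set()

    def mark_down(self, server: str) -> None:
        self._unhealthy.add(server)

    def mark_up(self, server: str) -> None:
        self._unhealthy.discard(server)

    def next_server(self) -> str:
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if server not in self._unhealthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.next_server() for _ in range(4)])
balancer.mark_down("app-2")
print(balancer.next_server())  # app-2 is skipped
```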

Message Queues

Message queues enable asynchronous communication between services: a producer places a message on the queue and a consumer processes it later, so the two never call each other directly. They are useful for handling high volumes of data and managing background tasks.
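
The decoupling can be sketched inside a single process with Python's standard `queue` and `threading` modules. This only stands in for a real message broker; the job names, the `worker` function, and the `_STOP` sentinel are illustrative assumptions.

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
results = []
_STOP = object()  # sentinel telling the consumer to shut down


def worker() -> None:
    """Consumer: processes messages whenever they arrive."""
    while True:
        item = tasks.get()
        try:
            if item is _STOP:
                break
            results.append(f"processed {item}")
        finally:
            tasks.task_done()


consumer = threading.Thread(target=worker)
consumer.start()

# Producer: enqueue work and move on without waiting for it to finish.
for job in ("resize-image", "send-email"):
    tasks.put(job)

tasks.put(_STOP)
tasks.join()      # wait until every queued item has been handled
consumer.join()
print(results)
```

The producer only knows about the queue, not about the consumer, so either side can be replaced or scaled without changing the other.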

Content Delivery Networks (CDNs)

CDNs are distributed networks of servers that cache content closer to end-users, reducing latency and improving load times for static assets.

Design Patterns

Load Balancing: Distributes workload across servers to enhance availability and prevent bottlenecks.
Caching: Stores frequently accessed data in memory to speed up retrieval and reduce database load.
Sharding: Splits a database into smaller, more manageable pieces (shards) to improve scalability and performance.
Circuit Breaker: Protects services from cascading failures by halting requests to a service that is failing or unresponsive until it has had time to recover.
Replication: Creates copies of data or components to improve availability and fault tolerance.
- Master-Slave: The master serves reads and writes, replicating to slaves which only serve reads.
- Master-Master: Both masters serve reads and writes and coordinate with each other. This requires a conflict resolution strategy.
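
To make the Circuit Breaker pattern from the list above concrete, here is a minimal sketch. It assumes a simple policy: open after a fixed number of consecutive failures, then allow one trial call once a timeout has passed. The class name, parameters, and defaults are hypothetical, and the clock is injectable only so the behavior can be tested without waiting.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; request not sent")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            trial_failed = self._opened_at is not None
            if trial_failed or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success: close the circuit and forget earlier failures.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each request to the downstream service in `breaker.call(...)` and treat `CircuitOpenError` as a signal to fail fast or fall back.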

Additional Topics

Rate Limiting: Limits the number of requests a client can make in a given period to prevent system overload.
DNS (Domain Name System): Translates human-readable domain names into IP addresses.
Monitoring: Observes and records system behavior to track performance and detect issues.
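
Rate limiting can be implemented in several ways. The sketch below assumes a token bucket, which is one common choice and not the only one. `TokenBucket`, its parameters, and the injectable clock are hypothetical names for illustration; a real deployment would typically keep one bucket per client.

```python
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, consuming one token."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, without exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=5, capacity=10)  # about 5 requests/second, bursts of up to 10
allowed = sum(bucket.allow() for _ in range(15))
print(f"{allowed} of 15 back-to-back requests allowed")
```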