Distributed Systems

A guide to the principles, patterns, and trade-offs involved in building reliable, scalable, and maintainable distributed systems.

Core Principles of Data Systems
Data Models and Schema Design
- Relational vs. Document vs. Graph Models
- Graph Data Models
Storage and Retrieval
Data Encoding and Evolution
Data Distribution: Replication and Partitioning
- Replication
- Partitioning
Transactions
- ACID Properties
- Isolation Levels and Concurrency Issues
Data Flow Architectures
Handling Failures

Core Principles of Data Systems

Data systems serve various purposes, including data storage, caching, search, messaging, and batch processing. The primary goal is to build systems that are reliable, scalable, and maintainable.

Reliability

Reliability is the ability of a system to operate correctly, even when things go wrong. A reliable system must handle:

Hardware Faults: Server failures, network outages.
Software Errors: Bugs and unexpected behaviors.
Human Errors: Misconfigurations and faulty deployments.

Note: Systems designed to anticipate and manage faults are known as fault-tolerant or resilient.

Scalability

Scalability is a system’s ability to handle increased load.

Vertical Scaling (Scaling Up): Increasing the resources (CPU, RAM) of a single server.
Horizontal Scaling (Scaling Out): Adding more servers to distribute the load.

Maintainability

Maintainability ensures a system is easy to operate, understand, and evolve over time.

Operability: Ease of monitoring, troubleshooting, and management.
Simplicity: Reducing complexity to minimize errors and improve understanding.
Evolvability: Designing for future changes without extensive refactoring.

Data Models and Schema Design

Relational vs. Document vs. Graph Models

Relational Model: Suitable for many-to-many relationships, with a structured schema.
Document Model: Ideal for one-to-many relationships, offering schema flexibility.
Graph Model: Efficient for complex, interconnected data with many-to-many relationships (e.g., social networks).

Graph Data Models

Property Graph Model: Consists of vertices (nodes) and edges (relationships), both of which can have properties.
Triple-Store Model: Stores data in (subject, predicate, object) triples.

Storage and Retrieval

Storage Engines: OLTP vs. OLAP

OLTP (Online Transaction Processing): Optimized for frequent, low-latency reads and writes. Typically used for user-facing applications.
OLAP (Online Analytical Processing): Optimized for complex, aggregate queries over large datasets. Used for business intelligence and analytics.

Data Warehousing

A data warehouse is a separate database optimized for analytical (OLAP) queries, preventing them from impacting the performance of transactional (OLTP) databases. Common schemas include star and snowflake schemas.

Column-Oriented Storage: Storing data column by column (rather than row by row) is highly efficient for analytical workloads.

Indexing Structures

B-Trees: Commonly used in relational databases for efficient key-value lookups.
SSTables (String Sorted Tables): Stores data sorted by key, efficient for reads.
Bloom Filters: A memory-efficient way to check for the existence of a key.
Other types: Multi-column indexes, full-text search indexes.

Data Encoding and Evolution

Data exists in two primary forms:

In-Memory Representation: Optimized for CPU access (e.g., objects, arrays, hash tables).
Storage/Network Encoding: Serialized into a byte sequence for storage or transmission.

Data Distribution: Replication and Partitioning

Data is distributed to achieve scalability, fault tolerance, and low latency.

Replication

Replication involves storing copies of the same data on multiple machines (replicas).

Benefits: High availability, fault tolerance, and read scalability.
Replication Methods:
- Single-Leader: One leader handles all writes, which are then propagated to followers.
- Multi-Leader: Multiple leaders accept writes, which is useful for multi-datacenter deployments but requires conflict resolution.
- Leaderless: No single leader; nodes coordinate to handle writes.
Replication Logs: Used to propagate changes (e.g., statement-based, Write-Ahead Log (WAL), logical logs).

Handling Node Outages in Leader-Based Replication

Follower Failure: A failed follower can recover by catching up on changes from the leader.
Leader Failure (Failover): If the leader fails, a follower is promoted to be the new leader, either manually or automatically.

Partitioning

Partitioning (or sharding) divides a dataset into smaller parts, with each partition stored on a different node.

Partitioning Strategies:
- By Key Range: Can lead to hot spots if access patterns are uneven.
- By Hash of Key: Distributes data more evenly.
- Consistent Hashing: Minimizes data movement when nodes are added or removed.
Rebalancing: The process of moving data and request load between nodes to ensure even distribution.
Service Discovery: Helps clients find the correct node for their request.

Transactions

Transactions group multiple read and write operations into a single logical unit that either succeeds or fails entirely.

ACID Properties

Atomicity: All or nothing.
Consistency: The database remains in a valid state.
Isolation: Concurrent transactions do not interfere with each other.
Durability: Committed changes are permanent.

Isolation Levels and Concurrency Issues

Isolation Levels: Control the degree to which concurrent transactions can interfere with each other (e.g., Read Committed).
Common Concurrency Issues:
- Lost Update: Changes from one transaction are overwritten by another.
- Write Skew: Two transactions read the same data and then update different parts of it, leading to an inconsistent state.
Serializability: The highest isolation level, ensuring that concurrent transactions have the same effect as if they were run in some serial order. Achieved through techniques like Two-Phase Locking (2PL) or Serializable Snapshot Isolation (SSI).

Data Flow Architectures

Dataflow Through Databases

In a traditional database architecture, data is mutable and can be updated at any time. This requires careful management of concurrency and consistency.

Dataflow Through Services (SOA)

In a Service-Oriented Architecture (SOA), the application is composed of loosely coupled, independently deployable services that communicate over a network. This is the foundation of microservices.

Message-Passing Dataflow

Data is sent between components via messages, often using a message broker (e.g., RabbitMQ, Kafka).

Benefits:
- Decoupling: Sender and receiver are logically separated.
- Buffering: Smooths out message delivery during traffic spikes.
- Redelivery: Can resend messages that fail to be delivered.

Handling Failures

Partial Failure: A key challenge in distributed systems is that some parts of the system can fail while others continue to work. The goal is to build a reliable system from unreliable components.
Fault Tolerance: The ability of a system to continue operating in the event of a failure.