A Guide to Databases in System Design

An overview of database concepts, types, and scaling strategies for building robust and scalable applications.

Fundamental Database Concepts
- ACID
- BASE
Relational Databases (SQL)
- Scaling SQL Databases
- SQL Tuning
NoSQL Databases
SQL vs. NoSQL: A Comparison
Consistent Hashing
Summary

Fundamental Database Concepts

ACID

ACID is a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc. It is commonly associated with relational (SQL) databases.

Atomicity: Each transaction is an “all or nothing” operation.
Consistency: Any transaction will bring the database from one valid state to another.
Isolation: Executing transactions concurrently has the same result as if they were executed serially.
Durability: Once a transaction has been committed, it will remain so.

BASE

BASE is a set of properties often associated with NoSQL databases, prioritizing availability over consistency.

Basically Available: The system guarantees availability.
Soft State: The state of the system may change over time, even without input.
Eventual Consistency: The system will become consistent over time, provided it doesn’t receive further input.

Relational Databases (SQL)

SQL databases use a structured schema to store data in tables with rows and columns. They are known for their reliability and support for complex queries and transactions.

Scaling SQL Databases

Replication: Creating copies of a database to improve read performance and availability.
- Master-Slave Replication: One master database handles all writes, which are then replicated to one or more read-only slave databases.
Federation: Splitting databases by function (e.g., a user database, a product database). This reduces the traffic to each database but makes joining data more complex.
Sharding: Distributing data across multiple databases (shards), where each database manages a subset of the data. This improves read and write performance but adds complexity to the application logic.
Denormalization: Intentionally adding redundant data to one or more tables to improve read performance by avoiding expensive joins.

SQL Tuning

Techniques to improve the performance of a SQL database:

Optimize Schema: Ensure data types are appropriate and tables are well-structured.
Use Indexes: Create indexes on frequently queried columns to speed up lookups.
Avoid Expensive Joins: Restructure queries or use denormalization to minimize complex joins.
Partition Tables: Split large tables into smaller, more manageable pieces.
Tune the Query Cache: Configure the query cache to store frequently executed queries.

NoSQL Databases

NoSQL databases provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. They are known for their flexibility, scalability, and performance.

Key-Value Stores

What it is: A simple database that uses a key to access a value. Reads and writes are typically O(1).
Use Cases: Caching, session management, real-time data.
Examples: Redis, Memcached.

Document Stores

What it is: A database that stores data in documents (e.g., JSON, BSON, XML).
Use Cases: Content management, user profiles, applications with flexible schema requirements.
Examples: MongoDB, CouchDB, Elasticsearch.

Wide-Column Stores

What it is: A database that stores data in columns rather than rows. It is optimized for queries over large datasets.
Use Cases: Big data applications, analytics, time-series data.
Examples: Apache Cassandra, Google Bigtable, HBase.

Graph Databases

What it is: A database that uses graph structures (nodes and edges) to represent and store data.
Use Cases: Social networks, recommendation engines, fraud detection.
Examples: Neo4j, FlockDB.

SQL vs. NoSQL: A Comparison

Feature	SQL	NoSQL
Data Model	Structured (tables, rows, columns)	Semi-structured or unstructured
Schema	Strict and predefined	Dynamic or flexible
Scalability	Typically vertical (scaling up)	Typically horizontal (scaling out)
Consistency	Strong (ACID)	Often eventual (BASE)
Joins	Supported and optimized	Generally not supported; done in application code
Best for	Complex queries, transactions, data integrity	Large datasets, high throughput, flexible data models

Consistent Hashing

Consistent hashing is a technique used in distributed systems to map data to nodes in a way that minimizes data movement when nodes are added or removed. It is crucial for:

Scalability: Makes scaling up and down more predictable.
Availability: Enables replication and reduces the impact of node failures.
Load Balancing: Helps to avoid hot spots by distributing data more evenly.

Summary

Choosing the right database is a critical decision in system design. Relational (SQL) databases offer strong consistency and are ideal for applications with structured data and complex transactions. NoSQL databases provide flexibility and scalability, making them well-suited for large-scale applications with evolving data requirements. Understanding the trade-offs between different database types and scaling strategies is essential for building systems that are both performant and maintainable.