System Design Interview Guide

A structured framework to navigate the open-ended conversation of a system design interview.

What Interviewers Are Looking For

  • Communication: How clearly you articulate your design and thought process.
  • Scalability: How you design for growth and high load.
  • Quantitative Solutions: Your ability to create concrete and measurable solutions.
  • Trade-offs: How you handle compromises and justify your decisions.

General Tips

  • Explain your thought process: Communicate your reasoning at every step.
  • Ask clarifying questions: Don’t rely on unstated assumptions. Understand the problem fully, and say out loud any assumption you do make.
  • Iterate and improve: Continuously look for ways to enhance your design.
  • Clarify scope: Define the boundaries of the system you are expected to design.
  • Define success: Ask, “How can we verify that the system is working properly?”
  • Identify bottlenecks: Be vigilant for potential performance issues and weak points.

The Interview Framework

A system design interview can be broken down into the following steps.

1. Requirements Gathering (5-10 Minutes)

Note: This is the most critical step. A thorough understanding of the problem sets you up for success.

Functional Requirements

These define what the system does. Think in terms of “The user should be able to…”

  • Who is going to use it?
  • How are they going to use it?
  • What are the main features?
  • What are the inputs and outputs of the system?

Non-Functional Requirements (System Qualities)

These define how the system should operate.

  • Consistency vs. Availability: Which is more important? (See CAP Theorem).
  • Scalability: How will the system handle traffic spikes? Is it read-heavy or write-heavy?
  • Latency: What is the required response time?
  • Durability: How critical is it to prevent data loss?
  • Security: What are the security requirements?
  • Fault Tolerance: How should the system behave during failures?

2. Estimation (5 Minutes)

Perform back-of-the-envelope calculations to understand the scale of the system.

  • Throughput: Queries per second (QPS) for reads and writes.
  • Traffic: How much data is coming in and out (e.g., GB/day).
  • Storage: How much data needs to be stored.
  • Memory: How much data should be in the cache.

Example: Estimating Tweet Storage

  • Daily Active Users (DAU): 250 Million
  • Tweets per user per day: 1
  • Tweet size: ~280 bytes (280 characters at roughly 1 byte each, ignoring metadata and media)
  • Storage per day: 250M users * 1 tweet/user * 280 bytes/tweet = 70 GB/day
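
The same arithmetic can be wrapped in a small helper so that assumptions are easy to change mid-interview. This is a minimal sketch: the function name `estimate` and its parameters are illustrative, and it uses decimal units (1 GB = 1000^3 bytes) to match the Powers of 1000 table.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(dau: int, tweets_per_user: float, bytes_per_tweet: int) -> dict:
    """Return rough daily volume, average write QPS, and storage growth."""
    tweets_per_day = dau * tweets_per_user
    bytes_per_day = tweets_per_day * bytes_per_tweet
    return {
        "tweets_per_day": tweets_per_day,
        # Average only; peak traffic is usually a multiple of this.
        "avg_write_qps": tweets_per_day / SECONDS_PER_DAY,
        "storage_gb_per_day": bytes_per_day / 1000**3,
        "storage_tb_per_year": bytes_per_day * 365 / 1000**4,
    }


result = estimate(dau=250_000_000, tweets_per_user=1, bytes_per_tweet=280)
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

With the numbers above this gives 70 GB/day, about 2,900 writes per second on average, and about 25.5 TB per year before replication.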

3. High-Level Design (10-15 Minutes)

Create a big-picture view of your system.

  • Sketch components: Use boxes and arrows for services, databases, caches, etc.
  • Define APIs: Specify the main endpoints and their functions. This serves as a contract for your design.
  • Data Flow: Describe the sequence of actions for a user request.
  • Data Schema: A preliminary design of your database tables or data models.
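
To make the "API as contract" idea concrete, here is a hedged sketch for the tweet example. Everything in it is hypothetical: the `Tweet` fields, the `TweetService` class, and the endpoint paths in the comments are illustrative choices, and the in-memory list stands in for a real database.

```python
import time
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Tweet:
    """Preliminary data model: one row per tweet."""
    tweet_id: int
    user_id: int
    text: str
    created_at: float


class TweetService:
    """In-memory stand-in for the API contract (not a production design)."""

    MAX_LENGTH = 280

    def __init__(self) -> None:
        self._tweets: list[Tweet] = []
        self._ids = count(1)

    def post_tweet(self, user_id: int, text: str) -> Tweet:
        # Hypothetical endpoint: POST /tweets
        if not text or len(text) > self.MAX_LENGTH:
            raise ValueError("text must be 1 to 280 characters")
        tweet = Tweet(next(self._ids), user_id, text, time.time())
        self._tweets.append(tweet)
        return tweet

    def get_user_tweets(self, user_id: int, limit: int = 20) -> list[Tweet]:
        # Hypothetical endpoint: GET /users/{user_id}/tweets?limit=N
        mine = [t for t in self._tweets if t.user_id == user_id]
        return mine[::-1][:limit]  # newest first


service = TweetService()
service.post_tweet(user_id=1, text="hello")
service.post_tweet(user_id=1, text="world")
print([t.text for t in service.get_user_tweets(user_id=1)])
```

Writing the signatures first forces the inputs, outputs, and validation rules into the open before any boxes and arrows are drawn.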

4. Deep Dive (10-15 Minutes)

Flesh out the details of your core components and address the non-functional requirements.

  • Scaling Components: How will you scale each part of your system?
  • Bottlenecks: Where are the weak points? How can you address them?
  • Edge Cases: What happens during failures or unexpected inputs?
  • Justify Decisions: Explain the trade-offs of your choices.

5. Scaling the Design

Address scalability challenges and introduce advanced concepts.

  • Load Balancing: Distribute traffic across multiple servers.
  • Horizontal Scaling: Add more machines to handle the load.
  • Caching: Reduce latency and database load.
  • Database Sharding: Partition your database to handle more data.
  • Asynchronism: Use message queues for background jobs.
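
Two of these ideas are small enough to sketch directly. The example below assumes a fixed list of server names and a fixed shard count; `RoundRobinBalancer` and `shard_for` are illustrative names, not part of any library.

```python
import hashlib
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation; ignores health and current load."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self) -> str:
        return next(self._servers)


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to keep routing deterministic.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.next_server() for _ in range(4)])
print(shard_for("user:42", num_shards=8))
```

Modulo-based sharding is simple, but changing `num_shards` remaps most keys. Consistent hashing is the usual follow-up to mention when the interviewer asks about adding shards.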

Key System Design Concepts

Core Components & Technologies

  • DNS: Translates domain names to IP addresses.
  • CDN: Caches static content closer to users.
  • Load Balancers: Distribute incoming traffic.
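  • Reverse Proxy: Sits in front of backend servers and forwards client requests to them; often also handles TLS termination, caching, and compression.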
  • Application Layer: Microservices, Service Discovery.
  • Databases:
    • RDBMS (SQL): For structured data with strong consistency (e.g., PostgreSQL, MySQL).
    • NoSQL: For flexible or semi-structured data and horizontal scalability (e.g., Cassandra, MongoDB).
  • Caches:
    • Strategies: Cache aside, Write-through, Write-behind.
    • Eviction Policies: LRU, LFU, FIFO.
  • Asynchronism:
    • Message Queues: For decoupling services (e.g., RabbitMQ, Kafka).
    • Task Queues: For running background tasks (e.g., Celery).
  • Communication Protocols:
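    • TCP vs. UDP: Connection-oriented with reliable, ordered delivery vs. connectionless with no delivery guarantees.
    • REST vs. RPC: Resource-oriented architectural style (usually over HTTP) vs. calling remote procedures as if they were local functions.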
    • Real-time: Long polling, WebSockets, Server-Sent Events (SSE).
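
The cache-aside strategy and LRU eviction from the Caches entry above can be sketched together in a few lines. This is a minimal single-process illustration: `LRUCache`, `get_with_cache_aside`, and the `load_from_db` callback are hypothetical names, and a real deployment would more likely use a shared cache such as Redis or Memcached.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


def get_with_cache_aside(
    key: str, cache: LRUCache, load_from_db: Callable[[str], str]
) -> str:
    """Cache aside: check the cache, fall back to the database on a miss,
    then populate the cache. Assumes None is never a valid stored value."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.put(key, value)
    return value


cache = LRUCache(capacity=2)
print(get_with_cache_aside("a", cache, lambda k: k.upper()))
```

With cache aside the application owns the cache logic, so a cache outage degrades to slower reads rather than failed ones. Write-through and write-behind move that responsibility into the write path instead.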

Important Heuristics

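  • CAP Theorem: A distributed system cannot guarantee Consistency, Availability, and Partition Tolerance all at once. Because network partitions cannot be ruled out, the practical choice is between consistency and availability while a partition lasts.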
  • Read/Write Ratio: Understand if your system is read-heavy or write-heavy to optimize accordingly.
  • Stateless Services: Design application servers to be stateless so they can be easily scaled.

Reference Tables

Powers of 1000

Power | Name        | Prefix
0     | 1           | -
1     | Thousand    | Kilo
2     | Million     | Mega
3     | Billion     | Giga
4     | Trillion    | Tera
5     | Quadrillion | Peta

Latency Comparisons

Action                                 | Time    | Comparison
Read 1 MB from memory                  | 0.25 ms | -
Read 1 MB from SSD                     | 1 ms    | 4x memory
Read 1 MB from disk                    | 20 ms   | 20x SSD
Round trip (California to Netherlands) | 150 ms  | -
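
These per-megabyte figures scale linearly for rough sequential-read estimates. The sketch below simply multiplies the table values; the dictionary and function names are illustrative, and real hardware will vary.

```python
# Milliseconds to read 1 MB sequentially, taken from the table above.
READ_MS_PER_MB = {"memory": 0.25, "ssd": 1.0, "disk": 20.0}


def sequential_read_seconds(size_mb: float, medium: str) -> float:
    """Rough time in seconds to read size_mb megabytes from a medium."""
    return size_mb * READ_MS_PER_MB[medium] / 1000


# Reading 1 GB (1000 MB) from each medium.
one_gb = {m: sequential_read_seconds(1000, m) for m in READ_MS_PER_MB}
print(one_gb)
```

Reading 1 GB therefore takes roughly 0.25 s from memory, 1 s from SSD, and 20 s from disk, which is the kind of gap that justifies a cache in front of slow storage.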

Data Sizes

Item             | Size
2-hour movie     | ~1 GB
Small book       | ~1 MB
High-res photo   | ~1 MB
Medium-res photo | ~100 KB

Resources

Blogs & Articles

  • High Scalability: Real-world examples of scalable architectures.
  • Jepsen: In-depth analysis of distributed systems and their failure modes.
  • Aphyr’s Blog: Posts on distributed systems, safety, and consistency.
