Latency vs. Throughput

One-Liner

The distinction between the time it takes for a single operation to complete (latency) and the number of operations completed per unit of time (throughput).

What It Is

Latency: The time a single operation takes to complete, typically measured from sending a request to receiving its response (e.g., in milliseconds). Lower latency means faster responses.
Throughput: The rate at which operations are completed, typically measured as operations per second, requests per minute, or bytes per second. Higher throughput means more work is completed in the same amount of time.
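A minimal sketch of how both metrics might be measured for the same workload, in Python for illustration. The `measure` helper and the 10 ms `time.sleep` standing in for a request are hypothetical; a real benchmark would also need warm-up runs and a realistic workload.

```python
import time
from typing import Callable


def measure(op: Callable[[], None], n: int) -> tuple[float, float]:
    """Run `op` n times sequentially; return (mean latency in s, throughput in ops/s)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n):
        t0 = time.perf_counter()
        op()
        latencies.append(time.perf_counter() - t0)  # per-operation latency
    elapsed = time.perf_counter() - start           # wall-clock time for the whole run
    mean_latency = sum(latencies) / n
    throughput = n / elapsed
    return mean_latency, throughput


if __name__ == "__main__":
    # Hypothetical operation: a 10 ms sleep stands in for one request.
    lat, thr = measure(lambda: time.sleep(0.01), n=50)
    print(f"mean latency: {lat * 1000:.1f} ms, throughput: {thr:.1f} ops/s")
```

Because these operations run one at a time, throughput is roughly the inverse of latency (about 100 ops/s at 10 ms each). Once operations run concurrently the two decouple, which is why they are reported separately.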

Why It Exists

The two metrics answer different questions: how long a single request waits, and how much total work the system completes. They often conflict, so optimizing for one can come at the expense of the other.

How It Works

Latency: Affected by network round-trip time, queuing delay, server processing time, and disk I/O.
Throughput: Affected by concurrency, parallelization, and resource availability (CPU, memory, network bandwidth).
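One standard way to connect the two metrics is Little's Law from queuing theory: for a stable system in steady state, the average number of requests in flight equals throughput multiplied by average latency (L = λW). The sketch below assumes those steady-state conditions; the function names are illustrative.

```python
def in_flight(throughput_rps: float, mean_latency_s: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in the system."""
    return throughput_rps * mean_latency_s


def throughput_ceiling(max_concurrency: int, mean_latency_s: float) -> float:
    """Rearranged as lambda = L / W: the most requests/s a concurrency limit allows."""
    return max_concurrency / mean_latency_s


if __name__ == "__main__":
    print(f"in flight: {in_flight(1000, 0.050):.1f}")
    print(f"ceiling: {throughput_ceiling(10, 0.020):.1f} req/s")
```

For example, sustaining 1,000 requests/s at 50 ms mean latency implies about 50 requests in flight, and a service capped at 10 concurrent requests with 20 ms latency cannot exceed about 500 requests/s.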

Tradeoffs

Optimizing for Latency

Pros: Faster responses for individual operations.
Cons: Can sometimes reduce throughput (e.g., by limiting concurrency).

Optimizing for Throughput

Pros: More work completed per unit of time.
Cons: Can sometimes increase latency (e.g., by batching requests).
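The batching tradeoff can be made concrete with a simple cost model. This is a hypothetical sketch: it assumes each batch pays a fixed overhead (such as one network round trip or disk flush) plus a per-item cost, that the queue already holds enough items to fill every batch, and that every item completes when its batch does.

```python
from dataclasses import dataclass


@dataclass
class BatchCost:
    overhead_s: float  # hypothetical fixed cost paid once per batch
    per_item_s: float  # hypothetical marginal cost per item


def batch_metrics(cost: BatchCost, batch_size: int) -> tuple[float, float]:
    """Return (latency_s, throughput_items_per_s) for one full batch, ignoring wait-to-fill time."""
    batch_time = cost.overhead_s + batch_size * cost.per_item_s
    latency = batch_time                  # every item finishes when the whole batch finishes
    throughput = batch_size / batch_time  # overhead is amortized across more items
    return latency, throughput


if __name__ == "__main__":
    cost = BatchCost(overhead_s=0.010, per_item_s=0.001)
    for size in (1, 10, 100):
        lat, thr = batch_metrics(cost, size)
        print(f"batch={size:>3}  latency={lat * 1000:6.1f} ms  throughput={thr:7.1f} items/s")
```

With these made-up costs, going from batches of 1 to batches of 100 raises throughput from about 91 to about 909 items/s, while each item's latency grows from 11 ms to 110 ms.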

Failure Modes

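A system with low latency but low throughput may respond quickly under light load yet become overwhelmed by a small increase in load: once requests arrive faster than it can complete them, they queue up and latency rises sharply.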
A system with high throughput but high latency might process many requests, but each user experiences a long wait.
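The first failure mode can be illustrated with the textbook M/M/1 queue (Poisson arrivals, exponentially distributed service times, one server), where mean time in system is W = 1 / (μ − λ) for an arrival rate λ below the service rate μ. Real systems rarely match these assumptions exactly, so treat the numbers as qualitative.

```python
def mm1_mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system W = 1 / (mu - lambda) for an M/M/1 queue (requires lambda < mu)."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound; there is no steady state
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    mu = 100.0  # assumed capacity: 100 requests/s (10 ms average service time)
    for load in (50, 80, 90, 95, 99):
        w = mm1_mean_latency(load, mu)
        print(f"arrivals={load:>3}/s  utilization={load / mu:.0%}  mean latency={w * 1000:6.1f} ms")
```

Raising load from 90 to 99 requests/s (10% more traffic) raises mean latency tenfold, from 100 ms to 1 s, even though the server is still below its 100 requests/s capacity.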

Interview Traps

Confusing the two terms or using them interchangeably.
Not being able to explain when to prioritize one over the other.

Real-World Usage

Latency-critical systems: High-frequency trading, real-time gaming.
Throughput-critical systems: Batch processing, data analytics.

Anti-Patterns

Measuring only one of the two metrics and assuming it tells the whole story about performance.
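A sketch of a run summary that reports throughput alongside both average and tail latency, so neither metric is read in isolation. The `summarize` helper and the sample data are hypothetical, and nearest-rank is only one of several common percentile definitions.

```python
import math
import statistics


def summarize(latencies_s: list[float], wall_clock_s: float) -> dict[str, float]:
    """Summarize a load-test run: throughput plus average and tail latency."""
    ordered = sorted(latencies_s)

    def percentile(p: float) -> float:
        # Nearest-rank: smallest sample with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "throughput_ops_per_s": len(ordered) / wall_clock_s,
        "mean_latency_s": statistics.fmean(ordered),
        "p50_latency_s": percentile(50),
        "p99_latency_s": percentile(99),
    }


if __name__ == "__main__":
    # Hypothetical run: 98 fast requests (10 ms) and 2 slow ones (500 ms) over 1 s of wall-clock time.
    sample = [0.010] * 98 + [0.500] * 2
    for name, value in summarize(sample, wall_clock_s=1.0).items():
        print(f"{name}: {value:.4f}")
```

In this made-up run, throughput is 100 ops/s and the median latency is 10 ms, but the mean (19.8 ms) and p99 (500 ms) expose the slow requests that a single number would hide.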

Related Topics

Performance Tuning
Scalability
Queuing Theory