Latency vs. Throughput
One-Liner
The distinction between the time it takes for a single operation to complete (latency) and the number of operations completed per unit of time (throughput).
What It Is
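- Latency: The delay between a cause and its effect, often measured as the time from sending a request to receiving a response. It is expressed in units of time (for example, milliseconds) and is usually reported as percentiles such as p50 and p99 rather than as a single average. Lower latency means faster responses.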
- Throughput: The rate at which operations are completed, typically measured as operations per second, requests per minute, or bytes per second. Higher throughput means more work can be done.
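To make the two definitions concrete, here is a minimal sketch that measures both for the same workload. The `measure` helper and the sleep-based stand-in operation are hypothetical, chosen only for illustration; a real measurement would call the actual operation under test.

```python
import time


def measure(operation, n_calls):
    """Run `operation` sequentially n_calls times.

    Returns (mean latency in seconds, throughput in operations per second).
    `measure` is an illustrative helper, not part of any library.
    """
    latencies = []
    start = time.perf_counter()
    for _ in range(n_calls):
        t0 = time.perf_counter()
        operation()
        # Latency: how long this single call took.
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    mean_latency = sum(latencies) / n_calls
    # Throughput: how many calls finished per second of wall-clock time.
    throughput = n_calls / elapsed
    return mean_latency, throughput


if __name__ == "__main__":
    # Stand-in operation (assumption): sleeps for about 10 ms.
    mean_latency, throughput = measure(lambda: time.sleep(0.01), 20)
    print(f"mean latency: {mean_latency * 1000:.1f} ms")
    print(f"throughput:   {throughput:.1f} ops/s")
```

With a single sequential caller, throughput is at most 1 / latency. The two numbers only move apart once operations overlap or are batched.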
Why It Exists
To provide two distinct, often conflicting, metrics for evaluating system performance. Optimizing for one often comes at the expense of the other.
How It Works
- Latency: Affected by network delay (propagation distance and transmission time), time spent waiting in queues, server processing time, and disk I/O.
- Throughput: Affected by concurrency, parallelization, and resource availability (CPU, memory, network bandwidth).
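The two metrics are linked through concurrency. Little's Law, a standard queuing theory result, states that in a steady state the average number of in-flight operations equals throughput multiplied by average latency. The sketch below rearranges that relation; the function names and the example numbers are assumptions for illustration only.

```python
def max_throughput(concurrency, latency_s):
    """Little's Law rearranged: throughput = in-flight operations / latency.

    This is an upper bound that assumes all `concurrency` slots stay busy.
    """
    return concurrency / latency_s


def required_concurrency(target_throughput, latency_s):
    """Little's Law: in-flight operations = throughput * latency."""
    return target_throughput * latency_s


if __name__ == "__main__":
    latency_s = 0.05  # assumed 50 ms per operation
    for workers in (1, 8):
        rate = max_throughput(workers, latency_s)
        print(f"{workers} worker(s): up to {rate:.0f} ops/s")
    needed = required_concurrency(1000, latency_s)
    print(f"1000 ops/s at 50 ms needs about {needed:.0f} in-flight operations")
```

This is why adding concurrency can raise throughput without making any single operation faster: latency stays at 50 ms in both cases, while the throughput ceiling scales with the number of workers.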
Tradeoffs
Optimizing for Latency
- Pros: Faster responses for individual operations.
- Cons: Can sometimes reduce throughput (e.g., by limiting concurrency or keeping utilization low so that requests rarely wait in a queue).
Optimizing for Throughput
- Pros: More work completed per unit of time.
- Cons: Can sometimes increase latency (e.g., by batching requests, so each request waits for its whole batch to be collected and processed).
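The batching tradeoff can be sketched with a simple cost model. It assumes every batch pays a fixed overhead plus a per-item cost; the `batch_metrics` function and the 10 ms / 1 ms figures are hypothetical, and the model ignores the extra time items spend waiting for a batch to fill.

```python
def batch_metrics(batch_size, overhead_s=0.010, per_item_s=0.001):
    """Return (latency in seconds, throughput in items/s) for one batch.

    Assumed cost model: batch time = fixed overhead + batch_size * per-item cost.
    Every item in the batch completes only when the whole batch completes.
    """
    batch_time_s = overhead_s + batch_size * per_item_s
    throughput = batch_size / batch_time_s
    return batch_time_s, throughput


if __name__ == "__main__":
    for size in (1, 10, 100):
        latency_s, throughput = batch_metrics(size)
        print(
            f"batch={size:>3}  latency={latency_s * 1000:.0f} ms  "
            f"throughput={throughput:.0f} items/s"
        )
```

Under these assumed costs, a batch size of 1 gives roughly 11 ms latency and 91 items/s, while a batch size of 100 gives roughly 110 ms latency and 909 items/s. Larger batches amortize the fixed overhead, which raises throughput, but every item now waits for the entire batch.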
Failure Modes
- A system with low latency but low throughput has little headroom: once requests arrive faster than it can complete them, they queue up and latency climbs, so even a small increase in load can overwhelm it.
- A system with high throughput but high latency might process many requests, but each user experiences a long wait.
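The first failure mode above can be illustrated with a deterministic single-server queue. This is a simplified sketch under stated assumptions (evenly spaced arrivals, a fixed 10 ms service time, one FIFO server); `simulate_fifo` is a hypothetical helper, and real traffic is burstier than this model.

```python
def simulate_fifo(arrival_interval_s, service_time_s, n_requests):
    """Return the latency (queue wait + service) of each request.

    Assumes one server, FIFO order, evenly spaced arrivals, fixed service time.
    """
    latencies = []
    server_free_at = 0.0
    for i in range(n_requests):
        arrival = i * arrival_interval_s
        # A request starts when it has arrived AND the server is free.
        start = max(arrival, server_free_at)
        server_free_at = start + service_time_s
        latencies.append(server_free_at - arrival)
    return latencies


if __name__ == "__main__":
    service_time_s = 0.010  # capacity: about 100 requests/s
    for label, interval_s in (("below capacity", 0.012), ("above capacity", 0.009)):
        lat = simulate_fifo(interval_s, service_time_s, 1000)
        print(
            f"{label}: first={lat[0] * 1000:.0f} ms  "
            f"last={lat[-1] * 1000:.0f} ms"
        )
```

Below capacity, every request sees the bare 10 ms service time. With arrivals only slightly faster than the server can handle, each request waits a little longer than the previous one, and latency keeps growing for as long as the overload lasts.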
Interview Traps
- Confusing the two terms or using them interchangeably.
- Not being able to explain when to prioritize one over the other.
Real-World Usage
- Latency-critical systems: High-frequency trading, real-time gaming.
- Throughput-critical systems: Batch processing, data analytics.
Anti-Patterns
- Measuring only one of the two metrics and assuming it tells the whole story about performance.
Related Concepts
- Performance Tuning
- Scalability
- Queuing Theory