Performance and Latency

Scope

The principles and practices for measuring, analyzing, and improving the speed, efficiency, and responsiveness of software systems.

Why This Topic Exists

Performance is a critical feature that directly impacts user experience, engagement, and retention. A slow system is often perceived as a broken system. Optimizing for performance is not a one-time task but a continuous process of measurement, analysis, and improvement.

Core Tradeoffs

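  • Latency vs. Throughput: Optimizing for the lowest possible response time for a single request (latency) versus optimizing for the maximum number of requests the system can handle in a given time period (throughput). The two often pull against each other: batching, for example, raises throughput but makes individual requests wait.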
  • Performance vs. Consistency: Strong consistency often requires coordination between nodes, which adds network round-trips and increases latency.
  • Performance vs. Cost: Achieving higher performance often requires more powerful hardware, more complex caching strategies, or more engineering effort, all of which increase cost.
  • Average vs. Tail Latency: Optimizing for the average user’s experience versus ensuring that the slowest requests (e.g., those at p99 or p99.9) are still acceptably fast.

Failure Modes

  • Connection Pool Exhaustion: A slow downstream service holds connections open longer, so the calling service runs out of available connections in its pool. New requests then cannot be served, and the failure cascades to upstream callers.
  • Garbage Collection (GC) Pauses: In managed languages, long or frequent “stop-the-world” garbage collection pauses can cause sudden, significant spikes in latency.
  • Thundering Herd Problem: A large number of processes or threads simultaneously waking up to handle an event (e.g., a cache expiry), causing a massive, self-inflicted spike in load on a downstream resource.
  • Unbounded Concurrency: Allowing an unlimited number of concurrent requests to a downstream service, which can overload it and cause increased latency for all requests.
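
The unbounded-concurrency item above can be illustrated with a minimal sketch, assuming an asyncio-based caller. The names call_downstream, bounded_gather, and run_demo are hypothetical, and the sleep stands in for a real network call; the point is only that a semaphore caps how many requests are in flight at once.

```python
import asyncio


async def call_downstream(i, state):
    # Hypothetical stand-in for a real network call to a downstream service.
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # simulated downstream latency
    state["in_flight"] -= 1
    return i * 2


async def bounded_gather(items, limit):
    # The semaphore admits at most `limit` callers at a time;
    # the rest wait here instead of piling onto the downstream service.
    sem = asyncio.Semaphore(limit)
    state = {"in_flight": 0, "peak": 0}

    async def guarded(i):
        async with sem:
            return await call_downstream(i, state)

    results = await asyncio.gather(*(guarded(i) for i in items))
    return results, state["peak"]


def run_demo(n=20, limit=5):
    # Returns the results (in input order) and the peak number of in-flight calls.
    return asyncio.run(bounded_gather(range(n), limit))
```

Calling run_demo(20, 5) issues 20 requests but never has more than 5 in flight. The same idea applies to connection pools: a fixed cap turns overload into queuing at the caller rather than a pile-up on the downstream service, though in practice a timeout on the wait is usually needed as well.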

Interview Signals

Strong candidates talk about performance not as a vague goal but as a measurable metric. They should be able to distinguish clearly between latency and throughput, and discuss the importance of tail latency (p95, p99). They should also be able to describe a systematic approach to performance analysis, including profiling, benchmarking, and identifying bottlenecks in a distributed system.
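
To make the average-versus-tail distinction concrete, here is a minimal sketch that computes the mean and nearest-rank percentiles over a list of latency samples. The function names and the sample data are made up for illustration; production systems typically rely on histograms from their metrics library rather than sorting raw samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms):
    # Report the mean next to the percentiles so the gap is visible.
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }


# Made-up data: 98 fast requests and 2 very slow ones.
latencies = [10] * 98 + [2000] * 2
summary = summarize(latencies)
```

For this sample the mean is 49.8 ms, p50 and p95 are both 10 ms, and p99 is 2000 ms. The mean describes no request that actually occurred, while p99 exposes the slow requests that the median hides.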

Related Topics

  • Scalability
  • Caching
  • Databases
  • Reliability
  • Observability