Service Discovery
Scope
The mechanisms by which services in a distributed system dynamically find each other's network locations (IP addresses and ports).
Why This Topic Exists
In modern, elastic environments, service instances are ephemeral: they are created, destroyed, and moved constantly. Hardcoding network locations is brittle and does not scale. Service discovery provides the automation that services need to locate and communicate with each other in this dynamic landscape.
Core Tradeoffs
- Client-Side vs. Server-Side Discovery: The flexibility and control of having the client query the registry directly, versus the simplicity of offloading discovery logic to a centralized load balancer or router.
- Consistency vs. Availability (of the Registry): During a network partition, does the service registry refuse to answer rather than risk returning outdated locations (Consistency), or does it keep answering queries even though the data might be stale (Availability)? This is the CAP theorem tradeoff applied to the registry. Registries built on consensus stores such as etcd and ZooKeeper lean toward consistency, while Netflix Eureka leans toward availability.
- Intrusiveness: How much does the service discovery mechanism invade the application’s code? Client-side discovery often requires specific libraries, while server-side discovery can be more transparent to the application.
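The client-side and server-side patterns above can be sketched in a few lines. This is a minimal illustration, not a production design: `Registry`, `ClientSideResolver`, and `Router` are hypothetical names, the registry is an in-memory dictionary rather than a real tool such as Consul or etcd, and round-robin is just one possible selection policy.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    host: str
    port: int


class Registry:
    """Hypothetical in-memory registry mapping a service name to its instances."""

    def __init__(self) -> None:
        self._services = {}

    def register(self, service: str, instance: Instance) -> None:
        self._services.setdefault(service, []).append(instance)

    def lookup(self, service: str) -> list:
        # Return a copy so callers cannot mutate registry state.
        return list(self._services.get(service, []))


class ClientSideResolver:
    """Client-side discovery: the caller queries the registry and picks an instance."""

    def __init__(self, registry: Registry) -> None:
        self._registry = registry
        self._counter = itertools.count()

    def resolve(self, service: str) -> Instance:
        instances = self._registry.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        # Round-robin over whatever the registry currently reports.
        return instances[next(self._counter) % len(instances)]


class Router:
    """Server-side discovery: callers know only the router and a service name.

    The lookup and selection logic lives here, not in application code.
    """

    def __init__(self, registry: Registry) -> None:
        self._resolver = ClientSideResolver(registry)

    def route(self, service: str) -> Instance:
        return self._resolver.resolve(service)
```

In the client-side case every application embeds something like `ClientSideResolver`, which is where the intrusiveness cost comes from. In the server-side case the application only needs the address of the `Router`, at the price of an extra network hop and one more component to keep available.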
Common Failure Modes
- Stale Cache/Registry Data: A client or load balancer routes a request to a service instance that has already terminated but has not yet been deregistered, resulting in a failed request.
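- Service Registry as a Single Point of Failure: If the service registry is down, new instances cannot register, and clients without a cached copy of the registry data cannot resolve their dependencies, which can lead to cascading failures across the system. Running the registry as a replicated cluster and caching lookups on the client reduce this risk.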
- Incorrectly Configured Health Checks: A health check verifies only that a service’s process is running, while the service is actually in a non-functional state (e.g., unable to connect to its database). The instance stays registered and continues to receive traffic it cannot properly handle.
- Split-Brain Scenarios: In a network partition, different parts of the system may have different views of which services are available, leading to inconsistent behavior.
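One common way to bound the stale-data failure mode above is to have instances send periodic heartbeats and let the registry expire any entry whose heartbeat is older than a time-to-live (TTL). The sketch below is a simplified illustration under stated assumptions: `TTLRegistry` and its methods are hypothetical names, state is held in memory on a single node, and the clock is injectable only so the behavior can be demonstrated without sleeping.

```python
import time
from typing import Callable


class TTLRegistry:
    """Hypothetical registry that forgets instances that stop sending heartbeats."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # (service, address) -> time of the most recent heartbeat
        self._last_seen = {}

    def heartbeat(self, service: str, address: str) -> None:
        """Register the instance, or refresh it if already registered."""
        self._last_seen[(service, address)] = self._clock()

    def healthy(self, service: str) -> list:
        """Return addresses whose last heartbeat is within the TTL."""
        now = self._clock()
        expired = [key for key, seen in self._last_seen.items()
                   if now - seen > self._ttl]
        for key in expired:
            # An instance that crashed without deregistering ages out here.
            del self._last_seen[key]
        return sorted(addr for (svc, addr) in self._last_seen if svc == service)


# Usage with a fake clock: the first instance stops heartbeating and expires.
now = [0.0]
registry = TTLRegistry(ttl_seconds=10, clock=lambda: now[0])
registry.heartbeat("orders", "10.0.0.1:8080")
registry.heartbeat("orders", "10.0.0.2:8080")
now[0] = 8.0
registry.heartbeat("orders", "10.0.0.2:8080")
now[0] = 12.0
live = registry.healthy("orders")  # only 10.0.0.2:8080 remains
```

The TTL sets the tradeoff: a short TTL removes dead instances quickly but produces more heartbeat traffic and risks evicting healthy instances during brief network hiccups. A heartbeat also only proves the process is alive, so it does not replace a health check that exercises real dependencies.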
Interview Signals
A strong candidate can clearly articulate the problem that service discovery solves and compare the client-side and server-side patterns. They should be able to discuss the critical role of the service registry, the importance of robust health checking, and the tradeoffs between popular tools like Consul, etcd, and ZooKeeper.
Related Topics
- Load Balancing
- Communication
- Reliability
- DNS
- Service Mesh