In distributed systems, latency is a critical factor that can make the difference between a smooth user experience and a frustrating one. But latency isn't just a number: it's a sum of contributions that accumulate at every hop of a request.
What is latency, really?
Latency is the time elapsed from when a request is sent until the response is received. In a distributed system, that time is made up of several components (a rough decomposition is sketched after this list):
- Network time: the data traveling between services.
- Processing time: what each service takes to execute its logic.
- Queue time: the request waiting to be processed.
- Serialization: converting data to and from its wire format for transmission.
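A minimal sketch of how these components stack up along a request path. The service names and millisecond values are purely illustrative, not measurements from a real system:

```python
# Per-hop latency components (milliseconds) for a request that touches three
# services. All names and numbers are illustrative.
hops = [
    {"name": "api-gateway", "network": 2, "queue": 1, "processing": 5,  "serialization": 1},
    {"name": "orders",      "network": 3, "queue": 4, "processing": 12, "serialization": 2},
    {"name": "inventory",   "network": 3, "queue": 2, "processing": 8,  "serialization": 1},
]

total = 0
for hop in hops:
    parts = {k: v for k, v in hop.items() if k != "name"}
    hop_total = sum(parts.values())
    total += hop_total
    print(f"{hop['name']:12s} {hop_total:3d} ms  {parts}")

print(f"end-to-end: {total} ms")
```

Even when every individual component looks small, the end-to-end number is what the user feels.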
Averages lie
Looking only at average latency is a classic mistake. The P95 and P99 percentiles reveal what your worst-affected users actually experience. A system can average 50 ms while its P99 sits at 2 seconds.
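A small sketch of how an average hides the tail. The synthetic sample below (mostly fast requests plus a small slow tail) and the nearest-rank percentile helper are assumptions for illustration only:

```python
import math
import random

# Synthetic sample: most requests are fast, a small tail is very slow.
random.seed(42)
latencies_ms = (
    [random.gauss(45, 10) for _ in range(980)]       # fast majority
    + [random.uniform(1500, 2500) for _ in range(20)]  # slow tail
)

def percentile(samples, p):
    """Nearest-rank percentile: smallest value covering at least p% of samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

avg = sum(latencies_ms) / len(latencies_ms)
print(f"average: {avg:.0f} ms")                            # looks healthy
print(f"P95:     {percentile(latencies_ms, 95):.0f} ms")
print(f"P99:     {percentile(latencies_ms, 99):.0f} ms")   # exposes the slow tail
```

The average stays comfortably low while P99 lands in the seconds; those are the requests your most affected users remember.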
Where latency usually hides
- Service-to-service calls without timeouts (see the sketch after this list).
- Unoptimized database queries.
- TLS connections that are not reused.
- Logs flushed synchronously to disk on each request.
- DNS lookups on every new connection.
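A hedged sketch of two of the cheapest fixes: reusing connections and setting explicit timeouts. The endpoint URL and timeout values are hypothetical; the pattern uses the Python `requests` library:

```python
import requests

# A shared Session reuses the underlying TCP/TLS connection across calls,
# so repeated requests avoid a fresh handshake and DNS lookup each time.
session = requests.Session()

def fetch_order(order_id: str) -> dict:
    # Always set an explicit (connect, read) timeout; by default requests
    # will wait indefinitely for a response.
    response = session.get(
        f"https://orders.internal.example/api/orders/{order_id}",  # hypothetical endpoint
        timeout=(0.5, 2.0),
    )
    response.raise_for_status()
    return response.json()
```

None of this makes any single call faster, but it removes the per-request setup cost and bounds how long a slow dependency can hold your request hostage.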
How to measure correctly
Measuring latency requires distributed instrumentation. Tools like OpenTelemetry, Jaeger or Zipkin let you see the complete trace of a request and identify exactly where time accumulates.
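As a minimal sketch, this is roughly what instrumenting a request handler with the OpenTelemetry Python SDK looks like. The service and span names are illustrative, and the console exporter stands in for a real backend such as Jaeger or Zipkin:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Export spans to the console for this example; in production you would send
# them to a collector or tracing backend instead.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))

tracer = trace.get_tracer("checkout-service")  # illustrative service name

def handle_checkout(cart_id: str) -> None:
    with tracer.start_as_current_span("handle_checkout"):
        with tracer.start_as_current_span("load_cart"):
            ...  # database query goes here
        with tracer.start_as_current_span("charge_payment"):
            ...  # downstream HTTP call goes here
```

Each nested span records its own duration, so the trace shows exactly which step of the request is eating the time.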
Latency is not a problem to solve once: it's a metric to observe constantly. The systems that best serve their users are those that treat latency as a first-class citizen.