In distributed systems, latency is a critical factor that can make the difference between a smooth user experience and a frustrating one. But latency isn't just a number: it's a sum of contributions that accumulate at every hop of a request.
What is latency, really?
Latency is the time elapsed from when a request is sent until the response is received. In a distributed system, that time is made up of several components (a rough decomposition is sketched after this list):
- Network time: the data traveling between services.
- Processing time: what each service takes to execute its logic.
- Queue time: the request waiting to be processed.
- Serialization: converting data to and from its wire format for transmission.
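A minimal sketch of how these components stack up along a request path. The service names and millisecond values are purely illustrative, not measurements from a real system:

```python
# Per-hop latency components (milliseconds) for a request that touches three
# services. All names and numbers are illustrative.
hops = [
    {"name": "api-gateway", "network": 2, "queue": 1, "processing": 5,  "serialization": 1},
    {"name": "orders",      "network": 3, "queue": 4, "processing": 12, "serialization": 2},
    {"name": "inventory",   "network": 3, "queue": 2, "processing": 8,  "serialization": 1},
]

total = 0
for hop in hops:
    parts = {k: v for k, v in hop.items() if k != "name"}
    hop_total = sum(parts.values())
    total += hop_total
    print(f"{hop['name']:12s} {hop_total:3d} ms  {parts}")

print(f"end-to-end: {total} ms")
```

Even when every individual component looks small, the end-to-end number is what the user feels.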
Averages lie
Looking only at average latency is a classic mistake. The P95 and P99 percentiles reveal what your worst-affected users actually experience. A system can average 50 ms while its P99 sits at 2 seconds.
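A small sketch of how an average hides the tail. The synthetic sample below (mostly fast requests plus a small slow tail) and the nearest-rank percentile helper are assumptions for illustration only:

```python
import math
import random

# Synthetic sample: most requests are fast, a small tail is very slow.
random.seed(42)
latencies_ms = (
    [random.gauss(45, 10) for _ in range(980)]       # fast majority
    + [random.uniform(1500, 2500) for _ in range(20)]  # slow tail
)

def percentile(samples, p):
    """Nearest-rank percentile: smallest value covering at least p% of samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

avg = sum(latencies_ms) / len(latencies_ms)
print(f"average: {avg:.0f} ms")                            # looks healthy
print(f"P95:     {percentile(latencies_ms, 95):.0f} ms")
print(f"P99:     {percentile(latencies_ms, 99):.0f} ms")   # exposes the slow tail
```

The average stays comfortably low while P99 lands in the seconds; those are the requests your most affected users remember.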
Where latency usually hides
- Service-to-service calls without timeouts (see the sketch after this list).
- Unoptimized database queries.
- TLS connections that are not reused.
- Logs flushed synchronously to disk on each request.
- DNS lookups on every new connection.
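A hedged sketch of two of the cheapest fixes: reusing connections and setting explicit timeouts. The endpoint URL and timeout values are hypothetical; the pattern uses the Python `requests` library:

```python
import requests

# A shared Session reuses the underlying TCP/TLS connection across calls,
# so repeated requests avoid a fresh handshake and DNS lookup each time.
session = requests.Session()

def fetch_order(order_id: str) -> dict:
    # Always set an explicit (connect, read) timeout; by default requests
    # will wait indefinitely for a response.
    response = session.get(
        f"https://orders.internal.example/api/orders/{order_id}",  # hypothetical endpoint
        timeout=(0.5, 2.0),
    )
    response.raise_for_status()
    return response.json()
```

None of this makes any single call faster, but it removes the per-request setup cost and bounds how long a slow dependency can hold your request hostage.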
How to measure correctly
Measuring latency requires distributed instrumentation. Tools like OpenTelemetry, Jaeger or Zipkin let you see the complete trace of a request and identify exactly where time accumulates.
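As a minimal sketch, this is roughly what instrumenting a request handler with the OpenTelemetry Python SDK looks like. The service and span names are illustrative, and the console exporter stands in for a real backend such as Jaeger or Zipkin:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Export spans to the console for this example; in production you would send
# them to a collector or tracing backend instead.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))

tracer = trace.get_tracer("checkout-service")  # illustrative service name

def handle_checkout(cart_id: str) -> None:
    with tracer.start_as_current_span("handle_checkout"):
        with tracer.start_as_current_span("load_cart"):
            ...  # database query goes here
        with tracer.start_as_current_span("charge_payment"):
            ...  # downstream HTTP call goes here
```

Each nested span records its own duration, so the trace shows exactly which step of the request is eating the time.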
Latency is not a problem to solve once: it's a metric to observe constantly. The systems that best serve their users are those that treat latency as a first-class citizen.