Latency in Distributed Apps

Latency isn't theory. It's the reason half your users bounce and your logs fill up with timeouts.


Why Speed Matters: The Physics of Latency in Distributed Apps

When I first built “fast” services, I thought fast meant the code executed quickly.
Then I shipped to production, and the real enemy showed up—the wire.

The CPU finished in microseconds.
The request took 400 ms round-trip.
Welcome to distributed reality.


Where Latency Actually Hides

Latency isn't one monster; it's a thousand paper cuts:

  • DNS lookup: 10-120 ms if uncached. Multiply by every microservice calling another over HTTPS.
  • TCP handshake: 1 RTT, about 50-150 ms over WAN. Add TLS and it's 2 RTTs with TLS 1.3, 3 with TLS 1.2.
  • Serialization: JSON is human-friendly, CPU-hostile. Protobuf typically cuts payload size and encode time by 30-50 %.
  • Deserialization: same pain in reverse.
  • Thread scheduling + GC: a 20 ms GC pause under load feels like an eternity when your SLA is 100 ms.
  • Nagle's algorithm + buffering: ever wonder why small writes feel sticky? Yeah.

You add them up and suddenly your 20 ms function takes 300 ms on a good day.
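
You can watch these pieces add up on a single request with Go's net/http/httptrace hooks. A minimal sketch, assuming an HTTPS endpoint (example.com is just a placeholder):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	var start, dnsStart, connStart, tlsStart time.Time

	// Hooks fire as the transport walks through DNS, TCP, and TLS.
	trace := &httptrace.ClientTrace{
		DNSStart:          func(httptrace.DNSStartInfo) { dnsStart = time.Now() },
		DNSDone:           func(httptrace.DNSDoneInfo) { fmt.Println("DNS:", time.Since(dnsStart)) },
		ConnectStart:      func(_, _ string) { connStart = time.Now() },
		ConnectDone:       func(_, _ string, _ error) { fmt.Println("TCP connect:", time.Since(connStart)) },
		TLSHandshakeStart: func() { tlsStart = time.Now() },
		TLSHandshakeDone: func(tls.ConnectionState, error) {
			fmt.Println("TLS handshake:", time.Since(tlsStart))
		},
		GotFirstResponseByte: func() { fmt.Println("First byte:", time.Since(start)) },
	}

	req, _ := http.NewRequest("GET", "https://example.com", nil) // placeholder URL
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	start = time.Now()
	resp, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("Total:", time.Since(start))
}
```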

Every hop adds uncertainty.
Every dependency adds risk.
And users don't care if it's DNS or garbage collection—they just know it feels slow.


Physics Doesn't Scale

Light in fiber moves roughly 200 000 km/s. That's 5 µs/km one-way.
So even with perfect code, Mumbai → New York → Mumbai burns ~200 ms.
You can't optimize the speed of light.
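
Back-of-the-envelope, assuming a ~20,000 km fiber route between Mumbai and New York (real paths run longer than the ~12,500 km great circle):

```go
package main

import (
	"fmt"
	"time"
)

// propagationRTT returns the round-trip propagation delay over a fiber route,
// using ~5 µs per km one way (light in fiber ≈ 200,000 km/s).
func propagationRTT(routeKm float64) time.Duration {
	const microsPerKm = 5
	return time.Duration(2*routeKm*microsPerKm) * time.Microsecond
}

func main() {
	// Assumed ~20,000 km fiber path Mumbai → New York.
	fmt.Println(propagationRTT(20000)) // prints 200ms, before any handshake or code runs
}
```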

That's why the first rule of low-latency systems is simple:
Keep data close to whoever needs it.

  • Mirror data by region.
  • Do writes locally, replicate asynchronously.
  • Serve reads from the nearest node.
  • Accept that “eventual consistency” beats “consistently slow.”

I'd rather explain a one-second replication lag than watch a global user base time out at login.
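
A minimal sketch of the read side of that idea; the regions and endpoints here are made up, and the async replication itself is out of frame:

```go
package main

import "fmt"

// Hypothetical per-region read replicas; writes go to the local primary
// and replicate asynchronously to the rest.
var readReplicas = map[string]string{
	"ap-south": "replica.ap-south.internal:5432",
	"us-east":  "replica.us-east.internal:5432",
	"eu-west":  "replica.eu-west.internal:5432",
}

// nearestReplica serves reads from the caller's own region and falls back
// to a default if that region isn't mirrored yet.
func nearestReplica(clientRegion string) string {
	if addr, ok := readReplicas[clientRegion]; ok {
		return addr
	}
	return readReplicas["us-east"] // fallback region for unmirrored clients
}

func main() {
	fmt.Println(nearestReplica("ap-south")) // a Mumbai user reads locally, not from us-east
}
```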


Tracing Is Truth

If you don't trace it, you're guessing.
Distributed tracing changed how I see systems. The first time I ran Jaeger across a chain of services, it was humiliating—half the time vanished in “client wait.”

So now, whenever latency creeps in, I trace:

  1. Service entry timestamp
  2. Downstream call start + finish
  3. Payload size
  4. Network hops + retries
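
Here's roughly what that instrumentation looks like with OpenTelemetry's Go API; the exporter wiring (Jaeger, OTLP, whatever you use) is omitted, and the service name, attribute keys, and retry loop are stand-ins:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// callPayments wraps a downstream call in a span that records the four
// things above: entry time, call start/finish, payload size, and retries.
func callPayments(ctx context.Context, payload []byte) error {
	tracer := otel.Tracer("checkout")
	ctx, span := tracer.Start(ctx, "payments.charge") // start timestamp
	defer span.End()                                  // finish timestamp

	span.SetAttributes(attribute.Int("payload.bytes", len(payload)))

	var err error
	for attempt := 1; attempt <= 3; attempt++ {
		err = doHTTPCall(ctx, payload) // your real downstream call goes here
		if err == nil {
			break
		}
		span.SetAttributes(attribute.Int("retries", attempt))
	}
	return err
}

// doHTTPCall is a placeholder for the actual network call.
func doHTTPCall(ctx context.Context, payload []byte) error {
	time.Sleep(10 * time.Millisecond)
	return nil
}

func main() {
	_ = callPayments(context.Background(), []byte(`{"amount": 42}`))
	fmt.Println("done (spans go to whatever exporter you've configured)")
}
```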

You learn weird things fast.
That shiny auth microservice? 120 ms TLS handshake.
That “tiny” metrics push? 8 KB JSON every request.
That fancy ORM query? Doing N+1 calls under the hood.

Traces don't lie. They just make you wish you'd instrumented sooner.


Caching Is a Weapon (and a Trap)

Caches are like steroids. They make you look strong until you forget how to function without them.
A good cache strategy can cut latency by 70 %.
A bad one doubles it when it goes cold.

Ground rules I follow:

  • Keep hot keys in-memory (Redis/memcached) near compute.
  • Version cache entries to avoid stale data storms.
  • Use request coalescing so ten misses become one fetch.
  • Monitor hit ratios like oxygen levels.

And always have a warm-up plan. A cold cache at peak traffic is like running a marathon after skipping breakfast.
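
Request coalescing in particular is cheap to add. A sketch with golang.org/x/sync/singleflight, where loadFromDB stands in for whatever slow fetch sits behind the cache:

```go
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// getUser coalesces concurrent cache misses: if ten goroutines miss on the
// same key at once, only one of them hits the backing store.
func getUser(id string) (string, error) {
	v, err, shared := group.Do("user:"+id, func() (interface{}, error) {
		u, err := loadFromDB(id) // stand-in for the real fetch + cache fill
		return u, err
	})
	if err != nil {
		return "", err
	}
	_ = shared // true for callers that piggybacked on another call's result
	return v.(string), nil
}

// loadFromDB is a placeholder for the expensive backing-store lookup.
func loadFromDB(id string) (string, error) {
	time.Sleep(50 * time.Millisecond)
	fmt.Println("actual DB hit for", id)
	return "user-" + id, nil
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = getUser("42")
		}()
	}
	wg.Wait() // prints "actual DB hit" once, not ten times
}
```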


Concurrency Over Chattiness

Microservices love to chat. The more services you add, the more small talk they make.
But network calls aren't conversations; they're invoices—you pay for each.

Batch requests.
Use async pipelines.
Prefer streaming over polling.
Every extra round trip is wasted motion.
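
And when downstream calls don't depend on each other, fire them concurrently instead of serially. A sketch with golang.org/x/sync/errgroup; fetch is a placeholder for real service calls:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

// fetch simulates one downstream call; in real code these would be
// separate services (profile, orders, recommendations, ...).
func fetch(ctx context.Context, name string) (string, error) {
	time.Sleep(100 * time.Millisecond)
	return name + "-data", nil
}

func main() {
	ctx := context.Background()
	g, ctx := errgroup.WithContext(ctx)

	var profile, orders, recs string

	// Three independent calls issued concurrently: total latency is the
	// slowest call (~100 ms), not the sum (~300 ms).
	g.Go(func() (err error) { profile, err = fetch(ctx, "profile"); return })
	g.Go(func() (err error) { orders, err = fetch(ctx, "orders"); return })
	g.Go(func() (err error) { recs, err = fetch(ctx, "recommendations"); return })

	if err := g.Wait(); err != nil {
		panic(err)
	}
	fmt.Println(profile, orders, recs)
}
```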

Once I replaced a chain of six dependent HTTP calls with a single aggregated gRPC call. Latency dropped from 600 ms to 90 ms.
Not because I “optimized.”
Because I stopped talking so much.


What I Actually Tune

People ask, “What knobs do you tweak for performance?”
Here's the unromantic list from real incidents:

  • TCP keep-alive and connection reuse - avoid handshake overhead.
  • gRPC message size and compression settings - stop compressing tiny payloads to death.
  • Thread pool limits - too many threads cause context-switch storms.
  • JVM GC tuning - low-pause collectors like ZGC or G1.
  • Load-balancer timeout symmetry - keep client/server timeouts aligned or you'll leak sockets.
  • CDN edge caching - terminate TLS close to users.

None of this is glamorous. It's plumbing. But that plumbing decides if your app feels instant or ancient.
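
For the connection-reuse and timeout-symmetry knobs, here's roughly what that looks like on an outbound Go HTTP client; the numbers are illustrative, not recommendations:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Reuse connections instead of paying DNS + TCP + TLS on every request.
	transport := &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 20,               // default is 2, which throttles busy services
		IdleConnTimeout:     90 * time.Second, // keep warm connections around
		TLSHandshakeTimeout: 5 * time.Second,
		ForceAttemptHTTP2:   true, // multiplex requests over fewer connections
	}

	// Keep the client timeout aligned with (slightly below) the server's and
	// the load balancer's, so nobody holds sockets for abandoned work.
	client := &http.Client{
		Transport: transport,
		Timeout:   2 * time.Second,
	}

	resp, err := client.Get("https://example.com") // placeholder URL
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```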


The Human Rule

Latency teaches humility.
It doesn't care about your frameworks, your degrees, or your cloud provider.
It only rewards those who measure, simplify, and respect distance.

The engineers who obsess over 5 ms here and 10 ms there—they're not nitpicking. They're protecting the rhythm of the whole system.

Users might never know your name, but they'll feel your work every time a page loads instantly.
That's the closest thing to applause you'll get in backend engineering.


Bottom line:
Latency isn't a number—it's your system's heartbeat.
Ignore it, and it starts skipping.
Study it, and you learn what “fast” really means.