How we reduced API latency by 60% with strategic Redis caching

A deep dive into the caching patterns we applied to a high-traffic B2B SaaS — including what we got wrong the first time.

The problem: endpoints grinding to a halt

Our client ran a B2B SaaS platform serving ~4,000 daily active users. Every dashboard load triggered 12–15 database queries, many of them aggregations on tables with millions of rows. At peak hours, p99 latency on the main dashboard endpoint hit 3.8 seconds — unacceptable for a productivity tool.

The stack was Node.js (Express) and PostgreSQL, with Redis already present but barely used. Our job: fix latency without a full rewrite.

What we tried first (and got wrong)

Our instinct was to cache the entire dashboard response per user. We set a 5-minute TTL and called it done. Latency dropped to 400 ms. The client was happy. Then we got a bug report: users were seeing stale data — invoices that had been paid still showed as outstanding.

The problem with whole-response caching is that a single cache entry covers data with wildly different update frequencies. Invoice statuses change every few minutes; company settings change once a month. A 5-minute TTL was too long for the former and needlessly short for the latter, and invalidating the whole entry on any write meant cache-hit rates collapsed under normal usage.

The pattern that actually worked: layered TTLs

We split the dashboard data into three tiers:

Hot tier (TTL: 30 s): Counters and aggregates that change frequently — open tickets, pending approvals, recent activity feed. Cache key: user:{id}:hot
Warm tier (TTL: 5 min): Relational data that changes on user action — assigned projects, team members, billing status. Invalidated explicitly on write, either directly in the service layer or via a simple pub/sub channel.
Cold tier (TTL: 1 h): Configuration, feature flags, organisation metadata. Only invalidated on explicit admin action.
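
The three tiers above can be written down as a small lookup table plus a key builder. This is a minimal sketch, not the production code: the TTLs come from the list, the key pattern appears in the text for the hot and warm tiers, and reusing it for the cold tier is an assumption (organisation-level data might well be keyed by organisation instead). The names TIERS and tierKey are hypothetical.

```javascript
// TTLs come from the tier list above. The key pattern user:{id}:<tier> is
// given in the text for hot and warm; using it for cold is an assumption.
const TIERS = Object.freeze({
  hot: { ttlSeconds: 30 },        // counters, aggregates, activity feed
  warm: { ttlSeconds: 5 * 60 },   // projects, team members, billing status
  cold: { ttlSeconds: 60 * 60 },  // configuration, feature flags, org metadata
});

// Build the cache key for one user and tier, rejecting unknown tier names.
function tierKey(userId, tier) {
  if (!Object.prototype.hasOwnProperty.call(TIERS, tier)) {
    throw new Error(`Unknown cache tier: ${tier}`);
  }
  return `user:${userId}:${tier}`;
}
```

Keeping the TTLs in one place makes it easy to tune a tier without hunting through every call site.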

Each tier is fetched in parallel, so the dashboard is assembled from three independent cache reads rather than one monolithic query. A miss in one tier reruns only that tier's queries.
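
A parallel, per-tier read might look roughly like the sketch below. It assumes the cache client is passed in and exposes get(key) and set(key, value, { EX: seconds }), which is the call shape node-redis v4 uses; getOrLoad, loadDashboard and the loaders object are hypothetical names, and each loader stands in for whatever queries populate that tier.

```javascript
// Cache-aside read: return the cached value, or load it and store it with a TTL.
// `cache` is assumed to expose get(key) and set(key, value, { EX: seconds }).
// `load` is a hypothetical async function that runs the tier's database queries.
async function getOrLoad(cache, key, ttlSeconds, load) {
  const cached = await cache.get(key);
  if (cached != null) return JSON.parse(cached);
  const fresh = await load();
  await cache.set(key, JSON.stringify(fresh), { EX: ttlSeconds });
  return fresh;
}

// Read the three tiers concurrently; a miss in one tier only runs that
// tier's loader. The cold-tier key shape is an assumption.
async function loadDashboard(cache, userId, loaders) {
  const [hot, warm, cold] = await Promise.all([
    getOrLoad(cache, `user:${userId}:hot`, 30, loaders.hot),
    getOrLoad(cache, `user:${userId}:warm`, 5 * 60, loaders.warm),
    getOrLoad(cache, `user:${userId}:cold`, 60 * 60, loaders.cold),
  ]);
  return { hot, warm, cold };
}
```

Because Promise.all rejects as soon as any tier fails, a real handler would probably want to decide per tier whether to fail the request or serve a partial dashboard.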

Explicit invalidation beats TTL expiry for user-edited data

For the warm tier, we wired cache invalidation directly into the service layer. Whenever a project is updated, the service calls redis.del(`user:${assigneeId}:warm`) before returning the response. This adds roughly 2 ms to writes but keeps read data accurate.
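
As a rough illustration of that write path, the sketch below updates a project and then purges the assignee's warm key. The db.updateProject helper and the shape of the returned project are hypothetical; cache.del(key) corresponds to the DEL call mentioned above.

```javascript
// Update a project, then purge the assignee's warm-tier key before responding.
// `db.updateProject` is a hypothetical data-access helper that resolves to the
// updated row, including its assigneeId.
async function updateProject(db, cache, projectId, changes) {
  const project = await db.updateProject(projectId, changes);
  // Delete after the write succeeds so a failed update leaves the cache alone.
  await cache.del(`user:${project.assigneeId}:warm`);
  return project;
}
```

If an update can reassign a project, both the previous and the new assignee would need their warm keys purged; this sketch only handles the current assignee.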

We used a lightweight pub/sub channel for cross-service invalidation — when the billing service marks an invoice paid, it publishes an event; the dashboard service subscribes and purges the affected warm key.
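
The cross-service flow could be sketched as follows. The channel name, event shape and function names are all hypothetical, since the article does not specify them. The publish(channel, message) and subscribe(channel, listener) calls follow the node-redis v4 shape, which expects a dedicated connection for subscriptions (for example one created with client.duplicate()).

```javascript
// Hypothetical channel name; the article does not name one.
const INVALIDATION_CHANNEL = "dashboard:invalidate";

// Billing service: announce that a user's invoice was marked paid.
async function publishInvoicePaid(publisher, userId) {
  const message = JSON.stringify({ type: "invoice.paid", userId });
  await publisher.publish(INVALIDATION_CHANNEL, message);
}

// Dashboard service: purge the affected warm key for each event received.
// `subscriber` is assumed to be a dedicated subscription connection and
// `cache` a regular client that can issue DEL.
async function subscribeToInvalidations(subscriber, cache) {
  await subscriber.subscribe(INVALIDATION_CHANNEL, async (message) => {
    let event;
    try {
      event = JSON.parse(message);
    } catch {
      return; // ignore malformed payloads rather than crash the listener
    }
    if (event && event.userId != null) {
      await cache.del(`user:${event.userId}:warm`);
    }
  });
}
```

Redis pub/sub does not redeliver messages to a subscriber that was disconnected when they were published, so the warm tier's 5-minute TTL still acts as the upper bound on staleness if an event is missed.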

Results

After rolling out layered caching with explicit invalidation on the warm tier, p99 dashboard latency fell from 3.8 s to 1.5 s — a 60% reduction — with cache hit rates staying above 85% throughout the day. Stale-data reports dropped to zero in the following two weeks.

Database CPU at peak hours went from 78% to 31%. That headroom let us defer a planned hardware upgrade by at least six months.

Key takeaways

Don't cache whole responses. Identify the TTL of each data slice independently.
Explicit invalidation on writes beats TTL for anything with a human update cycle.
Measure cache hit rates, not just latency: at a 30% hit rate most reads still go to the database and also pay for the cache lookup and write-back, so the cache is often making things worse.
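Redis pub/sub for cross-service invalidation is simple and was reliable enough at this scale. Delivery is at-most-once, so keep a TTL on the affected keys as a backstop for missed messages.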
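
For the hit-rate takeaway, one low-effort starting point is the pair of counters Redis already keeps. The sketch below parses the text returned by the INFO stats command and computes a server-wide hit rate from keyspace_hits and keyspace_misses; the function name is hypothetical, and this is not necessarily how the figures in this article were collected.

```javascript
// Compute the server-wide hit rate from the text returned by `INFO stats`.
// keyspace_hits and keyspace_misses are standard fields in that section.
// Returns null when the fields are missing or no lookups have been recorded.
function hitRateFromInfo(infoText) {
  const stats = {};
  for (const line of infoText.split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep > 0 && !line.startsWith("#")) {
      stats[line.slice(0, sep)] = line.slice(sep + 1);
    }
  }
  const hits = Number(stats.keyspace_hits);
  const misses = Number(stats.keyspace_misses);
  const total = hits + misses;
  if (!Number.isFinite(total) || total === 0) return null;
  return hits / total;
}
```

With node-redis the input would typically come from await client.info("stats"). These counters cover the whole server since it started, so a per-tier view would need counters in the application around each cache read.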