Capacity Planning: The Engineering of Predicting Before Fighting Fires

If the first time you think about capacity is during an incident, it's already too late.

"We need more servers." That sentence, spoken urgently at 2 AM while a service goes down from saturation, is the symptom of a problem that should have been solved weeks earlier. Capacity planning isn't reacting when infrastructure collapses. It's the discipline of understanding how much capacity you have, how much you'll need, and making decisions before demand exceeds supply.

Most teams don't do capacity planning. They provision by gut feeling, scale by panic, and discover infrastructure limits when users are already suffering.

Capacity planning is not "add more servers when it gets slow"

Reactive scaling is expensive, slow, and risky. When you're already in a saturation incident, provisioning new resources takes time: minutes if autoscaling is well configured, hours if you need budget approval and manual machine setup. During that window, users experience degradation or a total outage.

Capacity planning is a continuous process with three phases: measure current capacity, project future demand, and decide when and how to scale. All three require data, not intuition.

The metrics that matter (and why averages lie)

The four fundamental resources are CPU, memory, disk, and network. But how you measure them determines whether your capacity planning is useful or not.

Average CPU at 45% looks healthy. But if p95 is at 92% and p99 hits 100%, you have a problem the average hides. High percentiles reveal the reality that averages conceal. For capacity planning, always use p95 and p99, never averages.
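To see the gap concretely, here's a small pure-Python sketch using nearest-rank percentiles. The utilization numbers are made up for illustration: a service that idles most of the time but pins the CPU in a saturated tail.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample >= p% of all samples."""
    ordered = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]

# Hypothetical CPU samples: 90% of the time around 30% utilization,
# but a tail where the service pins the core at 100%.
cpu = [30] * 90 + [100] * 10

mean = sum(cpu) / len(cpu)
print(f"mean: {mean:.0f}%  p95: {percentile(cpu, 95)}%  p99: {percentile(cpu, 99)}%")
# → mean: 37%  p95: 100%  p99: 100%
```

A dashboard showing the 37% average would pass any review; the p95 tells you one request in twenty is hitting a saturated machine.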

Prediction models: three approaches that work

Predicting future demand doesn't require sophisticated machine learning models. It requires discipline and historical data.

In practice, three approaches complement each other: trend extrapolation from historical data gives you the baseline growth curve; event-based forecasting (launches, campaigns, seasonal peaks) accounts for known spikes on top of it; and load testing validates that your projections hold under real pressure.
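Trend extrapolation in particular needs no special tooling. Here's a minimal sketch that fits a least-squares line through monthly p95 utilization peaks and estimates how many months of headroom remain; the input numbers are illustrative.

```python
def fit_trend(samples):
    """Least-squares line through (month_index, peak_utilization) points."""
    n = len(samples)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(samples) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples)) / \
            sum((x - x_mean) ** 2 for x in xs)
    return slope, y_mean - slope * x_mean

def months_until(samples, limit):
    """Months from the latest sample until the trend crosses the limit."""
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None  # flat or shrinking demand: no projected exhaustion
    month = (limit - intercept) / slope
    return max(0.0, month - (len(samples) - 1))

# Monthly p95 CPU peaks (%): roughly five points of growth per month.
peaks = [40, 44, 51, 55, 59, 66, 70]
print(round(months_until(peaks, 90), 1))  # → 3.9
```

Four months of runway is enough to plan calmly; discovering the same fact at 92% utilization is not.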

Vertical vs horizontal: when to scale in each direction

Not all scaling is equal, and choosing the wrong direction has consequences.

Scaling vertically (more CPU, more RAM, faster disk) is the first option for simplicity. No architecture changes required. A database that needs more memory for its working set benefits directly from a vertical upgrade. But it has a ceiling: eventually you hit the limits of available hardware, and each instance jump is disproportionately more expensive.

Scaling horizontally (more instances behind a load balancer) scales better long-term, but requires your application to support it: distributed state or statelessness, load distribution, eventual consistency in many cases. If your service was designed as a stateful monolith with in-memory state, scaling horizontally isn't simply "add more nodes."

The practical rule: vertical first, horizontal when vertical isn't enough. But always design with the assumption that you'll eventually need horizontal.
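When horizontal is the answer, the sizing arithmetic is simple enough to keep in a script: divide peak load by what one instance can usably serve, then add spare nodes for failure tolerance. A sketch, with headroom and redundancy values that are illustrative rather than prescriptive:

```python
import math

def instances_needed(peak_rps, per_instance_rps, headroom=0.3, redundancy=1):
    """Horizontal sizing: instances to serve peak load with headroom,
    plus spare nodes so the tier survives instance failures (N+redundancy)."""
    usable = per_instance_rps * (1 - headroom)  # don't plan to run instances hot
    return math.ceil(peak_rps / usable) + redundancy

print(instances_needed(peak_rps=12_000, per_instance_rps=1_500))  # → 13
```

The headroom factor is the point: planning for instances to run at 100% means the first traffic spike or node loss becomes an incident.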

In practice: the quarterly process

Capacity planning isn't a one-time project. It's a recurring process: every quarter, measure current capacity, project demand for the coming quarters, and decide on scaling actions before demand catches up.
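One review cycle can be sketched as measure, project, decide. The thresholds and growth model below are illustrative, not prescriptive:

```python
def quarterly_review(p95_history, capacity_limit, quarters_of_runway=2):
    """One capacity-review cycle: measure, project, decide.
    p95_history: quarterly p95 utilization (%), oldest first."""
    current = p95_history[-1]                                   # measure
    growth = (p95_history[-1] - p95_history[0]) / (len(p95_history) - 1)
    projected = current + growth * quarters_of_runway           # project
    if projected >= capacity_limit:                             # decide
        return f"scale before Q+{quarters_of_runway}: projected p95 {projected:.0f}%"
    return f"ok: projected p95 {projected:.0f}% under {capacity_limit}% limit"

print(quarterly_review([48, 55, 63, 71], capacity_limit=80))
# → scale before Q+2: projected p95 86%
```

The exact model matters less than running the loop on a schedule; even naive linear projection beats discovering the limit in production.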

The anti-pattern: over-provisioning out of fear

The opposite extreme of under-provisioning is equally problematic. I've seen organizations provision 10x the needed capacity "just in case." The result: six-figure monthly cloud bills with average utilization at 8%.

Over-provisioning comes from fear, not data. It's the infrastructure version of "throw money at the problem." It's not capacity planning — it's the absence of planning with an infinite budget.

Real capacity planning seeks balance: enough headroom to absorb spikes without wasting resources on idle capacity. That balance requires data, not gut feelings.

The cost of not planning

Lack of capacity planning doesn't manifest as an error in a log. It shows up as latency that creeps up gradually, incidents that always happen during the same peak hours, teams living in firefighting mode instead of building features.

Every saturation incident that could have been prevented is wasted engineering time, lost users, and eroded trust. And unlike a code bug, saturation isn't fixed with a 5-minute hotfix.

Capacity planning is the difference between scaling with control and scaling with panic. Measure, project, decide. Before the system forces your hand.

Jorel del Portal


Systems engineer specialized in enterprise software architecture and high availability platforms.