Source · Foundations of Scalable Systems (O'Reilly)
Why this matters
Foundations of Scalable Systems, Ch. 1
Most distributed systems begin as a single process that works fine until traffic grows. Scalability is the property that lets a system absorb that growth by adding resources, ideally without a rewrite.
Understanding the levers of scale early is what separates a design that survives its first success from one that collapses under it. The vocabulary here — statelessness, caching, load balancing — is the shared language of every senior systems conversation.
The concept
Foundations of Scalable Systems, Ch. 2–3
Scalability is the ability to handle increased load by adding resources. It is distinct from performance: performance is how fast a single request completes (latency); scalability is how load capacity grows as you add hardware. A system can be fast yet unscalable, or scalable yet slow.
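The difference between the two properties can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the function name `capacity_rps`, the worker counts, and the latencies are assumptions chosen for the example, and it assumes each worker handles one request at a time with no coordination overhead.

```python
def capacity_rps(servers: int, workers_per_server: int, latency_s: float) -> float:
    """Rough upper bound on requests per second.

    Assumes every worker serves one request at a time and that adding
    servers adds capacity linearly (no shared bottleneck).
    """
    return servers * workers_per_server / latency_s


# Fast but unscalable: 10 ms per request, but the design is pinned to one server.
fast_single = capacity_rps(servers=1, workers_per_server=8, latency_s=0.010)

# Slow but scalable: 200 ms per request, yet capacity grows with each added server.
slow_scaled = {n: capacity_rps(n, 8, 0.200) for n in (1, 10, 100)}

print(f"fast, single server: about {fast_single:.0f} requests/s")
for n, rps in slow_scaled.items():
    print(f"slow, {n:>3} server(s): about {rps:.0f} requests/s")
```

Under these assumptions the slow service overtakes the fast one somewhere between 10 and 100 servers, while each individual request still takes 200 ms. Latency never improved; only capacity did.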
There are two axes. Vertical scaling (scaling up) means a bigger machine with more CPU and RAM. It is simple, but it is bounded by the largest box money can buy and offers no redundancy. Horizontal scaling (scaling out) means more machines behind a load balancer. It is effectively unbounded and tolerates the loss of individual nodes, but it demands that servers be stateless so any node can serve any request. State must then live in shared stores or client tokens. Caching cuts load and latency by holding hot data close to the request, and CAP intuition reminds us that under a network partition you must trade Consistency against Availability.
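One common way to hold hot data close to the request is the cache-aside pattern: check the cache first, and only fall through to the slower store on a miss. The sketch below is a minimal in-process version under stated assumptions. `TTLCache`, `loader`, and the 60-second TTL are hypothetical names and values for illustration, not something the source prescribes, and a production system would more likely use a shared cache service.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache-aside with a time-to-live: serve from memory, reload on miss or expiry."""

    def __init__(self, loader: Callable[[str], str], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # slow path, e.g. a database or origin fetch
        self._ttl_s = ttl_s
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: go to the backing store and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl_s)
        return value


def load_photo_metadata(photo_id: str) -> str:
    """Hypothetical stand-in for a slow database read."""
    return f"metadata-for-{photo_id}"


cache = TTLCache(load_photo_metadata, ttl_s=60.0)
cache.get("cat.jpg")   # miss: calls the loader
cache.get("cat.jpg")   # hit: served from memory
```

The TTL is the consistency knob here: a longer TTL offloads more reads from the backing store but lets readers see stale data for longer.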
Worked scenario
Foundations of Scalable Systems, Ch. 4
A photo app runs on one server that stores sessions in local memory. At 10x traffic it stalls. The team scales up to a bigger box, which buys six months, but the single point of failure remains.
The durable fix is horizontal: run five identical stateless app servers behind a load balancer, move session state into a shared Redis cache, and put a CDN cache in front of images. Now capacity grows by adding nodes, a dead node just drops out of rotation, and the cache absorbs most reads. The bottleneck migrates to the database — the next thing to scale.
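The routing side of this fix can be sketched in a few lines. The example below is a toy simulation rather than a real deployment: a plain dictionary stands in for the shared Redis store, `AppServer` and `LoadBalancer` are hypothetical classes, and round-robin is only one of several balancing policies a real load balancer might use.

```python
from typing import Dict, List


class AppServer:
    """Stateless app server: it keeps no session data of its own."""

    def __init__(self, name: str, sessions: Dict[str, List[str]]) -> None:
        self.name = name
        self.sessions = sessions   # shared store (a dict standing in for Redis)
        self.healthy = True

    def handle(self, session_id: str, photo: str) -> str:
        uploads = self.sessions.setdefault(session_id, [])
        uploads.append(photo)
        return f"{self.name} saw {len(uploads)} upload(s)"


class LoadBalancer:
    """Round-robin over servers, skipping any that are marked unhealthy."""

    def __init__(self, servers: List[AppServer]) -> None:
        self.servers = servers
        self._next = 0

    def route(self, session_id: str, photo: str) -> str:
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server.healthy:
                return server.handle(session_id, photo)
        raise RuntimeError("no healthy servers")


shared_sessions: Dict[str, List[str]] = {}
servers = [AppServer(f"app-{i}", shared_sessions) for i in range(1, 6)]
balancer = LoadBalancer(servers)

first = balancer.route("alice", "a.jpg")    # handled by app-1
servers[1].healthy = False                  # app-2 dies and drops out of rotation
second = balancer.route("alice", "b.jpg")   # app-2 is skipped, app-3 handles it
print(first)
print(second)
```

Because the session lives in the shared store, the second request lands on a different server and still sees the first upload. If each `AppServer` held its own dictionary instead, the count would reset whenever the balancer picked a new node, which is the failure mode of in-memory sessions.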
How it connects
Foundations of Scalable Systems, Ch. 5
These primitives underpin every later topic. Event-driven architecture is how stateless services stay loosely coupled; serverless is horizontal scaling taken to its automatic extreme; data mesh applies ownership thinking to the database bottleneck this topic exposes.
Once you can reason about up-vs-out, stateful-vs-stateless, and the CAP trade, the rest of distributed systems becomes variations on a theme.
Common pitfalls
- Treating scalability and performance as the same thing: a low-latency service can still fail to scale.
- Assuming vertical scaling is 'good enough': it has a hard ceiling and no built-in redundancy.
- Forgetting that horizontal scaling requires statelessness; sticky in-memory sessions silently break it.
Key takeaways
- Scalability adds capacity by adding resources; performance is single-request speed. The two are independent.
- Scale out (horizontal + stateless) beats scale up (vertical) for unbounded growth and fault tolerance.
- Caching and load balancing are the workhorse levers; CAP forces a consistency/availability choice under partition.