Vertical Scaling (Scale Up)

Vertical scaling means upgrading your existing server to a more powerful machine. Instead of 4 CPU cores and 16GB RAM, you move to 32 cores and 256GB RAM. Your application code does not change — it just runs on bigger hardware.

Vertical scaling: same server, more power

  Before: 1 server × [4 CPU, 16GB RAM, 500GB SSD]
  After:  1 server × [32 CPU, 256GB RAM, 2TB NVMe]

  Traffic handled: increases with the hardware, though rarely in exact proportion
  Deployment change: none (same app, bigger box)

Advantages: Simple (no code changes, no load balancer), works for stateful applications, no distributed systems complexity.

Limitations: There is a ceiling, set by the largest machine available (e.g. AWS u-24tb1.metal has 448 vCPUs but is very expensive). Resizing usually requires downtime. There is no redundancy: if the one server fails, the result is a full outage.

Horizontal Scaling (Scale Out)

Horizontal scaling means adding more servers and distributing traffic across all of them with a load balancer. You go from 1 server to 5, then to 50 or 500 as demand grows. Each server runs the same application code.

Horizontal scaling: many servers behind a load balancer

              [ Load Balancer ]
             /        |        \
    [Server 1]   [Server 2]   [Server 3]
      App v2       App v2       App v2
                      |
            [ Shared Resources ]
            - Redis (sessions)
            - PostgreSQL (database)
            - S3 (file storage)

Advantages: No hard ceiling — add as many servers as needed; high availability (if one server fails, others continue); can scale during traffic spikes without downtime; cost-efficient (add/remove servers as needed).

Requirements: Stateless application design; shared storage for sessions and files; a load balancer; distributed caching.

Comparison Table

| Property | Vertical Scaling | Horizontal Scaling |
| --- | --- | --- |
| Also called | Scale up | Scale out |
| How it works | Bigger server | More servers |
| Ceiling | Largest available VM | Virtually unlimited |
| Downtime needed | Usually yes | No (rolling deploys) |
| High availability | No (single point of failure) | Yes (N-1 redundancy) |
| Code changes | None | Requires stateless design |
| Cost | Expensive at high end | Cheaper at scale |
| Complexity | Low | Higher (load balancer, shared state) |
| Best for | Databases, stateful services, early stage | Web servers, APIs, microservices |

Designing for Statelessness

Horizontal scaling requires that any server instance can handle any request. This means no server-local state. The most common things to move out of the server:

  • Sessions: Store in Redis, not in PHP/Node.js process memory. Redis is shared across all instances.
  • File uploads: Store in object storage (S3, Google Cloud Storage), not on the local filesystem.
  • In-memory caches: Use Redis or Memcached instead of application-level in-memory maps.
  • WebSocket connections: Each connection stays attached to one instance, so use Redis pub/sub to broadcast messages across instances.
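
The session case can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: `SharedSessionStore` is a hypothetical in-process stand-in for Redis, and `AppServer`, `login`, and `whoami` are names invented for the example. What it shows is that a session created by one instance is readable by another, because neither instance keeps it locally.

```python
import json
import uuid


class SharedSessionStore:
    """Hypothetical stand-in for an external store such as Redis (here a dict)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Serialise the value, as a networked store would require.
        self._data[key] = json.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None


class AppServer:
    """One application instance. It holds no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, username):
        session_id = uuid.uuid4().hex
        self.store.set(f"session:{session_id}", {"user": username})
        return session_id

    def whoami(self, session_id):
        session = self.store.get(f"session:{session_id}")
        return session["user"] if session else None


store = SharedSessionStore()
server_1 = AppServer("server-1", store)
server_2 = AppServer("server-2", store)

sid = server_1.login("alice")   # first request lands on server 1
print(server_2.whoami(sid))     # second request lands on server 2
```

If each `AppServer` kept its own dictionary instead, the second request would find no session, which is the failure a load balancer exposes as soon as there is more than one instance.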

Twelve-Factor App — Design for Scale

The Twelve-Factor App methodology (12factor.net) defines best practices for building scalable web applications. Key relevant factors: Config in environment variables (not hardcoded), Stateless processes (no local state), Treat backing services as attached resources (database, cache via URLs). Following these makes horizontal scaling straightforward.
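
As a rough sketch of the first and third factors, the snippet below reads backing-service URLs from the environment instead of hardcoding them. The variable names `DATABASE_URL`, `REDIS_URL`, and `PORT`, the default port, and the example hostnames are assumptions chosen for illustration; a real deployment would use whatever names its platform provides.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    database_url: str
    redis_url: str
    port: int


def load_config(env=os.environ):
    """Build the config from environment variables.

    A missing required variable raises KeyError, so a misconfigured
    instance fails at startup rather than at the first request.
    """
    return Config(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        port=int(env.get("PORT", "8000")),  # assumed default for the example
    )


# Illustrative values only; in production these come from the environment.
example_env = {
    "DATABASE_URL": "postgresql://app@db.internal:5432/app",
    "REDIS_URL": "redis://cache.internal:6379/0",
}
config = load_config(example_env)
print(config.port)
```

Because every instance is built from the same image and differs only in its environment, adding a server becomes a matter of starting another copy with the same variables.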

Database Scaling Strategies

The database is usually the hardest component to scale horizontally. Here are strategies in order of complexity:

| Strategy | Complexity | Benefit | When to Use |
| --- | --- | --- | --- |
| Vertical scale DB | Low | More resources for queries | First step always |
| Read replicas | Low-Medium | Reads scale linearly | Read-heavy workloads |
| Redis caching | Medium | 90%+ of reads from cache | Repetitive queries |
| Connection pooling | Low | Reduces DB connection overhead | Many app instances |
| Sharding | Very High | Writes scale horizontally | Billions of rows, extreme write scale |
| NoSQL (Cassandra) | High | Native horizontal write scale | Specific use cases only |

Load Balancing Algorithms

Load balancers use different algorithms to distribute traffic:

  • Round Robin: Request 1 → Server 1, Request 2 → Server 2, Request 3 → Server 3, Request 4 → Server 1, and so on. Simple, and works well when servers have equal capacity.
  • Least Connections: Send each request to the server with the fewest active connections. Better when requests vary in processing time.
  • IP Hash: The same client IP always goes to the same server. This enables session affinity for stateful apps, but limits how evenly load can be spread.
  • Weighted: More powerful servers get more traffic. Useful during gradual upgrades.
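
The four algorithms above are small enough to sketch directly. The classes below are illustrative toy selectors, not the implementation of any particular load balancer; the class names, the `pick` and `release` methods, and the choice of SHA-256 for hashing the client IP are all assumptions made for the example.

```python
import hashlib
import itertools


class RoundRobin:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes.
        self.active[server] -= 1


class IPHash:
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


class Weighted:
    """Weighted round robin: weight 3 means three slots per cycle."""

    def __init__(self, weights):
        slots = [s for s, weight in weights.items() for _ in range(weight)]
        self._cycle = itertools.cycle(slots)

    def pick(self):
        return next(self._cycle)


servers = ["s1", "s2", "s3"]

rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])   # ['s1', 's2', 's3', 's1']

lc = LeastConnections(servers)
first, second = lc.pick(), lc.pick()   # s1, then s2
lc.release(first)                      # s1 finishes its request
print(lc.pick())                       # s1 again: it is idle

ih = IPHash(servers)
print(ih.pick("203.0.113.7") == ih.pick("203.0.113.7"))  # True
```

Note that this simple modulo-based IP hash remaps most clients whenever the server list changes, which is one reason externalised sessions are usually preferred over affinity.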

Auto-Scaling in Cloud Environments

Cloud platforms enable automatic scaling based on metrics:

AWS Auto Scaling Group: conceptual policy

  Scale OUT (add instances) when:
    - CPU utilisation > 70% for 3 consecutive minutes
    - Request count > 10,000 per minute

  Scale IN (remove instances) when:
    - CPU utilisation < 30% for 10 consecutive minutes
    - Cooldown period: 300 seconds after last scale event

  Min instances: 2 (for HA)
  Max instances: 50
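
To make the decision logic concrete, here is a sketch of the conceptual policy above as a plain Python function. It is not how AWS is configured (that is done through scaling policies and alarms, not application code). The names `Policy` and `desired_instances` are hypothetical, and two details are assumptions the policy does not state: each decision changes the group by one instance, and the cooldown blocks scaling in either direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    scale_out_cpu: float = 70.0
    scale_out_minutes: int = 3
    scale_out_requests: int = 10_000
    scale_in_cpu: float = 30.0
    scale_in_minutes: int = 10
    cooldown_seconds: int = 300
    min_instances: int = 2
    max_instances: int = 50


def desired_instances(current, cpu_history, requests_per_minute,
                      seconds_since_last_scale, policy=Policy()):
    """Return the instance count after applying the policy once.

    cpu_history holds one average CPU percentage per minute, oldest first.
    """
    # Assumption: the cooldown applies to both scale-out and scale-in.
    if seconds_since_last_scale < policy.cooldown_seconds:
        return current

    recent = cpu_history[-policy.scale_out_minutes:]
    cpu_high = (len(recent) == policy.scale_out_minutes
                and all(cpu > policy.scale_out_cpu for cpu in recent))
    if cpu_high or requests_per_minute > policy.scale_out_requests:
        # Assumption: add one instance per decision.
        return min(current + 1, policy.max_instances)

    recent = cpu_history[-policy.scale_in_minutes:]
    cpu_low = (len(recent) == policy.scale_in_minutes
               and all(cpu < policy.scale_in_cpu for cpu in recent))
    if cpu_low:
        return max(current - 1, policy.min_instances)

    return current


print(desired_instances(4, [80, 85, 90], 5_000, 600))   # 5: CPU high for 3 minutes
print(desired_instances(2, [10] * 10, 100, 600))        # 2: already at the minimum
```

The asymmetry between the thresholds (3 minutes to scale out, 10 to scale in) is deliberate in the policy: reacting quickly to load and slowly to quiet avoids repeatedly adding and removing the same instances.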

Cold Start Problem with Auto-Scaling

Auto-scaling works best when new instances start quickly. If your application takes 5 minutes to boot, a traffic spike will overwhelm existing servers before new ones come online. Optimise startup time — use pre-warmed images, keep application startup under 30 seconds. Container-based deployments (Docker/ECS/Kubernetes) typically start in seconds.

Real-World Scaling Examples

| Company | Scale Challenge | Solution |
| --- | --- | --- |
| Instagram (2012) | Rapid user growth, PostgreSQL bottlenecks | PostgreSQL read replicas + pgBouncer + Redis caching |
| Twitter | Tweet fan-out at massive scale | Pre-computed timelines in Redis, sharded MySQL |
| Dropbox | File storage at petabyte scale | Moved from S3 to own distributed storage (Magic Pocket) |
| Airbnb | Search at global scale | Elasticsearch for search, horizontal API tier, RDS replicas |
| Stack Overflow | Millions of daily users | Vertical scaling + aggressive caching, with just a few servers |

Frequently Asked Questions — Scaling Architecture