Source · AWS SAA-C03 Exam Guide + AWS Well-Architected Framework
Why this matters
Source · SAA-C03 Exam Guide, Domain 2
Design Resilient Architectures is Domain 2 of the SAA-C03 exam and carries 26% of the scored content, second only to Design Secure Architectures (30%). Compute sits at its center. Almost every scenario question in this domain is really asking: how does this workload survive a failure without a human intervening?
Getting compute resilience right means the difference between an architecture that pages an engineer at 3am and one that self-heals. The exam rewards designs that assume things will break and recover automatically.
The concept
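Source · AWS Well-Architected Framework, Reliability Pillar
EC2 gives you virtual servers, but a single instance is a single point of failure. Resilience comes from three cooperating building blocks: EC2 Auto Scaling, Elastic Load Balancing, and multiple Availability Zones (AZs).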
An Auto Scaling Group (ASG) maintains a desired count of healthy instances across multiple Availability Zones, replacing any that fail its health checks. An Elastic Load Balancer spreads incoming traffic across those instances and stops routing to unhealthy ones. Multiple AZs ensure a data-center-level failure takes out only part of your fleet.
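The interplay described above can be sketched as a toy simulation. This is not how AWS implements Auto Scaling; the class and function names (`ToyAutoScalingGroup`, `routable_targets`) and the AZ labels are hypothetical, and the placement rule (launch into the least-populated usable AZ) is a simplifying assumption.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    instance_id: str
    az: str
    healthy: bool = True


@dataclass
class ToyAutoScalingGroup:
    """Toy stand-in for an ASG: keeps `desired` healthy instances spread over `azs`."""
    desired: int
    azs: list
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def reconcile(self, available_azs=None):
        """Drop unhealthy instances, then launch replacements in usable AZs."""
        usable = list(available_azs) if available_azs is not None else list(self.azs)
        self.instances = [i for i in self.instances if i.healthy]
        while len(self.instances) < self.desired:
            # Assumption: place each new instance in the least-populated usable AZ.
            az = min(usable, key=lambda z: sum(i.az == z for i in self.instances))
            self.instances.append(Instance(f"i-{next(self._ids):04d}", az))


def routable_targets(instances):
    """What the load balancer does: route only to instances passing health checks."""
    return [i for i in instances if i.healthy]


asg = ToyAutoScalingGroup(desired=4, azs=["az-a", "az-b"])
asg.reconcile()                      # initial launch, two instances per AZ

for inst in asg.instances:           # simulate the loss of az-a
    if inst.az == "az-a":
        inst.healthy = False

print(len(routable_targets(asg.instances)))  # 2: traffic goes only to az-b
asg.reconcile(available_azs=["az-b"])        # the ASG restores the desired count
print(len(routable_targets(asg.instances)))  # 4
```

The point of the sketch is the division of labor: the load balancer reacts immediately by routing around failures, while the ASG restores capacity a little later.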
Pick the right load balancer: the Application Load Balancer (ALB) operates at Layer 7 (HTTP/HTTPS, path and host routing), the Network Load Balancer (NLB) at Layer 4 (TCP/UDP, ultra-low latency, static IPs), and the Gateway Load Balancer (GWLB) fronts third-party virtual appliances such as firewalls. The Classic Load Balancer is a separate, previous-generation product and should not be confused with the GWLB.
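The selection rule above can be written as a small decision helper. This is a study aid under simplifying assumptions, not an AWS API: the function name `choose_load_balancer` and its parameters are hypothetical, and real designs sometimes combine balancers (for example, an NLB in front of an ALB).

```python
def choose_load_balancer(protocol: str,
                         needs_static_ip: bool = False,
                         fronts_virtual_appliances: bool = False) -> str:
    """Return 'ALB', 'NLB', or 'GWLB' for a simplified exam-style scenario."""
    if fronts_virtual_appliances:
        return "GWLB"   # third-party firewalls, inspection appliances
    if needs_static_ip:
        return "NLB"    # simplification: static IPs point to NLB
    proto = protocol.upper()
    if proto in {"HTTP", "HTTPS"}:
        return "ALB"    # Layer 7: path and host routing
    if proto in {"TCP", "UDP"}:
        return "NLB"    # Layer 4: ultra-low latency
    raise ValueError(f"unsupported protocol for this sketch: {protocol!r}")


print(choose_load_balancer("https"))                                  # ALB
print(choose_load_balancer("udp"))                                    # NLB
print(choose_load_balancer("tcp", fronts_virtual_appliances=True))    # GWLB
```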
Worked scenario
Source · Amazon EC2 Auto Scaling User Guide
A web tier runs on 4 EC2 instances behind an ALB, all in one AZ. The exam asks how to make it highly available.
The answer is to spread the instances across at least two AZs and place them in an Auto Scaling Group that is attached to the ALB and configured to use the ALB's health checks. Now if one AZ fails, the ALB stops sending traffic to the dead instances and the ASG launches replacements in the healthy AZ.
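As a rough sketch, the fix for this scenario might translate into the request parameters below for boto3's `create_auto_scaling_group` call. The helper name `build_asg_request`, the resource names, subnet IDs, and target group ARN are all hypothetical placeholders, and the sizing (max set to twice the desired count, a 300-second grace period) is an arbitrary assumption. The function only builds the dictionary, so it runs without AWS credentials.

```python
def build_asg_request(name, launch_template_name, subnet_ids,
                      target_group_arn, desired=4):
    """Build kwargs for autoscaling.create_auto_scaling_group (multi-AZ, ELB checks)."""
    if len(subnet_ids) < 2:
        # The caller must also make sure each subnet sits in a different AZ.
        raise ValueError("use subnets in at least two AZs for high availability")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        "MinSize": desired,
        "MaxSize": desired * 2,                     # assumption: headroom to scale out
        "DesiredCapacity": desired,
        "VPCZoneIdentifier": ",".join(subnet_ids),  # comma-separated subnet IDs
        "TargetGroupARNs": [target_group_arn],      # attaches the ASG to the ALB
        "HealthCheckType": "ELB",                   # replace instances the ALB marks unhealthy
        "HealthCheckGracePeriod": 300,              # seconds to wait after launch
    }


request = build_asg_request(
    name="web-tier-asg",                            # hypothetical
    launch_template_name="web-tier-template",       # hypothetical
    subnet_ids=["subnet-aaa111", "subnet-bbb222"],  # hypothetical, one per AZ
    target_group_arn="arn:aws:elasticloadbalancing:region:account:targetgroup/web/placeholder",
)
print(request["VPCZoneIdentifier"])  # subnet-aaa111,subnet-bbb222

# With credentials configured, the request could then be sent like this:
# import boto3
# boto3.client("autoscaling").create_auto_scaling_group(**request)
```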
Note the distinction the exam loves: high availability (survive a failure with minimal downtime, e.g. Multi-AZ) versus fault tolerance (zero interruption, e.g. redundant capacity already running). Fault tolerance costs more because spare capacity runs even when idle.
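The cost difference can be made concrete with a little arithmetic. The sketch below assumes instances are spread evenly across AZs and that the workload needs a fixed number of instances to serve full load; the function names are hypothetical.

```python
import math


def instances_per_az(required: int, az_count: int, az_failures_tolerated: int = 1) -> int:
    """Instances to run in each AZ so `required` remain after the given AZ failures."""
    surviving_azs = az_count - az_failures_tolerated
    if surviving_azs < 1:
        raise ValueError("at least one AZ must survive")
    return math.ceil(required / surviving_azs)


def total_running(required: int, az_count: int, az_failures_tolerated: int = 1) -> int:
    """Total fleet size when every AZ runs the same number of instances."""
    return instances_per_az(required, az_count, az_failures_tolerated) * az_count


# The web tier above needs 4 instances at full load.
print(total_running(4, az_count=2, az_failures_tolerated=0))  # 4: HA only, relies on the ASG to recover
print(total_running(4, az_count=2))                           # 8: fault tolerant across 2 AZs
print(total_running(4, az_count=3))                           # 6: fault tolerant across 3 AZs
```

Spreading across three AZs instead of two lowers the price of fault tolerance, because each AZ only has to carry half of the required capacity rather than all of it.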
How it connects
Source · AWS Well-Architected Framework
Compute resilience never stands alone. The ASG launches instances into subnets you define in your VPC, and those instances need security groups to accept traffic from the load balancer.
Stateless design matters: keep session state in ElastiCache or DynamoDB so any instance can serve any request, letting the ASG add or remove capacity freely. Health checks tie into observability — CloudWatch alarms can drive scaling policies. This is why compute questions often bleed into networking, databases, and monitoring.
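The stateless idea can be illustrated with a minimal sketch. Here `SessionStore` is an in-memory stand-in for an external store such as ElastiCache or DynamoDB, and `WebInstance` is a hypothetical application server; neither reflects a real AWS client API.

```python
class SessionStore:
    """In-memory stand-in for an external session store (assumption for the demo)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return dict(self._data.get(session_id, {}))

    def put(self, session_id, state):
        self._data[session_id] = dict(state)


class WebInstance:
    """An application server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.get(session_id)
        cart = state.get("cart", []) + [item]
        self.store.put(session_id, {"cart": cart})
        return cart


store = SessionStore()
instance_a = WebInstance("i-a", store)
instance_b = WebInstance("i-b", store)

instance_a.add_to_cart("session-1", "book")
del instance_a                                       # the ASG terminates this instance
print(instance_b.add_to_cart("session-1", "pen"))    # ['book', 'pen']
```

Because the cart lives in the shared store, the request that lands on the second instance sees the item added through the first one, which is what lets the ASG scale in without losing user sessions.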
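- ELB health check vs EC2 status check: an ASG uses EC2 status checks by default and can add ELB health checks; ELB checks probe the application, while EC2 status checks cover the instance and its underlying host.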
- A single ASG spanning one AZ is NOT highly available, no matter how many instances it holds.
- NLB preserves the client source IP (by default for instance targets) and gives one static IP per AZ; ALB does neither and instead passes the client IP in the X-Forwarded-For header.
- Combine ASG + ELB + multi-AZ for automatic recovery.
- ALB = Layer 7 routing; NLB = Layer 4 low-latency + static IP.
- High availability tolerates brief downtime; fault tolerance tolerates none.