Well-Architected Framework
The AWS Well-Architected Framework is a set of best practices for evaluating and improving cloud architectures. It’s organized into six pillars — use them as a checklist when designing or reviewing any AWS workload.
The Six Pillars
| Pillar | Core Question |
|---|---|
| Operational Excellence | How do you run and monitor systems to deliver business value? |
| Security | How do you protect information and systems? |
| Reliability | How do you prevent failures, and recover quickly when they happen? |
| Performance Efficiency | How do you use resources efficiently as demand changes? |
| Cost Optimization | How do you avoid unnecessary costs? |
| Sustainability | How do you minimize environmental impact? |
1. Operational Excellence
Run and monitor systems to deliver business value and continually improve processes.
Key Practices
| Practice | What It Means |
|---|---|
| Infrastructure as Code | Define everything in Terraform/CloudFormation — no manual console changes |
| Small, frequent changes | Deploy often with small diffs. Easier to debug and roll back. |
| Automate runbooks | Turn manual operational tasks into scripts or Lambda functions |
| Observe everything | Metrics, logs, traces, alarms. You can’t fix what you can’t see. |
| Learn from failures | Post-incident reviews, game days, chaos engineering |
AWS Services
| Service | Role |
|---|---|
| CloudFormation / CDK / Terraform | Infrastructure as Code |
| CloudWatch | Metrics, logs, alarms, dashboards |
| X-Ray | Distributed tracing |
| Systems Manager | Runbooks, patching, parameter management |
| Config | Track resource configuration changes |
Design Principle
Perform operations as code. Make frequent, small, reversible changes. Refine operations procedures frequently. Anticipate failure. Learn from all operational failures.
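As one illustration of performing operations as code, an alarm can be created from a script instead of the console. This is a minimal sketch, not a recommended configuration: the function name `create_cpu_alarm`, the alarm naming scheme, the 80% threshold, the 5-minute period, and the SNS topic passed in are all assumptions made for the example.

```shell
# Hypothetical helper: create a CloudWatch alarm on EC2 CPU utilization.
# $1 = EC2 instance ID, $2 = SNS topic ARN to notify (both supplied by the caller).
# Threshold (80%) and period (300 s) are illustrative values, not guidance.
create_cpu_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "high-cpu-$1" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$1" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$2"
}
```

Calling `create_cpu_alarm <instance-id> <topic-arn>` from a versioned script keeps the alarm definition reviewable and repeatable, in the same spirit as the Infrastructure as Code practice above.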
2. Security
Protect information, systems, and assets while delivering business value.
Key Practices
| Practice | What It Means |
|---|---|
| Least privilege | Grant only the minimum permissions needed |
| Defense in depth | Multiple security layers (WAF, SG, NACL, IAM, encryption) |
| Automate security | Use Config rules, GuardDuty, Security Hub — don’t rely on manual checks |
| Encrypt everything | At rest (KMS) and in transit (TLS) |
| Protect data | Classify data, control access, enable logging |
| Enable traceability | CloudTrail + CloudWatch Logs for full audit trail |
AWS Services
| Service | Role |
|---|---|
| IAM | Identity and access management |
| KMS | Encryption key management |
| WAF, Shield | Application and DDoS protection |
| GuardDuty | Threat detection |
| Secrets Manager | Credential management |
| CloudTrail | API audit logging |
| Security Hub | Central security dashboard |
Design Principle
Implement a strong identity foundation. Enable traceability. Apply security at all layers. Automate security best practices. Protect data in transit and at rest.
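The "encrypt everything at rest" practice can be applied to S3 by setting default bucket encryption. The sketch below assumes a bucket and a KMS key that already exist; the function name `enable_bucket_kms_encryption` is hypothetical, and enabling the S3 Bucket Key is an optional choice made here for the example.

```shell
# Hypothetical helper: turn on default SSE-KMS encryption for one S3 bucket.
# $1 = bucket name, $2 = KMS key ID, alias, or ARN (both supplied by the caller).
# BucketKeyEnabled is an optional setting chosen for this example.
enable_bucket_kms_encryption() {
  aws s3api put-bucket-encryption \
    --bucket "$1" \
    --server-side-encryption-configuration \
    "{\"Rules\":[{\"ApplyServerSideEncryptionByDefault\":{\"SSEAlgorithm\":\"aws:kms\",\"KMSMasterKeyID\":\"$2\"},\"BucketKeyEnabled\":true}]}"
}
```

Default encryption applies to new objects written to the bucket; encryption in transit (TLS) still needs to be enforced separately, for example through a bucket policy.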
3. Reliability
Prevent failures, and recover quickly when they happen.
Key Practices
| Practice | What It Means |
|---|---|
| Multi-AZ / multi-region | Distribute across failure domains |
| Auto-recovery | Auto Scaling, health checks, automated failover |
| Limit blast radius | Isolate components so one failure doesn’t cascade |
| Test recovery | Practice failover, backup restoration, disaster recovery |
| Manage change | Automate deployments, use canary/blue-green strategies |
AWS Services
| Service | Role |
|---|---|
| Auto Scaling | Automatically replace failed instances |
| RDS Multi-AZ | Database failover |
| Route 53 health checks | DNS-level failover |
| S3 (11 nines durability) | Durable storage |
| Elastic Load Balancing | Distribute traffic, health checks |
| Backup | Centralized backup management |
The Reliability Stack

    Route 53 (DNS failover)
      │
    CloudFront (edge caching, origin failover)
      │
    ALB (health checks, multi-AZ)
      │
    Auto Scaling Group (self-healing, multi-AZ)
      │
    RDS Multi-AZ (automatic DB failover)
      │
    S3 (11 nines durability for backups)

Design Principle
Automatically recover from failure. Test recovery procedures. Scale horizontally. Stop guessing capacity. Manage change in automation.
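A quick way to check the "RDS Multi-AZ" layer is to list database instances that are not configured for it. This is a rough sketch under assumptions: the function name `list_single_az_databases` is hypothetical, it only inspects the region of the active CLI profile, and Aurora handles failover at the cluster level, so the instance-level flag used here may not reflect an Aurora setup.

```shell
# Hypothetical helper: print identifiers of RDS instances whose MultiAZ flag is false.
# Takes no arguments; uses the region and credentials of the active AWS CLI profile.
list_single_az_databases() {
  aws rds describe-db-instances \
    --query 'DBInstances[?MultiAZ==`false`].DBInstanceIdentifier' \
    --output text
}
```

Treat the output as a starting point for review rather than a verdict; a single-AZ instance may be acceptable for a development environment.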
4. Performance Efficiency
Use compute, storage, and networking resources efficiently as demand changes.
Key Practices
| Practice | What It Means |
|---|---|
| Right-size resources | Match instance types to actual workload needs |
| Use managed services | Let AWS handle scaling (Lambda, DynamoDB, Fargate) |
| Go global | CloudFront, multi-region deployments for low latency |
| Experiment | Test different instance types, storage classes, architectures |
| Mechanical sympathy | Understand the technology to use it best (e.g. DynamoDB access patterns) |
Service Selection Guide
| Workload | Best Option |
|---|---|
| Unpredictable traffic | Lambda, DynamoDB on-demand, Fargate |
| Steady compute | EC2 Reserved / Savings Plans |
| Static content | CloudFront + S3 |
| High-IOPS database | EBS io2, Aurora |
| In-memory access | ElastiCache Redis |
| Global users | CloudFront, Global Accelerator, multi-region |
Design Principle
Democratize advanced technologies. Go global in minutes. Use serverless architectures. Experiment more often. Consider mechanical sympathy.
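For the unpredictable-traffic case in the selection guide, DynamoDB on-demand is chosen at table creation through the billing mode. The sketch below is illustrative only: the function name `create_on_demand_table` is hypothetical, and the single string partition key named `pk` is a placeholder schema, not a design recommendation (real key design should follow your access patterns).

```shell
# Hypothetical helper: create a DynamoDB table billed per request (on-demand).
# $1 = table name. The partition key "pk" (string) is a placeholder schema.
create_on_demand_table() {
  aws dynamodb create-table \
    --table-name "$1" \
    --attribute-definitions AttributeName=pk,AttributeType=S \
    --key-schema AttributeName=pk,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST
}
```

With `PAY_PER_REQUEST` there is no read or write capacity to provision up front, which is what makes it a fit for spiky or unknown demand.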
5. Cost Optimization
Avoid unnecessary costs and understand where money is going.
Key Practices
| Practice | What It Means |
|---|---|
| Understand your costs | Cost Explorer, tags, budgets |
| Adopt consumption models | Pay for what you use (Lambda, DynamoDB on-demand, Fargate) |
| Measure efficiency | Cost per transaction, cost per user |
| Stop spending on undifferentiated heavy lifting | Use managed services instead of self-hosting |
| Analyze and attribute | Tag everything, track cost per team/project |
See Cost Management for detailed strategies.
Quick Wins
| Action | Typical Savings |
|---|---|
| Delete unused EBS volumes and snapshots | Immediate |
| Right-size EC2 instances (Compute Optimizer) | 10–30% |
| Savings Plans for steady workloads | 30–72% |
| Spot Instances for fault-tolerant work | Up to 90% |
| S3 lifecycle policies | 40–80% on storage |
| VPC endpoints for S3/DynamoDB | Eliminate NAT Gateway data charges |
Design Principle
Implement cloud financial management. Adopt a consumption model. Measure overall efficiency. Stop spending money on undifferentiated heavy lifting. Analyze and attribute expenditure.
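The first quick win, deleting unused EBS volumes, starts with finding them. As a minimal sketch (the function name `list_unattached_volumes` is hypothetical, and it covers only the region of the active CLI profile), volumes in the `available` state are those not attached to any instance:

```shell
# Hypothetical helper: list EBS volumes in the "available" state (not attached).
# Prints one line per volume: volume ID, size in GiB, creation time.
list_unattached_volumes() {
  aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].[VolumeId,Size,CreateTime]' \
    --output text
}
```

An unattached volume is not necessarily unused, so it is worth confirming ownership (for example through tags) or taking a snapshot before deleting anything.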
6. Sustainability
Minimize the environmental impact of running cloud workloads.
Key Practices
| Practice | What It Means |
|---|---|
| Understand your impact | AWS Customer Carbon Footprint Tool |
| Right-size | Don’t over-provision — idle resources waste energy |
| Use efficient resources | Graviton (ARM) instances consume less power |
| Maximize utilization | Consolidate workloads, use Spot and serverless |
| Reduce data movement | Cache at the edge, compress data, minimize cross-region transfers |
Design Principle
Understand your impact. Establish sustainability goals. Maximize utilization. Adopt more efficient hardware and software. Reduce the downstream impact of cloud workloads.
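To see which Graviton (ARM) options exist in a region before migrating, the EC2 instance type catalog can be filtered by architecture. This is a sketch under assumptions: the function name `list_arm64_instance_types` is hypothetical, and it treats the `arm64` architecture value as a stand-in for Graviton-based instance types.

```shell
# Hypothetical helper: list EC2 instance types that support the arm64 architecture
# in the region of the active AWS CLI profile.
list_arm64_instance_types() {
  aws ec2 describe-instance-types \
    --filters Name=processor-info.supported-architecture,Values=arm64 \
    --query 'InstanceTypes[].InstanceType' \
    --output text
}
```

Availability differs by region, and the workload still needs ARM-compatible builds and dependencies before a switch pays off.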
The Well-Architected Review
AWS provides a Well-Architected Tool in the console to run a structured review of your workload:
- Define the workload — name, description, environment, AWS accounts.
- Answer questions — ~60 questions across the six pillars.
- Review findings — Each answer produces a risk level (High, Medium, None).
- Create improvement plan — Prioritized list of actions to address high-risk items.

    # List workloads
    aws wellarchitected list-workloads

    # Create a workload for review
    # (--description is required by create-workload; the text below is a placeholder)
    aws wellarchitected create-workload \
      --workload-name "My Production App" \
      --description "Production workload under review" \
      --environment PRODUCTION \
      --lenses wellarchitected \
      --aws-regions us-east-1

When to Run a Review
| Trigger | Why |
|---|---|
| New workload launch | Validate architecture before production |
| Major changes | New service, migration, scaling event |
| Annually | Catch drift, adopt new best practices |
| Post-incident | Identify systemic weaknesses |
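Once a review has been answered, the high-risk findings can also be pulled from the CLI to seed the improvement plan. This is a minimal sketch: the function name `list_high_risk_questions` is hypothetical, the workload ID is whatever `list-workloads` or `create-workload` returned, and results may be paginated, so a long review could need more than one call.

```shell
# Hypothetical helper: print the titles of questions rated HIGH risk in a review.
# $1 = workload ID (as returned by create-workload or list-workloads).
# Uses the default "wellarchitected" lens; results may be paginated.
list_high_risk_questions() {
  aws wellarchitected list-answers \
    --workload-id "$1" \
    --lens-alias wellarchitected \
    --query "AnswerSummaries[?Risk=='HIGH'].QuestionTitle" \
    --output text
}
```

Comparing this list between reviews (for example before and after an incident) gives a simple measure of whether high-risk items are actually being closed.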
Applying the Pillars in Practice
A real-world architecture that applies all six pillars:

    CloudFront (performance, sustainability)
      │
    WAF (security)
      │
    ALB + Auto Scaling (reliability, performance)
      │
    ECS Fargate on Graviton (cost, sustainability, operational)
      │
    Aurora Multi-AZ (reliability)
      │
    ElastiCache (performance)
      │
    S3 with lifecycle (cost)
      │
    CloudWatch + X-Ray (operational)
      │
    KMS + Secrets Manager (security)
      │
    Cost Explorer + Tags + Budgets (cost)

Key Takeaways
- The Well-Architected Framework is a checklist, not a prescription: apply the principles that matter most to your workload.
- The six pillars are: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.
- Trade-offs exist — optimizing for cost may reduce reliability; optimizing for performance may increase cost. Make conscious decisions.
- Run a Well-Architected Review before launch, after major changes, and annually.
- Most principles boil down to: automate everything, encrypt everything, tag everything, use managed services, deploy across AZs, and monitor continuously.