Well-Architected Framework

First published Feb 16, 2026, by Atif Alam

The AWS Well-Architected Framework is a set of best practices for evaluating and improving cloud architectures. It’s organized into six pillars — use them as a checklist when designing or reviewing any AWS workload.

The Six Pillars

Pillar	Core Question
Operational Excellence	How do you run and monitor systems to deliver business value?
Security	How do you protect information and systems?
Reliability	How do you prevent failures, and recover quickly when they happen?
Performance Efficiency	How do you use resources efficiently as demand changes?
Cost Optimization	How do you avoid unnecessary costs?
Sustainability	How do you minimize environmental impact?

1. Operational Excellence

Run and monitor systems to deliver business value and continually improve processes.

Key Practices

Practice	What It Means
Infrastructure as Code	Define everything in Terraform/CloudFormation — no manual console changes
Small, frequent changes	Deploy often with small diffs. Easier to debug and roll back.
Automate runbooks	Turn manual operational tasks into scripts or Lambda functions
Observe everything	Metrics, logs, traces, alarms. You can’t fix what you can’t see.
Learn from failures	Post-incident reviews, game days, chaos engineering

AWS Services

Service	Role
CloudFormation / CDK / Terraform	Infrastructure as Code
CloudWatch	Metrics, logs, alarms, dashboards
X-Ray	Distributed tracing
Systems Manager	Runbooks, patching, parameter management
Config	Track resource configuration changes

Design Principles

Perform operations as code. Make frequent, small, reversible changes. Refine operations procedures frequently. Anticipate failure. Learn from all operational failures.
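
As one way to picture "operations as code", the sketch below builds a minimal CloudFormation template in Python and serializes it to JSON so it can be committed and reviewed like any other change. The logical ID `LogsBucket` and the choice of a versioned, KMS-encrypted S3 bucket are assumptions made for illustration; Terraform or CDK would express the same idea differently.

```python
import json


def build_template() -> dict:
    """Return a minimal CloudFormation template as a plain dict.

    `LogsBucket` is a hypothetical logical ID chosen for this sketch.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Example: versioned, KMS-encrypted S3 bucket",
        "Resources": {
            "LogsBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {
                                "ServerSideEncryptionByDefault": {
                                    "SSEAlgorithm": "aws:kms"
                                }
                            }
                        ]
                    },
                },
            }
        },
    }


def render(template: dict) -> str:
    """Serialize the template to stable, diff-friendly JSON."""
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_template()))
```

The printed JSON could be saved to a file and deployed with `aws cloudformation deploy --template-file <file> --stack-name <name>`, which keeps the change small, reviewable, and reversible.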

2. Security

Protect information, systems, and assets while delivering business value.

Key Practices

Practice	What It Means
Least privilege	Grant only the minimum permissions needed
Defense in depth	Multiple security layers (WAF, security groups, network ACLs, IAM, encryption)
Automate security	Use Config rules, GuardDuty, Security Hub — don’t rely on manual checks
Encrypt everything	At rest (KMS) and in transit (TLS)
Protect data	Classify data, control access, enable logging
Enable traceability	CloudTrail + CloudWatch Logs for full audit trail

AWS Services

Service	Role
IAM	Identity and access management
KMS	Encryption key management
WAF, Shield	Application and DDoS protection
GuardDuty	Threat detection
Secrets Manager	Credential management
CloudTrail	API audit logging
Security Hub	Central security dashboard

Design Principles

Implement a strong identity foundation. Enable traceability. Apply security at all layers. Automate security best practices. Protect data in transit and at rest.
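
To make "least privilege" concrete, here is a minimal sketch that builds an IAM policy document granting read access to a single S3 prefix, plus a small check that flags wildcard actions. The bucket name `example-app-data`, the prefix, and both function names are hypothetical; the policy structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM JSON policy format.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy that allows reading objects under one prefix only."""
    object_arn = f"arn:aws:s3:::{bucket}/{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [object_arn],
            }
        ],
    }


def has_wildcard_action(policy: dict) -> bool:
    """Return True if any Allow statement uses '*' or 'service:*' actions.

    Assumes `Statement` is a list, as in the policies built above.
    """
    for stmt in policy["Statement"]:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if stmt.get("Effect") == "Allow" and any(
            a == "*" or a.endswith(":*") for a in actions
        ):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical bucket and prefix, used only for illustration.
    policy = read_only_prefix_policy("example-app-data", "reports/")
    print(json.dumps(policy, indent=2))
    print("wildcard actions:", has_wildcard_action(policy))
```

A check like `has_wildcard_action` is one simple way to automate a security review step in CI rather than relying on manual inspection.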

3. Reliability

Prevent failures, and recover quickly when they happen.

Key Practices

Practice	What It Means
Multi-AZ / multi-region	Distribute across failure domains
Auto-recovery	Auto Scaling, health checks, automated failover
Limit blast radius	Isolate components so one failure doesn’t cascade
Test recovery	Practice failover, backup restoration, disaster recovery
Manage change	Automate deployments, use canary/blue-green strategies

AWS Services

Service	Role
Auto Scaling	Automatically replace failed instances
RDS Multi-AZ	Database failover
Route 53 health checks	DNS-level failover
S3 (11 nines durability)	Durable storage
Elastic Load Balancing	Distribute traffic, health checks
Backup	Centralized backup management

The Reliability Stack

Route 53 (DNS failover)
    │
CloudFront (edge caching, origin failover)
    │
ALB (health checks, multi-AZ)
    │
Auto Scaling Group (self-healing, multi-AZ)
    │
RDS Multi-AZ (automatic DB failover)
    │
S3 (11 nines durability for backups)
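
A rough way to reason about a stack like this: layers a request passes through are in series, so their availabilities multiply, while redundant copies across AZs combine as one minus the chance that every copy fails. The sketch below assumes failures are independent, and all figures in it are hypothetical inputs, not AWS SLA values.

```python
from math import prod


def serial(availabilities):
    """Availability of components that must all work (a chain)."""
    return prod(availabilities)


def redundant(availability, copies):
    """Availability of `copies` independent replicas where one is enough."""
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    single_az = 0.99  # hypothetical availability of one AZ deployment
    two_az = redundant(single_az, 2)
    # Hypothetical chain: DNS layer, compute tier in two AZs, database in two AZs.
    chain = serial([0.9999, two_az, two_az])
    print(f"one AZ:  {single_az:.4%}")
    print(f"two AZs: {two_az:.4%}")
    print(f"chain:   {chain:.4%}")
```

The serial product is always lower than its weakest layer, which is why each layer in the stack needs its own redundancy rather than relying on one highly available component.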

Design Principles

Automatically recover from failure. Test recovery procedures. Scale horizontally. Stop guessing capacity. Manage change in automation.

4. Performance Efficiency

Use compute, storage, and networking resources efficiently as demand changes.

Key Practices

Practice	What It Means
Right-size resources	Match instance types to actual workload needs
Use managed services	Let AWS handle scaling (Lambda, DynamoDB, Fargate)
Go global	CloudFront, multi-region deployments for low latency
Experiment	Test different instance types, storage classes, architectures
Mechanical sympathy	Understand the technology to use it best (e.g. DynamoDB access patterns)

Service Selection Guide

Workload	Best Option
Unpredictable traffic	Lambda, DynamoDB on-demand, Fargate
Steady compute	EC2 Reserved / Savings Plans
Static content	CloudFront + S3
High-IOPS database	EBS io2, Aurora
In-memory access	ElastiCache Redis
Global users	CloudFront, Global Accelerator, multi-region

Design Principles

Democratize advanced technologies. Go global in minutes. Use serverless architectures. Experiment more often. Consider mechanical sympathy.

5. Cost Optimization

Avoid unnecessary costs and understand where money is going.

Key Practices

Practice	What It Means
Understand your costs	Cost Explorer, tags, budgets
Adopt consumption models	Pay for what you use (Lambda, DynamoDB on-demand, Fargate)
Measure efficiency	Cost per transaction, cost per user
Stop spending on undifferentiated heavy lifting	Use managed services instead of self-hosting
Analyze and attribute	Tag everything, track cost per team/project

See Cost Management for detailed strategies.

Quick Wins

Action	Typical Savings
Delete unused EBS volumes and snapshots	Immediate
Right-size EC2 instances (Compute Optimizer)	10–30%
Savings Plans for steady workloads	30–72%
Spot Instances for fault-tolerant work	Up to 90%
S3 lifecycle policies	40–80% on storage
Gateway VPC endpoints for S3/DynamoDB	Avoid NAT Gateway data processing charges for that traffic
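
The percentage ranges in the table can be turned into a quick estimate. The sketch below is simple arithmetic over three of those ranges; the function name and dictionary keys are hypothetical, and each percentage applies only to the portion of spend the action actually touches (for example, EC2 spend for right-sizing), not the whole bill.

```python
from typing import Tuple

# Percentage ranges copied from the Quick Wins table.
SAVINGS_RANGES = {
    "right_size_ec2": (10, 30),
    "savings_plans": (30, 72),
    "s3_lifecycle": (40, 80),
}


def estimate_savings(monthly_spend: float, action: str) -> Tuple[float, float]:
    """Return (low, high) estimated monthly savings for the affected spend."""
    low_pct, high_pct = SAVINGS_RANGES[action]
    return (monthly_spend * low_pct / 100, monthly_spend * high_pct / 100)


if __name__ == "__main__":
    # Hypothetical monthly spend figures, in any single currency.
    for action, spend in [("right_size_ec2", 1000), ("savings_plans", 2000)]:
        low, high = estimate_savings(spend, action)
        print(f"{action}: {low:.2f} to {high:.2f} per month")
```

These are typical ranges rather than guarantees, so the result is best read as a way to rank which action to investigate first.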

Design Principles

Implement cloud financial management. Adopt a consumption model. Measure overall efficiency. Stop spending money on undifferentiated heavy lifting. Analyze and attribute expenditure.

6. Sustainability

Minimize the environmental impact of running cloud workloads.

Key Practices

Practice	What It Means
Understand your impact	AWS Customer Carbon Footprint Tool
Right-size	Don’t over-provision — idle resources waste energy
Use efficient resources	Graviton (ARM) instances consume less power
Maximize utilization	Consolidate workloads, use Spot and serverless
Reduce data movement	Cache at the edge, compress data, minimize cross-region transfers

Design Principles

Understand your impact. Establish sustainability goals. Maximize utilization. Adopt more efficient hardware and software. Reduce the downstream impact of cloud workloads.

The Well-Architected Review

AWS provides a Well-Architected Tool in the console to run a structured review of your workload:

1. Define the workload: name, description, environment, AWS accounts.
2. Answer questions: ~60 questions across the six pillars.
3. Review findings: each answer produces a risk level (High, Medium, None).
4. Create improvement plan: a prioritized list of actions to address high-risk items.

# List workloads
aws wellarchitected list-workloads

# Create a workload for review (--description is a required parameter)
aws wellarchitected create-workload \
  --workload-name "My Production App" \
  --description "Production workload under review" \
  --environment PRODUCTION \
  --lenses wellarchitected \
  --aws-regions us-east-1
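
The "review findings" and "create improvement plan" steps can be sketched offline. The example below takes a list of answered questions and orders the risky ones with High before Medium. The field names `question`, `pillar`, and `risk` are hypothetical and are not the Well-Architected Tool's export format; the risk labels mirror the High, Medium, and None levels described above.

```python
RISK_ORDER = {"HIGH": 0, "MEDIUM": 1, "NONE": 2}


def improvement_plan(answers):
    """Return answers that carry risk, highest risk first.

    Python's sort is stable, so items with equal risk keep their input order.
    """
    risky = [a for a in answers if a["risk"] != "NONE"]
    return sorted(risky, key=lambda a: RISK_ORDER[a["risk"]])


if __name__ == "__main__":
    # Hypothetical review answers, for illustration only.
    answers = [
        {"question": "How do you manage credentials?", "pillar": "security", "risk": "MEDIUM"},
        {"question": "How do you back up data?", "pillar": "reliability", "risk": "NONE"},
        {"question": "How do you monitor cost?", "pillar": "cost", "risk": "HIGH"},
    ]
    for item in improvement_plan(answers):
        print(f'{item["risk"]:<6} {item["pillar"]:<12} {item["question"]}')
```

Sorting by risk first is only one possible prioritization; a team might reasonably weight by pillar or by effort as well.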

When to Run a Review

Trigger	Why
New workload launch	Validate architecture before production
Major changes	New service, migration, scaling event
Annually	Catch drift, adopt new best practices
Post-incident	Identify systemic weaknesses

Applying the Pillars in Practice

A real-world architecture that applies all six pillars:

CloudFront (performance, sustainability)
    │
WAF (security)
    │
ALB + Auto Scaling (reliability, performance)
    │
ECS Fargate on Graviton (cost, sustainability, operational)
    │
Aurora Multi-AZ (reliability)
    │
ElastiCache (performance)
    │
S3 with lifecycle (cost)
    │
CloudWatch + X-Ray (operational)
    │
KMS + Secrets Manager (security)
    │
Cost Explorer + Tags + Budgets (cost)
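
One lightweight way to use a diagram like this as a checklist is to record which pillars each component addresses and then look for gaps. The sketch below transcribes the component-to-pillar tags from the diagram; the function name `uncovered_pillars` and the short pillar labels are assumptions of this example.

```python
PILLARS = {
    "operational", "security", "reliability",
    "performance", "cost", "sustainability",
}

# Component -> pillar tags, transcribed from the diagram.
ARCHITECTURE = {
    "CloudFront": {"performance", "sustainability"},
    "WAF": {"security"},
    "ALB + Auto Scaling": {"reliability", "performance"},
    "ECS Fargate on Graviton": {"cost", "sustainability", "operational"},
    "Aurora Multi-AZ": {"reliability"},
    "ElastiCache": {"performance"},
    "S3 with lifecycle": {"cost"},
    "CloudWatch + X-Ray": {"operational"},
    "KMS + Secrets Manager": {"security"},
    "Cost Explorer + Tags + Budgets": {"cost"},
}


def uncovered_pillars(architecture, pillars=PILLARS):
    """Return the pillars that no component in the architecture addresses."""
    covered = set().union(*architecture.values())
    return pillars - covered


if __name__ == "__main__":
    missing = uncovered_pillars(ARCHITECTURE)
    print("uncovered pillars:", sorted(missing) if missing else "none")
```

Running the same check against a draft design with fewer components shows at a glance which pillars still need attention.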

Key Takeaways

The Well-Architected Framework is a checklist, not a prescription — apply the principles that matter most to your workload.
The six pillars are: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.
Trade-offs exist — optimizing for cost may reduce reliability; optimizing for performance may increase cost. Make conscious decisions.
Run a Well-Architected Review before launch, after major changes, and annually.
Most principles boil down to: automate everything, encrypt everything, tag everything, use managed services, deploy across AZs, and monitor continuously.