Monitoring

First published Feb 16, 2026, by Atif Alam

AWS provides built-in monitoring through CloudWatch (metrics, logs, alarms, dashboards) and CloudTrail (API audit logging). Together they answer “what’s happening?” and “who did what?”

CloudWatch

CloudWatch collects and visualizes metrics and logs from AWS services and your applications.

Metrics

Most AWS services publish metrics to CloudWatch automatically. You can also push custom metrics from your applications.

Built-In Metrics (Examples)

Service	Metric	What It Measures
EC2	`CPUUtilization`	CPU usage (%)
EC2	`NetworkIn` / `NetworkOut`	Network traffic (bytes)
RDS	`DatabaseConnections`	Active DB connections
RDS	`FreeStorageSpace`	Available disk space
ALB	`RequestCount`	Total requests
ALB	`TargetResponseTime`	Latency (seconds)
Lambda	`Invocations`	Function calls
Lambda	`Duration`	Execution time (ms)
S3	`BucketSizeBytes`	Bucket storage size
SQS	`ApproximateNumberOfMessagesVisible`	Queue depth

Note: EC2 metrics are at 5-minute intervals by default. Enable detailed monitoring for 1-minute intervals (additional cost).

Custom Metrics

Push your own application metrics to CloudWatch:

# CLI
aws cloudwatch put-metric-data --namespace "MyApp" \
  --metric-name "OrdersProcessed" --value 42 --unit Count

# With dimensions (like Prometheus labels)
aws cloudwatch put-metric-data --namespace "MyApp" \
  --metric-name "OrdersProcessed" --value 42 --unit Count \
  --dimensions Environment=production,Service=orders-api

Python (boto3):

import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[{
        'MetricName': 'OrdersProcessed',
        'Value': 42,
        'Unit': 'Count',
        'Dimensions': [
            {'Name': 'Environment', 'Value': 'production'},
            {'Name': 'Service', 'Value': 'orders-api'},
        ]
    }]
)
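`MetricData` is a list, so several data points can travel in one request instead of one call each. A minimal sketch of batching, assuming hypothetical helpers `order_count_datum` and `chunk_metric_data`; the batch size of 1000 is an assumption to check against the current `PutMetricData` quota:

```python
from typing import Dict, Iterator, List

# Assumed upper bound on entries per PutMetricData request; verify it
# against the current CloudWatch service quotas before relying on it.
MAX_ITEMS_PER_CALL = 1000


def order_count_datum(count: int, environment: str, service: str) -> Dict:
    """Build one MetricData entry in the same shape as the example above."""
    return {
        "MetricName": "OrdersProcessed",
        "Value": count,
        "Unit": "Count",
        "Dimensions": [
            {"Name": "Environment", "Value": environment},
            {"Name": "Service", "Value": service},
        ],
    }


def chunk_metric_data(data: List[Dict],
                      size: int = MAX_ITEMS_PER_CALL) -> Iterator[List[Dict]]:
    """Yield successive slices of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(data), size):
        yield data[start:start + size]


def publish(data: List[Dict], namespace: str = "MyApp") -> int:
    """Send all entries to CloudWatch and return the number of API calls made."""
    import boto3  # imported lazily so the helpers above work without AWS access

    cloudwatch = boto3.client("cloudwatch")
    calls = 0
    for batch in chunk_metric_data(data):
        cloudwatch.put_metric_data(Namespace=namespace, MetricData=batch)
        calls += 1
    return calls
```

Only `publish` talks to AWS; the two helpers are plain functions and can be unit-tested offline.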

Key Concepts

Concept	What It Is
Namespace	A container for metrics (e.g. `AWS/EC2`, `AWS/RDS`, `MyApp`)
Dimension	A key-value pair that identifies a specific metric stream (e.g. `InstanceId=i-abc`)
Period	The time aggregation window (60s, 300s, etc.)
Statistic	How to aggregate: `Average`, `Sum`, `Minimum`, `Maximum`, `SampleCount`, percentiles (`p99`)
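To make Period and Statistic concrete, here is a small local simulation (no AWS calls) of how raw data points could be bucketed into periods and reduced with a statistic. The function name `aggregate` and the sample values are hypothetical; CloudWatch performs this aggregation on the service side.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Reducers for a few of the statistics listed in the table above.
STATISTICS = {
    "Average": lambda values: sum(values) / len(values),
    "Sum": sum,
    "Minimum": min,
    "Maximum": max,
    "SampleCount": len,
}


def aggregate(points: List[Tuple[int, float]], period: int,
              statistic: str) -> Dict[int, float]:
    """Group (epoch_seconds, value) points into period-sized buckets.

    Returns a dict mapping each bucket's start time to the reduced value.
    """
    reducer = STATISTICS[statistic]
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % period)
        buckets[bucket_start].append(value)
    return {start: reducer(values) for start, values in sorted(buckets.items())}


# Hypothetical CPU samples: (seconds since an arbitrary start, percent).
samples = [(0, 20.0), (60, 40.0), (120, 90.0), (300, 50.0), (360, 70.0)]
print(aggregate(samples, period=300, statistic="Average"))  # {0: 50.0, 300: 60.0}
print(aggregate(samples, period=300, statistic="Maximum"))  # {0: 90.0, 300: 70.0}
```

The same five samples give different pictures depending on the statistic: the average hides the 90% spike that the maximum keeps.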

Alarms

Alarms watch a metric and trigger actions when a threshold is crossed.

# Create an alarm: notify when average CPU > 80% for 10 minutes
# (2 consecutive 5-minute periods)
aws cloudwatch put-metric-alarm \
  --alarm-name "HighCPU" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts \
  --dimensions Name=InstanceId,Value=i-abc123

Alarm States

State	Meaning
OK	Metric is within the threshold
ALARM	Metric breached the threshold for the required number of evaluation periods
INSUFFICIENT_DATA	Not enough data to determine state
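A simplified model of how an alarm could arrive at these states, assuming every one of the last `evaluation_periods` data points must breach the threshold. The function `evaluate_alarm` is hypothetical; real alarms also support M-out-of-N evaluation and configurable handling of missing data, which this sketch ignores.

```python
from typing import List, Optional


def evaluate_alarm(datapoints: List[Optional[float]], threshold: float,
                   evaluation_periods: int) -> str:
    """Return OK, ALARM, or INSUFFICIENT_DATA for a GreaterThanThreshold alarm.

    `datapoints` holds one aggregated value per period, oldest first;
    None marks a period with no data.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"


# Two consecutive 5-minute periods above 80% are needed, as in the HighCPU example.
print(evaluate_alarm([70.0, 85.0], threshold=80, evaluation_periods=2))        # OK
print(evaluate_alarm([70.0, 85.0, 91.0], threshold=80, evaluation_periods=2))  # ALARM
print(evaluate_alarm([85.0, None], threshold=80, evaluation_periods=2))        # INSUFFICIENT_DATA
```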

Alarm Actions

Action	What It Does
SNS notification	Send to email, Slack (via Lambda), PagerDuty
Auto Scaling	Scale up/down EC2 instances
EC2 action	Stop, terminate, reboot, or recover an instance
Lambda	Trigger a function for custom remediation

Composite Alarms

Combine multiple alarms with AND/OR logic:

# ALARM only when BOTH CPU is high AND memory is high
aws cloudwatch put-composite-alarm \
  --alarm-name "HighResourceUsage" \
  --alarm-rule 'ALARM("HighCPU") AND ALARM("HighMemory")'

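This reduces alert noise: you are alerted on the combination of symptoms, not on each one individually. Note that memory is not a built-in EC2 metric, so the `HighMemory` alarm assumes the CloudWatch agent (or a custom metric) is publishing it.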
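Alarm rules are plain strings, so they can also be assembled programmatically. A minimal sketch, assuming hypothetical helpers `in_alarm`, `all_of`, and `any_of`; `AlarmName` and `AlarmRule` are the parameter names boto3's `put_composite_alarm` takes.

```python
from typing import Dict


def in_alarm(alarm_name: str) -> str:
    """Rule fragment that is true while the named alarm is in ALARM state."""
    return f'ALARM("{alarm_name}")'


def all_of(*rules: str) -> str:
    """Join rule fragments with AND, parenthesized so they nest safely."""
    return "(" + " AND ".join(rules) + ")"


def any_of(*rules: str) -> str:
    """Join rule fragments with OR, parenthesized so they nest safely."""
    return "(" + " OR ".join(rules) + ")"


def composite_alarm_request(name: str, rule: str) -> Dict[str, str]:
    """Keyword arguments for cloudwatch.put_composite_alarm(**request)."""
    return {"AlarmName": name, "AlarmRule": rule}


rule = all_of(in_alarm("HighCPU"), in_alarm("HighMemory"))
request = composite_alarm_request("HighResourceUsage", rule)
print(request["AlarmRule"])  # (ALARM("HighCPU") AND ALARM("HighMemory"))
```

Building rules from small functions keeps quoting consistent once a rule grows past two or three child alarms.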

CloudWatch Logs

CloudWatch Logs stores and queries log data from AWS services and your applications.

Log Structure

Log Group: /aws/lambda/my-function       ← container (one per app/service)
  └── Log Stream: 2026/02/16/[$LATEST]   ← one stream per source instance
        ├── log event: "START RequestId..."
        ├── log event: "Processing order 123..."
        └── log event: "END RequestId..."

Concept	What It Is
Log group	A collection of log streams. Set retention (1 day – indefinite).
Log stream	A sequence of log events from one source (one Lambda execution environment, one EC2 instance).
Log event	A single log line with a timestamp.

Sending Logs to CloudWatch

Source	How
Lambda	Automatic (logs go to `/aws/lambda/<function-name>`)
ECS	Use `awslogs` log driver in task definition
EC2	Install the CloudWatch agent
On-premises	Install the CloudWatch agent
API Gateway	Enable access logging
RDS	Enable log exports on the DB instance (some logs must also be turned on in the parameter group)

CloudWatch Agent config (EC2):

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [{
          "file_path": "/var/log/myapp/*.log",
          "log_group_name": "/myapp/production",
          "log_stream_name": "{instance_id}"
        }]
      }
    }
  }
}

Log Insights (Querying)

CloudWatch Logs Insights lets you query logs with a SQL-like language:

# Find errors in the last hour
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50

# Count errors per 5 minutes
fields @timestamp, @message
| filter @message like /ERROR/
| stats count(*) as error_count by bin(5m)

# Find slow Lambda invocations
fields @timestamp, @duration, @requestId
| filter @duration > 5000
| sort @duration desc
| limit 20

# Parse JSON logs and filter
fields @timestamp, @message
| parse @message '{"level":"*","msg":"*","duration":*}' as level, msg, duration
| filter level = "error"
| stats count(*) by msg
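Queries like these can also be run outside the console. The time range is not part of the query text; it is passed alongside it. A sketch using the `start_query` and `get_query_results` calls of the boto3 `logs` client, assuming a hypothetical wrapper `run_insights_query` with arbitrary polling defaults:

```python
import time
from typing import Any, Dict, List


def run_insights_query(logs_client: Any, log_group: str, query: str,
                       start_time: int, end_time: int,
                       poll_seconds: float = 1.0,
                       max_polls: int = 60) -> List[Dict[str, str]]:
    """Start a Logs Insights query, poll until it finishes, return rows as dicts.

    `logs_client` is expected to behave like boto3.client("logs").
    `start_time` and `end_time` are epoch seconds.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not complete after {max_polls} polls")
```

With real credentials, `logs_client` would be `boto3.client("logs")`, and "the last hour" becomes `start_time = int(time.time()) - 3600` with `end_time = int(time.time())`.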

Metric Filters

Turn log patterns into CloudWatch metrics:

# Create a metric every time "ERROR" appears in logs
aws logs put-metric-filter \
  --log-group-name /myapp/production \
  --filter-name ErrorCount \
  --filter-pattern "ERROR" \
  --metric-transformations \
    metricName=ErrorCount,metricNamespace=MyApp,metricValue=1

Now you can create alarms on ErrorCount — alert when errors spike.
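A sketch of such an alarm using the parameter names of boto3's `put_metric_alarm`. The alarm name, threshold, and period are illustrative assumptions. `TreatMissingData` is set to `notBreaching` on the assumption that the filter has no default value, in which case it publishes no data during periods without matching log lines.

```python
from typing import Dict


def error_alarm_request(sns_topic_arn: str, threshold: float = 10,
                        period: int = 300) -> Dict:
    """Keyword arguments for cloudwatch.put_metric_alarm(**request)."""
    return {
        "AlarmName": "HighErrorCount",        # hypothetical name
        "Namespace": "MyApp",                 # matches metricNamespace above
        "MetricName": "ErrorCount",           # matches metricName above
        "Statistic": "Sum",                   # total errors per period
        "Period": period,
        "EvaluationPeriods": 1,
        "Threshold": threshold,               # illustrative value
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",   # no matching logs means no data
        "AlarmActions": [sns_topic_arn],
    }


def create_error_alarm(sns_topic_arn: str) -> None:
    """Create the alarm; needs AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the builder above works offline

    boto3.client("cloudwatch").put_metric_alarm(**error_alarm_request(sns_topic_arn))
```

`Sum` is used rather than `Average` because each matching log line contributes a value of 1, so the sum per period is the error count.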

CloudWatch Dashboards

Dashboards combine metrics, logs, and alarms in a single view. Build them in the AWS console, or define them as JSON through the CLI or API:

aws cloudwatch put-dashboard --dashboard-name "Production" \
  --dashboard-body '{
    "widgets": [{
      "type": "metric",
      "properties": {
        "title": "CPU Usage",
        "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-abc"]],
        "period": 300,
        "region": "us-east-1"
      }
    }]
  }'

Tips:

Group metrics by service (one row per service).
Put alarms and key stats at the top.
Use automatic dashboards — CloudWatch auto-generates dashboards for many services.
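The first tip can be sketched as a small generator that lays out one row of widgets per service. The helper names, the chosen metrics, the `mydb` identifier, and the region are hypothetical; the widget keys (`type`, `x`, `y`, `width`, `height`, `properties`) follow the dashboard body format.

```python
import json
from typing import Dict, List

WIDGET_WIDTH = 8    # the dashboard grid is 24 units wide, so three widgets per row
WIDGET_HEIGHT = 6


def metric_widget(title: str, metric: List[str], x: int, y: int,
                  region: str = "us-east-1") -> Dict:
    """One metric widget placed at grid position (x, y)."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": WIDGET_WIDTH, "height": WIDGET_HEIGHT,
        "properties": {
            "title": title,
            "metrics": [metric],
            "period": 300,
            "stat": "Average",
            "region": region,  # assumed region
        },
    }


def dashboard_body(rows: Dict[str, List[List[str]]]) -> str:
    """Build a dashboard body with one row per service.

    `rows` maps a service label to its metrics, each given as
    [namespace, metric_name, dimension_name, dimension_value].
    """
    widgets = []
    for row_index, (service, metrics) in enumerate(rows.items()):
        for col_index, metric in enumerate(metrics):
            widgets.append(metric_widget(
                title=f"{service}: {metric[1]}",
                metric=metric,
                x=col_index * WIDGET_WIDTH,
                y=row_index * WIDGET_HEIGHT,
            ))
    return json.dumps({"widgets": widgets})


body = dashboard_body({
    "EC2": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-abc"],
            ["AWS/EC2", "NetworkIn", "InstanceId", "i-abc"]],
    "RDS": [["AWS/RDS", "DatabaseConnections", "DBInstanceIdentifier", "mydb"]],
})
# `body` can be passed as DashboardBody to cloudwatch.put_dashboard(...).
```

Generating the body from data keeps the layout consistent as services are added, and the resulting JSON can be reviewed in version control.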

CloudWatch vs Prometheus/Grafana

	CloudWatch	Prometheus + Grafana
Setup	Built-in (zero config for AWS services)	Self-hosted (more setup)
Custom metrics	API calls (can get expensive at high volume)	Pull-based (scrape targets)
Query language	Logs Insights (limited)	PromQL, plus LogQL if you add Grafana Loki (powerful)
Dashboards	Basic	Advanced (Grafana is far more flexible)
Cost	Pay per metric, alarm, log volume	Infrastructure cost only
Best for	AWS-native monitoring, small setups	Large-scale, multi-cloud, K8s-native

Many teams use both: CloudWatch for AWS-native metrics (billing, Lambda, RDS) and Prometheus/Grafana for application-level metrics and Kubernetes.

CloudTrail

CloudTrail records API activity in your AWS account: who did what, when, and from where. Management events are captured by default; data events must be enabled separately.

What CloudTrail Records

Field	Example
Event time	`2026-02-16T10:30:00Z`
User	`arn:aws:iam::123456789012:user/alice`
Source IP	`203.0.113.50`
Action	`RunInstances`, `DeleteBucket`, `CreateUser`
Resource	`arn:aws:ec2:us-east-1:123456789012:instance/i-abc`
Result	Success or error code
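These fields map onto keys in each CloudTrail JSON record. A minimal sketch that pulls them out of one record; the sample record is made up for illustration and trimmed to the keys used here, and `summarize_event` is a hypothetical helper.

```python
import json
from typing import Dict

# Illustrative record, not real log output.
SAMPLE_RECORD = json.loads("""
{
  "eventTime": "2026-02-16T10:30:00Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "RunInstances",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.50",
  "userIdentity": {
    "type": "IAMUser",
    "arn": "arn:aws:iam::123456789012:user/alice"
  }
}
""")


def summarize_event(record: Dict) -> Dict[str, str]:
    """Reduce a CloudTrail record to who did what, when, and from where."""
    identity = record.get("userIdentity", {})
    return {
        "time": record.get("eventTime", ""),
        "user": identity.get("arn", identity.get("type", "unknown")),
        "source_ip": record.get("sourceIPAddress", ""),
        "action": record.get("eventName", ""),
        # errorCode is only present when the call failed.
        "result": record.get("errorCode", "Success"),
    }


print(summarize_event(SAMPLE_RECORD))
```

The same function applied to every record in a delivered log file gives a compact audit timeline for incident investigation.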

Use Cases

Security audit — Who launched that instance? Who changed that security group?
Compliance — Prove that only authorized users accessed sensitive data.
Incident investigation — Trace the sequence of API calls during a breach.
Change tracking — What changed in the last 24 hours?

Setting Up CloudTrail

# Create a multi-region trail that delivers events to S3
# (management events by default; the bucket must already exist
# with a policy that lets CloudTrail write to it)
aws cloudtrail create-trail \
  --name my-trail \
  --s3-bucket-name my-cloudtrail-logs \
  --is-multi-region-trail \
  --enable-log-file-validation

# Start logging
aws cloudtrail start-logging --name my-trail

Event Types

Type	What It Logs	Enabled by Default
Management events	Control-plane API calls (create, delete, modify resources)	Yes (free for last 90 days)
Data events	Data-plane operations (S3 `GetObject`, Lambda `Invoke`)	No (additional cost)
Insights events	Unusual activity patterns (API call spikes)	No (additional cost)

CloudTrail + CloudWatch Logs

Send CloudTrail events to CloudWatch Logs for real-time alerting:

aws cloudtrail update-trail --name my-trail \
  --cloud-watch-logs-log-group-arn "arn:aws:logs:us-east-1:123456789012:log-group:CloudTrail:*" \
  --cloud-watch-logs-role-arn arn:aws:iam::123456789012:role/CloudTrailCloudWatch

Then create metric filters to alert on suspicious activity:

# Alert on root account usage
aws logs put-metric-filter \
  --log-group-name CloudTrail \
  --filter-name RootAccountUsage \
  --filter-pattern '{ $.userIdentity.type = "Root" }' \
  --metric-transformations \
    metricName=RootAccountUsage,metricNamespace=Security,metricValue=1

Key Takeaways

CloudWatch Metrics monitor AWS resources automatically. Add custom metrics for application-level data.
CloudWatch Alarms trigger notifications or auto-scaling when thresholds are crossed. Use composite alarms to reduce noise.
CloudWatch Logs collect logs from Lambda, ECS, EC2, and more. Use Logs Insights for querying and metric filters for alerting.
CloudTrail records API activity for security auditing and compliance. Enable multi-region trails and send events to CloudWatch Logs.
For Kubernetes and multi-cloud environments, consider supplementing CloudWatch with Prometheus + Grafana.