Monitoring

First Published By Atif Alam

AWS provides built-in monitoring through CloudWatch (metrics, logs, alarms, dashboards) and CloudTrail (API audit logging). Together they answer “what’s happening?” and “who did what?”

CloudWatch collects and visualizes metrics and logs from AWS services and your applications.

Most AWS services publish metrics to CloudWatch automatically. You can also push custom metrics from your own applications.

| Service | Metric | What It Measures |
|---|---|---|
| EC2 | CPUUtilization | CPU usage (%) |
| EC2 | NetworkIn / NetworkOut | Network traffic (bytes) |
| RDS | DatabaseConnections | Active DB connections |
| RDS | FreeStorageSpace | Available disk space |
| ALB | RequestCount | Total requests |
| ALB | TargetResponseTime | Latency (seconds) |
| Lambda | Invocations | Function calls |
| Lambda | Duration | Execution time (ms) |
| S3 | BucketSizeBytes | Bucket storage size |
| SQS | ApproximateNumberOfMessagesVisible | Queue depth |

Note: EC2 metrics are at 5-minute intervals by default. Enable detailed monitoring for 1-minute intervals (additional cost).

Push your own application metrics to CloudWatch:

# CLI
aws cloudwatch put-metric-data --namespace "MyApp" \
  --metric-name "OrdersProcessed" --value 42 --unit Count

# With dimensions (like Prometheus labels)
aws cloudwatch put-metric-data --namespace "MyApp" \
  --metric-name "OrdersProcessed" --value 42 --unit Count \
  --dimensions Environment=production,Service=orders-api

Python (boto3):

import boto3

cloudwatch = boto3.client('cloudwatch')

# Publish one datapoint with two dimensions
cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[{
        'MetricName': 'OrdersProcessed',
        'Value': 42,
        'Unit': 'Count',
        'Dimensions': [
            {'Name': 'Environment', 'Value': 'production'},
            {'Name': 'Service', 'Value': 'orders-api'},
        ]
    }]
)

| Concept | What It Is |
|---|---|
| Namespace | A container for metrics (e.g. AWS/EC2, AWS/RDS, MyApp) |
| Dimension | A key-value pair that identifies a specific metric stream (e.g. InstanceId=i-abc) |
| Period | The time aggregation window (60s, 300s, etc.) |
| Statistic | How to aggregate: Average, Sum, Minimum, Maximum, SampleCount, percentiles (p99) |
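
To make Period and Statistic concrete, here is a minimal local sketch of how raw samples collapse into one value per period. It is only an illustration in plain Python, not how CloudWatch is implemented; the `aggregate` function and the sample data are hypothetical, and percentiles are left out.

```python
from collections import defaultdict
from statistics import mean


def aggregate(datapoints, period, statistic="Average"):
    """Bucket (epoch_seconds, value) samples into fixed periods and reduce each bucket."""
    reducers = {
        "Average": mean,
        "Sum": sum,
        "Minimum": min,
        "Maximum": max,
        "SampleCount": len,
    }
    reduce_fn = reducers[statistic]
    buckets = defaultdict(list)
    for timestamp, value in datapoints:
        # Align each sample to the start of the period it falls in.
        buckets[timestamp - timestamp % period].append(value)
    return {start: reduce_fn(values) for start, values in sorted(buckets.items())}


# Hypothetical samples: (seconds since start, CPU %)
samples = [(0, 10.0), (60, 20.0), (120, 30.0), (300, 50.0), (360, 70.0)]
five_minute_avg = aggregate(samples, period=300, statistic="Average")
five_minute_max = aggregate(samples, period=300, statistic="Maximum")
```

With a 300-second period the five samples fall into two buckets, so the same raw data yields different series depending on which statistic you pick.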

Alarms watch a metric and trigger actions when a threshold is crossed.

# Create an alarm: notify when average CPU > 80% for 10 minutes
# (two consecutive 5-minute periods)
aws cloudwatch put-metric-alarm \
  --alarm-name "HighCPU" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts \
  --dimensions Name=InstanceId,Value=i-abc123

| State | Meaning |
|---|---|
| OK | Metric is within the threshold |
| ALARM | Metric breached the threshold for the required number of evaluation periods |
| INSUFFICIENT_DATA | Not enough data to determine state |

| Action | What It Does |
|---|---|
| SNS notification | Send to email, Slack (via Lambda), PagerDuty |
| Auto Scaling | Scale EC2 capacity out or in |
| EC2 action | Stop, terminate, reboot, or recover an instance |
| Lambda | Trigger a function for custom remediation |
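
The interaction between threshold, period, and evaluation periods can be sketched locally. The function below is a simplified model, assuming a GreaterThanThreshold comparison and one aggregated value per period; it ignores CloudWatch's "datapoints to alarm" (M of N) setting and missing-data handling, and `alarm_state` is a hypothetical name.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return OK, ALARM, or INSUFFICIENT_DATA for a GreaterThanThreshold alarm.

    datapoints: one aggregated value per period, oldest first.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Not enough periods have been observed yet.
        return "INSUFFICIENT_DATA"
    if all(value > threshold for value in recent):
        # Every one of the most recent periods breached the threshold.
        return "ALARM"
    return "OK"


# Average CPU per 5-minute period, oldest first (hypothetical values)
state_breaching = alarm_state([70, 85, 90], threshold=80, evaluation_periods=2)
state_recovered = alarm_state([85, 90, 70], threshold=80, evaluation_periods=2)
state_too_new = alarm_state([90], threshold=80, evaluation_periods=2)
```

In this model a single spike does not fire the alarm; the metric has to stay above the threshold for every period in the evaluation window.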

Combine multiple alarms with AND/OR logic:

# ALARM only when BOTH CPU is high AND memory is high
aws cloudwatch put-composite-alarm \
  --alarm-name "HighResourceUsage" \
  --alarm-rule 'ALARM("HighCPU") AND ALARM("HighMemory")'

Composite alarms reduce alert noise: you alert on the combination of symptoms rather than on each one individually.
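
If you manage many alarms, it can help to generate the rule expression instead of writing it by hand. A minimal sketch, assuming every child alarm should be combined with the same operator; `build_alarm_rule` is a hypothetical helper, not part of boto3.

```python
def build_alarm_rule(alarm_names, operator="AND"):
    """Build a composite alarm rule such as 'ALARM("A") AND ALARM("B")'."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


rule = build_alarm_rule(["HighCPU", "HighMemory"])
```

The resulting string can be passed as `AlarmRule` to `boto3.client("cloudwatch").put_composite_alarm(AlarmName="HighResourceUsage", AlarmRule=rule)`.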

CloudWatch Logs stores and queries log data from AWS services and your applications.

Log Group: /aws/lambda/my-function        ← container (one per app/service)
└── Log Stream: 2026/02/16/[$LATEST]      ← one stream per source instance
    ├── log event: "START RequestId..."
    ├── log event: "Processing order 123..."
    └── log event: "END RequestId..."

| Concept | What It Is |
|---|---|
| Log group | A collection of log streams. Set retention (1 day – indefinite). |
| Log stream | A sequence of log events from one source (one Lambda execution environment, one EC2 instance). |
| Log event | A single log line with a timestamp. |

| Source | How |
|---|---|
| Lambda | Automatic (logs go to /aws/lambda/<function-name>) |
| ECS | Use awslogs log driver in task definition |
| EC2 | Install the CloudWatch agent |
| On-premises | Install the CloudWatch agent |
| API Gateway | Enable access logging |
| RDS | Enable log exports on the DB instance (some logs must first be turned on in the parameter group) |

CloudWatch Agent config (EC2):

{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [{
          "file_path": "/var/log/myapp/*.log",
          "log_group_name": "/myapp/production",
          "log_stream_name": "{instance_id}"
        }]
      }
    }
  }
}

CloudWatch Logs Insights lets you query logs with a purpose-built query language that chains commands with pipes:

# Find errors in the last hour
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50

# Count errors per 5 minutes
fields @timestamp, @message
| filter @message like /ERROR/
| stats count(*) as error_count by bin(5m)

# Find slow Lambda invocations
fields @timestamp, @duration, @requestId
| filter @duration > 5000
| sort @duration desc
| limit 20

# Parse JSON logs and filter
fields @timestamp, @message
| parse @message '{"level":"*","msg":"*","duration":*}' as level, msg, duration
| filter level = "error"
| stats count(*) by msg
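
Queries like these can also be run from code. Logs Insights queries are asynchronous: you start one, then poll for results. The sketch below assumes a client that behaves like `boto3.client("logs")`; it is passed in as a parameter so the function can be exercised without AWS access, and `run_insights_query` is a hypothetical helper name.

```python
import time


def run_insights_query(logs_client, log_group, query, start, end, poll_seconds=1.0):
    """Start a Logs Insights query and block until it finishes.

    logs_client is expected to behave like boto3.client("logs").
    start and end are epoch seconds.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=query,
    )["queryId"]

    while True:
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each result row is a list of {"field": ..., "value": ...} pairs.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)
```

With real credentials you might call it as `run_insights_query(boto3.client("logs"), "/myapp/production", "fields @timestamp, @message | limit 50", start, end)`, where `start` and `end` bound the time range.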

Turn log patterns into CloudWatch metrics:

# Create a metric every time "ERROR" appears in logs
aws logs put-metric-filter \
  --log-group-name /myapp/production \
  --filter-name ErrorCount \
  --filter-pattern "ERROR" \
  --metric-transformations \
    metricName=ErrorCount,metricNamespace=MyApp,metricValue=1

Now you can create alarms on ErrorCount — alert when errors spike.
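
A sketch of what such an alarm could look like with boto3's `put_metric_alarm`. The alarm name, threshold, and period are assumptions chosen for illustration. Because the filter above only emits a datapoint when a line matches, quiet periods have no data, so this sketch treats missing data as not breaching rather than letting the alarm drift into INSUFFICIENT_DATA.

```python
def error_alarm_params(threshold, sns_topic_arn, period=300):
    """Build keyword arguments for put_metric_alarm on the ErrorCount metric."""
    return {
        "AlarmName": "ErrorSpike",  # hypothetical alarm name
        "Namespace": "MyApp",
        "MetricName": "ErrorCount",
        "Statistic": "Sum",  # total matching log lines per period
        "Period": period,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no matching lines means no datapoints
        "AlarmActions": [sns_topic_arn],
    }


params = error_alarm_params(
    threshold=10,
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
# With credentials configured:
# boto3.client("cloudwatch").put_metric_alarm(**params)
```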

Create dashboards that combine metrics, logs, and alarms, either in the AWS console or from the CLI:

aws cloudwatch put-dashboard --dashboard-name "Production" \
  --dashboard-body '{
    "widgets": [{
      "type": "metric",
      "properties": {
        "title": "CPU Usage",
        "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-abc"]],
        "region": "us-east-1",
        "period": 300
      }
    }]
  }'
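
Hand-written dashboard JSON gets unwieldy as widgets multiply, so one option is to build the body in code. A minimal sketch follows; `metric_widget` is a hypothetical helper, and the region value is an assumption chosen to match the ARNs used elsewhere in this article.

```python
import json


def metric_widget(title, namespace, metric, dimensions, region, period=300):
    """Return one metric widget for a CloudWatch dashboard body."""
    # A metric row is [namespace, metric name, dim name, dim value, ...].
    metric_row = [namespace, metric]
    for name, value in dimensions.items():
        metric_row += [name, value]
    return {
        "type": "metric",
        "properties": {
            "title": title,
            "metrics": [metric_row],
            "region": region,
            "period": period,
        },
    }


dashboard_body = json.dumps({
    "widgets": [
        metric_widget("CPU Usage", "AWS/EC2", "CPUUtilization",
                      {"InstanceId": "i-abc"}, region="us-east-1"),
    ]
})
```

The string can then be passed to `boto3.client("cloudwatch").put_dashboard(DashboardName="Production", DashboardBody=dashboard_body)`.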

Tips:

  • Group metrics by service (one row per service).
  • Put alarms and key stats at the top.
  • Use automatic dashboards — CloudWatch auto-generates dashboards for many services.

| | CloudWatch | Prometheus + Grafana |
|---|---|---|
| Setup | Built-in (zero config for AWS services) | Self-hosted (more setup) |
| Custom metrics | API calls (can get expensive at high volume) | Pull-based (scrape targets) |
| Query language | Logs Insights (limited) | PromQL, plus LogQL if you add Loki (powerful) |
| Dashboards | Basic | Advanced (Grafana is far more flexible) |
| Cost | Pay per metric, alarm, log volume | Infrastructure cost only |
| Best for | AWS-native monitoring, small setups | Large-scale, multi-cloud, K8s-native |

Many teams use both: CloudWatch for AWS-native metrics (billing, Lambda, RDS) and Prometheus/Grafana for application-level metrics and Kubernetes.

CloudTrail records API activity in your AWS account: who did what, when, and from where.

| Field | Example |
|---|---|
| Event time | 2026-02-16T10:30:00Z |
| User | arn:aws:iam::123456789012:user/alice |
| Source IP | 203.0.113.50 |
| Action | RunInstances, DeleteBucket, CreateUser |
| Resource | arn:aws:ec2:us-east-1:123456789012:instance/i-abc |
| Result | Success or error code |

  • Security audit — Who launched that instance? Who changed that security group?
  • Compliance — Prove that only authorized users accessed sensitive data.
  • Incident investigation — Trace the sequence of API calls during a breach.
  • Change tracking — What changed in the last 24 hours?
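
For questions like these, you usually pull a handful of fields out of each CloudTrail record. Here is a minimal sketch, assuming the standard record keys (`eventTime`, `userIdentity`, `sourceIPAddress`, `eventName`, `resources`, `errorCode`); the sample record is made up for illustration, and many real events omit `resources` and carry the affected resource in their request or response parameters instead.

```python
import json


def summarize_event(record):
    """Reduce a CloudTrail record to who, what, when, where, and outcome."""
    identity = record.get("userIdentity", {})
    return {
        "time": record.get("eventTime"),
        # Fall back to the identity type when no ARN is present.
        "user": identity.get("arn", identity.get("type")),
        "source_ip": record.get("sourceIPAddress"),
        "action": record.get("eventName"),
        "resources": [r.get("ARN") for r in record.get("resources", [])],
        # errorCode is only present when the call failed.
        "result": record.get("errorCode", "Success"),
    }


# Hypothetical record, trimmed to the fields used above
raw_record = """
{
  "eventTime": "2026-02-16T10:30:00Z",
  "eventName": "RunInstances",
  "sourceIPAddress": "203.0.113.50",
  "userIdentity": {
    "type": "IAMUser",
    "arn": "arn:aws:iam::123456789012:user/alice"
  }
}
"""
summary = summarize_event(json.loads(raw_record))
```
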

# Create a multi-region trail that delivers events to S3
# (the bucket must already exist, with a bucket policy that lets CloudTrail write to it)
aws cloudtrail create-trail \
  --name my-trail \
  --s3-bucket-name my-cloudtrail-logs \
  --is-multi-region-trail \
  --enable-log-file-validation

# Start logging
aws cloudtrail start-logging --name my-trail

| Type | What It Logs | Enabled by Default |
|---|---|---|
| Management events | Control-plane API calls (create, delete, modify resources) | Yes (the last 90 days are viewable in Event history at no charge) |
| Data events | Data-plane operations (S3 GetObject, Lambda Invoke) | No (additional cost) |
| Insights events | Unusual activity patterns (API call spikes) | No (additional cost) |

Send CloudTrail events to CloudWatch Logs for real-time alerting:

# The log group ARN must include the trailing ":*"
aws cloudtrail update-trail --name my-trail \
  --cloud-watch-logs-log-group-arn "arn:aws:logs:us-east-1:123456789012:log-group:CloudTrail:*" \
  --cloud-watch-logs-role-arn arn:aws:iam::123456789012:role/CloudTrailCloudWatch

Then create metric filters to alert on suspicious activity:

# Alert on root account usage
aws logs put-metric-filter \
  --log-group-name CloudTrail \
  --filter-name RootAccountUsage \
  --filter-pattern '{ $.userIdentity.type = "Root" }' \
  --metric-transformations \
    metricName=RootAccountUsage,metricNamespace=Security,metricValue=1

  • CloudWatch Metrics monitor AWS resources automatically. Add custom metrics for application-level data.
  • CloudWatch Alarms trigger notifications or auto-scaling when thresholds are crossed. Use composite alarms to reduce noise.
  • CloudWatch Logs collect logs from Lambda, ECS, EC2, and more. Use Logs Insights for querying and metric filters for alerting.
  • CloudTrail records API activity for security auditing and compliance. Enable multi-region trails and send events to CloudWatch Logs.
  • For Kubernetes and multi-cloud environments, consider supplementing CloudWatch with Prometheus + Grafana.