Jomari Abejo

Full Stack Developer

Why CloudWatch is So Helpful: Monitoring and Logging Made Easy

Amazon CloudWatch is AWS's monitoring and observability service. It collects metrics, logs, and events from your AWS resources and applications, giving you visibility into what's happening in your infrastructure.


What is CloudWatch?

CloudWatch is a comprehensive monitoring solution that provides:

  • Metrics: Numerical data points (CPU usage, request count, errors)
  • Logs: Application and system logs from your services
  • Alarms: Automated notifications when thresholds are exceeded
  • Dashboards: Visual representations of your metrics
  • Insights: Analytics and queries for log data

Think of CloudWatch as your application's health monitor and log aggregator.


Why CloudWatch is Essential

1. Visibility Without Effort

CloudWatch automatically collects metrics from AWS services:

  • EC2 instances (CPU, network, disk I/O, status checks)
  • RDS databases (connections, storage)
  • Lambda functions (invocations, errors, duration)
  • Application Load Balancers (request count, response time)
  • And many more...

You don't need to install anything for these built-in metrics. The main exception is memory and disk-space usage on EC2, which require the CloudWatch agent.

2. Centralized Logging

Instead of SSHing into servers to check logs, CloudWatch Logs aggregates logs from:

  • EC2 instances
  • Lambda functions
  • Container services (ECS, EKS)
  • Your applications

All logs in one place, searchable and filterable.

3. Proactive Problem Detection

Set up alarms to notify you before problems become critical:

  • CPU usage above 80%
  • Error rate increasing
  • Disk space running low
  • API response time too high

Get alerts through Amazon SNS topics, which can deliver email, SMS, and other notifications.

4. Historical Data and Trends

CloudWatch stores metrics for up to 15 months, allowing you to:

  • Identify patterns and trends
  • Plan capacity based on historical data
  • Debug issues by comparing current vs. past behavior

5. Cost Monitoring

Track estimated AWS charges with CloudWatch billing metrics (the "AWS/Billing" namespace):

  • See spending trends
  • Identify expensive services
  • Set billing alarms on estimated charges

CloudWatch Core Concepts

Metrics

Metrics are time-ordered data points:

  • Namespace: Container for metrics (e.g., "AWS/EC2")
  • Metric Name: Name of the metric (e.g., "CPUUtilization")
  • Dimensions: Name-value pairs that identify unique metric streams
  • Timestamp: When the data point was collected
  • Value: The actual measurement

Example: AWS/EC2 CPUUtilization for instance i-1234567890abcdef0
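
To make the role of dimensions concrete, here is a minimal sketch in plain Java (no AWS SDK). MetricKeys and its key format are hypothetical and exist only for illustration. The sketch shows that the same namespace and metric name with a different dimension value identify a different metric stream.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper (not an AWS SDK class): builds a canonical key
// for a metric stream from namespace, metric name, and dimensions.
final class MetricKeys {

    static String key(String namespace, String metricName, Map<String, String> dimensions) {
        // Sort dimensions by name so the key does not depend on insertion order.
        String dims = new TreeMap<String, String>(dimensions).entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(","));
        return namespace + ":" + metricName + "{" + dims + "}";
    }

    public static void main(String[] args) {
        String a = key("AWS/EC2", "CPUUtilization", Map.of("InstanceId", "i-1234567890abcdef0"));
        String b = key("AWS/EC2", "CPUUtilization", Map.of("InstanceId", "i-0fedcba0987654321"));
        System.out.println(a);
        // Same namespace and name, different dimension value: two separate streams.
        System.out.println(a.equals(b)); // false
    }
}
```

This is also why adding a high-cardinality dimension (for example a user ID) to a custom metric multiplies the number of metrics you publish.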

Log Groups and Log Streams

  • Log Group: Container for log streams (e.g., "/aws/ec2/myapp")
  • Log Stream: Sequence of log events from a single source (e.g., specific EC2 instance)

Alarms

Alarms monitor metrics and trigger actions:

  • Threshold: Value that triggers the alarm
  • Period: Length of time each evaluated data point covers (e.g., 5 minutes)
  • Actions: What to do when alarm state changes (SNS, Auto Scaling, etc.)

Dashboards

Dashboards are collections of widgets showing metrics:

  • Line graphs
  • Number widgets
  • Text widgets
  • Custom widgets

CloudWatch Metrics in Action

Automatic EC2 Metrics

Every EC2 instance automatically sends these metrics:

  • CPUUtilization: Percentage of CPU used
  • NetworkIn/NetworkOut: Bytes transferred
  • DiskReadOps/DiskWriteOps: Disk I/O operations
  • StatusCheckFailed: Health check failures

View metrics:

  1. Go to CloudWatch Console
  2. Click "Metrics" → "All metrics"
  3. Select "EC2" → "Per-Instance Metrics"
  4. Select your instance and metric

Custom Metrics

You can send custom metrics from your application:

Using AWS SDK (Java):

import java.time.Instant;

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.*;

CloudWatchClient cloudWatch = CloudWatchClient.builder()
    .region(Region.US_EAST_1)
    .build();

MetricDatum metricDatum = MetricDatum.builder()
    .metricName("ActiveUsers")
    .value(150.0)
    .timestamp(Instant.now())
    .unit(StandardUnit.COUNT)
    .build();

PutMetricDataRequest request = PutMetricDataRequest.builder()
    .namespace("MyApplication")
    .metricData(metricDatum)
    .build();

cloudWatch.putMetricData(request);

CloudWatch Logs

Sending Logs from Spring Boot

Add CloudWatch Logs dependency:

<dependency>
    <groupId>ca.pjer</groupId>
    <artifactId>logback-awslogs-appender</artifactId>
    <version>1.6.0</version>
</dependency>

Configure logback-spring.xml:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <appender name="CLOUDWATCH" class="ca.pjer.logback.AwsLogsAppender">
        <logGroupName>my-spring-boot-app</logGroupName>
        <logStreamName>application-${HOSTNAME}</logStreamName>
        <logRegion>us-east-1</logRegion>
        <maxBatchLogEvents>50</maxBatchLogEvents>
        <maxFlushTimeMillis>30000</maxFlushTimeMillis>
        <layout>
            <pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </layout>
    </appender>

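    <!-- Console appender referenced by the root logger below -->
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="CLOUDWATCH" />
        <appender-ref ref="CONSOLE" />
    </root>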
</configuration>

Viewing Logs

  1. Go to CloudWatch Console
  2. Click "Logs" → "Log groups"
  3. Select your log group
  4. Click on a log stream
  5. View and search log events

Search logs:

  • Filter by time range
  • Search by text (e.g., "ERROR", "Exception")
  • Use filter patterns: [timestamp, level, message]

Creating CloudWatch Alarms

Example: High CPU Usage Alarm

Scenario: Alert when EC2 instance CPU exceeds 80%

Steps:

  1. Go to CloudWatch Console
  2. Click "Alarms" → "All alarms"
  3. Click "Create alarm"
  4. Click "Select metric"
  5. Choose "EC2" → "Per-Instance Metrics"
  6. Select "CPUUtilization" metric
  7. Select your instance
  8. Click "Select metric"

Configure alarm:

  • Metric: CPUUtilization
  • Statistic: Average
  • Period: 5 minutes
  • Threshold type: Static
  • Threshold: Greater than 80
  • Datapoints to alarm: 2 out of 2

Configure actions:

  • Notification: Create SNS topic or select existing
  • Email: Enter your email address
  • Alarm state trigger: In alarm

Finish:

  1. Name the alarm: "High-CPU-Alarm"
  2. Click "Create alarm"

Result: You'll receive an email when CPU exceeds 80% for 10+ minutes.
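
The "2 out of 2" setting is what turns a single spike into a non-event. The sketch below is a simplified, plain-Java model of that rule, not CloudWatch's actual evaluation logic: AlarmEvaluator is a hypothetical class, and missing data is simply treated as "not in alarm", whereas real alarms have configurable missing-data handling.

```java
import java.util.List;

// Hypothetical, simplified model of "M out of N datapoints" alarm evaluation.
final class AlarmEvaluator {

    /**
     * Returns true if at least datapointsToAlarm of the most recent
     * evaluationPeriods datapoints are strictly greater than the threshold.
     */
    static boolean inAlarm(List<Double> datapoints, double threshold,
                           int datapointsToAlarm, int evaluationPeriods) {
        if (datapoints.size() < evaluationPeriods) {
            return false; // assumption: too little data means "not in alarm"
        }
        List<Double> window = datapoints.subList(
            datapoints.size() - evaluationPeriods, datapoints.size());
        long breaching = window.stream().filter(v -> v > threshold).count();
        return breaching >= datapointsToAlarm;
    }

    public static void main(String[] args) {
        // Average CPU per 5-minute period, oldest first.
        System.out.println(inAlarm(List.of(40.0, 85.0, 70.0), 80.0, 2, 2)); // false: one spike
        System.out.println(inAlarm(List.of(40.0, 85.0, 91.0), 80.0, 2, 2)); // true: 10 minutes above 80
    }
}
```

With a 5-minute period, two breaching datapoints in a row correspond to the 10 minutes mentioned above.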

Example: Error Rate Alarm

Monitor application errors:

Using CloudWatch Metric Math:

m1 = Sum of HTTP 5xx errors (per 5 minutes)
m2 = Total requests (per 5 minutes)
(m1 / m2) * 100 > 5

Create alarm when error rate exceeds 5%.
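
The arithmetic behind that expression can be checked locally before you build the alarm. This is a minimal plain-Java sketch; ErrorRate is a hypothetical helper, and returning 0% when there is no traffic is an assumption made here, not documented CloudWatch behavior.

```java
// Hypothetical helper mirroring the metric math expression (m1 / m2) * 100 > 5.
final class ErrorRate {

    /** Error rate in percent for one period. */
    static double percent(double errors5xx, double totalRequests) {
        if (totalRequests <= 0) {
            return 0.0; // assumption: no traffic counts as a 0% error rate
        }
        return errors5xx * 100.0 / totalRequests;
    }

    /** True when the error rate is strictly above the threshold. */
    static boolean breaches(double errors5xx, double totalRequests, double thresholdPercent) {
        return percent(errors5xx, totalRequests) > thresholdPercent;
    }

    public static void main(String[] args) {
        // 12 errors out of 200 requests in a 5-minute period is 6%.
        System.out.println(percent(12, 200));
        System.out.println(breaches(12, 200, 5.0)); // true
        System.out.println(breaches(10, 200, 5.0)); // false: exactly 5% is not above 5%
    }
}
```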


CloudWatch Dashboards

Create a Dashboard

  1. Go to CloudWatch Console
  2. Click "Dashboards" → "All dashboards"
  3. Click "Create dashboard"
  4. Name it: "My Application Dashboard"
  5. Click "Create dashboard"

Add Widgets

Example: EC2 CPU Widget

  1. Click "Add widget"
  2. Select "Line" graph
  3. Select "EC2" → "Per-Instance Metrics" → "CPUUtilization"
  4. Select your instance(s)
  5. Configure:
    • Period: 5 minutes
    • Statistic: Average
  6. Click "Create widget"

Example: Application Error Count

  1. Add "Number" widget
  2. Select custom metric: "MyApplication/Errors"
  3. Statistic: Sum
  4. Period: 1 hour
  5. Create widget

Example: Log Insights Query

  1. Add "Logs table" widget
  2. Select log group
  3. Enter query:
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20

Dashboard Best Practices

  • Group related metrics: Keep EC2, database, and application metrics together
  • Use appropriate time ranges: 1 hour, 6 hours, 24 hours, 1 week
  • Set meaningful titles: Make widgets self-explanatory
  • Refresh automatically: Set auto-refresh for real-time monitoring

CloudWatch Logs Insights

Logs Insights lets you query and analyze log data. The queries below use its pipe-delimited query language, where each command passes its results to the next.

Basic Queries

Get recent log entries:

fields @timestamp, @message
| sort @timestamp desc
| limit 100

Filter errors:

fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc

Count log events by level:

fields @message
| parse @message "[*] *" as level, message
| stats count() by level

Find slow API requests:

fields @timestamp, @message
| parse @message "GET * took *ms" as endpoint, duration
| filter duration > 1000
| sort duration desc
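
Before relying on a parse pattern, it can help to check locally that your log lines actually match it. The sketch below approximates glob-style extraction in plain Java by turning each * into a regex capture group. GlobParse is a hypothetical helper and only an approximation of how Logs Insights matches patterns, so treat it as a sanity check rather than a reference implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical local approximation of a glob-style parse pattern.
final class GlobParse {

    static Pattern compile(String glob) {
        String[] parts = glob.split("\\*", -1); // keep a trailing empty part
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            regex.append(Pattern.quote(parts[i]));
            if (i < parts.length - 1) {
                // A wildcard at the very end of the pattern takes the rest of the line.
                boolean trailing = i == parts.length - 2 && parts[parts.length - 1].isEmpty();
                regex.append(trailing ? "(.*)" : "(.*?)");
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the extracted fields in order, or empty if the message does not match. */
    static Optional<List<String>> parse(String glob, String message) {
        Matcher m = compile(glob).matcher(message);
        if (!m.find()) {
            return Optional.empty();
        }
        List<String> fields = new ArrayList<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            fields.add(m.group(g));
        }
        return Optional.of(fields);
    }

    public static void main(String[] args) {
        parse("GET * took *ms", "2024-01-01 10:00:00 GET /api/orders took 1520ms")
            .ifPresent(f -> {
                double duration = Double.parseDouble(f.get(1));
                // Same condition as "filter duration > 1000" in the query.
                System.out.println(f.get(0) + " slow=" + (duration > 1000));
            });
    }
}
```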

Advanced Queries

Error rate over time:

fields @timestamp, @message
| filter @message like /ERROR/
| stats count() as errorCount by bin(5m)
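
To see what bin(5m) does, the following plain-Java sketch groups timestamps into 5-minute buckets and counts them. TimeBins is a hypothetical helper written for illustration; it mirrors the idea of binning, not the service's implementation.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts events per fixed-size time bucket.
final class TimeBins {

    /** Rounds a timestamp down to the start of its bucket. */
    static Instant binStart(Instant ts, long binSeconds) {
        long start = Math.floorDiv(ts.getEpochSecond(), binSeconds) * binSeconds;
        return Instant.ofEpochSecond(start);
    }

    /** Counts timestamps per bucket, ordered by bucket start. */
    static Map<Instant, Long> countByBin(List<Instant> timestamps, long binSeconds) {
        Map<Instant, Long> counts = new TreeMap<>();
        for (Instant ts : timestamps) {
            counts.merge(binStart(ts, binSeconds), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Instant> errors = List.of(
            Instant.parse("2024-01-01T10:01:10Z"),
            Instant.parse("2024-01-01T10:04:59Z"),
            Instant.parse("2024-01-01T10:05:00Z"),
            Instant.parse("2024-01-01T10:12:30Z"));
        // Two errors fall in the 10:00 bucket, one in 10:05, one in 10:10.
        countByBin(errors, 300).forEach((bin, n) -> System.out.println(bin + " " + n));
    }
}
```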

Top endpoints by request count:

fields @message
| parse @message "GET * " as endpoint
| stats count() as requests by endpoint
| sort requests desc
| limit 10

Monitoring Spring Boot Applications

Application Metrics

Send custom metrics from Spring Boot:

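import io.micrometer.cloudwatch2.CloudWatchConfig;
import io.micrometer.cloudwatch2.CloudWatchMeterRegistry;
import io.micrometer.core.instrument.Clock;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;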

@Configuration
public class CloudWatchMetricsConfig {

    @Bean
    public CloudWatchMeterRegistry cloudWatchMeterRegistry() {
        CloudWatchConfig config = new CloudWatchConfig() {
            @Override
            public String get(String key) {
                return null;
            }

            @Override
            public String namespace() {
                return "MySpringBootApp";
            }
        };

        return new CloudWatchMeterRegistry(
            config,
            Clock.SYSTEM,
            CloudWatchAsyncClient.create()
        );
    }
}

Track custom metrics:

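import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service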
public class OrderService {

    private final Counter orderCounter;
    private final Timer orderProcessingTime;

    public OrderService(MeterRegistry meterRegistry) {
        this.orderCounter = Counter.builder("orders.created")
            .description("Total orders created")
            .register(meterRegistry);

        this.orderProcessingTime = Timer.builder("orders.processing.time")
            .description("Order processing time")
            .register(meterRegistry);
    }

    public void createOrder(Order order) {
        Timer.Sample sample = Timer.start();
        try {
            // Process order
            orderCounter.increment();
        } finally {
            sample.stop(orderProcessingTime);
        }
    }
}

Health Checks

Expose Spring Boot Actuator metrics:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
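</dependency>

Configure application.properties (the CloudWatch export property is supplied by the CloudWatch metrics integration rather than by Actuator itself, and its exact name depends on the Spring Boot and Spring Cloud AWS versions in use):

management.endpoints.web.exposure.include=health,metrics,prometheus
management.metrics.export.cloudwatch.namespace=MySpringBootApp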

Cost Considerations

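CloudWatch pricing (approximate list prices; rates vary by region and change over time):

  • Custom metrics: $0.30 per metric per month for the first 10,000 metrics (first 10 metrics free)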
  • API requests: $0.01 per 1,000 requests (first 1 million free)
  • Log ingestion: $0.50 per GB (first 5 GB free)
  • Log storage: $0.03 per GB per month
  • Dashboards: $3 per dashboard per month (first 3 free)
  • Alarms: $0.10 per alarm per month (first 10 free)

Cost Optimization Tips:

  1. Use metric math to combine metrics instead of creating multiple custom metrics
  2. Set log retention periods (logs older than retention are deleted)
  3. Use sampling for high-volume logs
  4. Archive old logs to S3 (cheaper storage)
  5. Limit dashboard widgets (each widget costs per API call)
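
For a rough sense of scale, the list prices above can be plugged into a small estimator. This is a back-of-the-envelope sketch in plain Java: CloudWatchCostEstimate is hypothetical, it deliberately ignores the free tier and API request charges, and the rates are the ones quoted in this article rather than values fetched from AWS.

```java
import java.math.BigDecimal;

// Hypothetical estimator using the per-unit prices quoted in this article.
// Ignores the free tier and API request charges to keep the arithmetic simple.
final class CloudWatchCostEstimate {

    static final BigDecimal PER_CUSTOM_METRIC = new BigDecimal("0.30");
    static final BigDecimal PER_GB_INGESTED = new BigDecimal("0.50");
    static final BigDecimal PER_GB_STORED = new BigDecimal("0.03");
    static final BigDecimal PER_DASHBOARD = new BigDecimal("3.00");
    static final BigDecimal PER_ALARM = new BigDecimal("0.10");

    /** Estimated monthly cost in USD before any free-tier allowance. */
    static BigDecimal monthly(int customMetrics, int gbIngested, int gbStored,
                              int dashboards, int alarms) {
        return PER_CUSTOM_METRIC.multiply(BigDecimal.valueOf(customMetrics))
            .add(PER_GB_INGESTED.multiply(BigDecimal.valueOf(gbIngested)))
            .add(PER_GB_STORED.multiply(BigDecimal.valueOf(gbStored)))
            .add(PER_DASHBOARD.multiply(BigDecimal.valueOf(dashboards)))
            .add(PER_ALARM.multiply(BigDecimal.valueOf(alarms)));
    }

    public static void main(String[] args) {
        // 50 metrics, 20 GB ingested, 100 GB stored, 5 dashboards, 25 alarms:
        // 15.00 + 10.00 + 3.00 + 15.00 + 2.50 = 45.50
        System.out.println(monthly(50, 20, 100, 5, 25));
    }
}
```

In this example, custom metrics and dashboards dominate the bill, which is why the tips above focus on limiting both.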

Common Use Cases

Use Case 1: Monitor Application Performance

Metrics to track:

  • Request count
  • Response time (p50, p95, p99)
  • Error rate
  • Active users

Set alarms for:

  • Response time > 1 second
  • Error rate > 1%
  • Request count drops significantly
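
Percentiles such as p95 and p99 matter because an average can hide a slow tail. The plain-Java sketch below uses the nearest-rank method on a handful of sample response times; Percentiles is a hypothetical helper, and CloudWatch's own percentile statistics may be computed differently.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over response times in milliseconds.
final class Percentiles {

    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone(); // do not modify the caller's array
        Arrays.sort(sorted);
        // Nearest rank: smallest value with at least p% of samples at or below it.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] ms = {120, 95, 110, 105, 2300, 100, 130, 98, 115, 102};
        System.out.println("avg=" + Arrays.stream(ms).average().orElse(0)); // 327.5
        System.out.println("p50=" + percentile(ms, 50)); // 105.0
        System.out.println("p95=" + percentile(ms, 95)); // 2300.0
    }
}
```

Here one slow request pulls the average to 327.5 ms even though the typical request takes about 105 ms, so an alarm on p95 or p99 describes user-facing latency more faithfully than one on the average.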

Use Case 2: Capacity Planning

Track:

  • CPU utilization trends
  • Memory usage over time
  • Request volume patterns

Use data to:

  • Plan instance sizing
  • Schedule scaling events
  • Predict future capacity needs

Use Case 3: Troubleshooting

When an issue occurs:

  1. Check alarms for any triggered alerts
  2. View recent logs in Log Insights
  3. Compare current metrics to historical data
  4. Query logs for specific error patterns
  5. Trace request flow through logs

Use Case 4: Compliance and Auditing

Track:

  • All API calls (via CloudTrail integration)
  • Access patterns
  • Error events
  • Security-related events

Generate reports:

  • Error summaries
  • Access logs
  • Performance reports

Best Practices

1. Set Up Alarms Early

Don't wait until production. Set up basic alarms during development.

2. Use Meaningful Names

Name metrics, logs, and alarms descriptively:

  • ✅ Good: api-response-time-p95
  • ❌ Bad: metric1

3. Monitor What Matters

Focus on business-critical metrics:

  • User-facing errors
  • Performance bottlenecks
  • Cost drivers
  • Security events

4. Set Appropriate Thresholds

Alarms should alert on real problems, not noise:

  • Too sensitive: Alert on every spike
  • Too loose: Alert only on critical failures
  • Right: Alert on sustained issues

5. Review and Refine

Regularly review:

  • Alarm effectiveness (false positives/negatives)
  • Unused metrics and logs
  • Dashboard relevance
  • Cost optimization opportunities

6. Use Log Retention

Set retention policies:

  • Development: 7 days
  • Staging: 30 days
  • Production: 90 days or longer (based on compliance needs)

7. Centralize Logs

Send all application logs to CloudWatch:

  • EC2 application logs
  • Container logs (ECS/EKS)
  • Lambda function logs
  • API Gateway logs

CloudWatch vs. Alternatives

CloudWatch Advantages

  • Native AWS Integration: Works seamlessly with AWS services
  • No Infrastructure: Fully managed, no servers to run
  • Comprehensive: Metrics, logs, alarms, dashboards in one place
  • Cost-Effective: Generous free tier

When to Consider Alternatives

  • Third-party tools: If you need advanced analytics (Datadog, New Relic)
  • Open source: If you want more control (Prometheus + Grafana)
  • Multi-cloud: If running across AWS, Azure, GCP

For AWS-native applications, CloudWatch is usually the best choice.


Getting Started Checklist

  • [ ] Install the CloudWatch agent on EC2 instances that need memory or disk metrics
  • [ ] Set up basic alarms (CPU, memory, errors)
  • [ ] Configure application logging to CloudWatch
  • [ ] Create a dashboard with key metrics
  • [ ] Set up SNS topic for alarm notifications
  • [ ] Review CloudWatch pricing and optimize
  • [ ] Set log retention policies
  • [ ] Document your monitoring strategy

CloudWatch is your window into your AWS infrastructure and applications. Start with basic metrics and alarms, then expand as you need deeper insights. The visibility it provides is invaluable for maintaining reliable, performant applications.

Next Steps:

  • Review the built-in metrics for your EC2 instances and add the CloudWatch agent where needed
  • Configure application logging
  • Create your first dashboard
  • Set up critical alarms
  • Explore Log Insights for advanced log analysis