Monitoring EKS using CloudWatch Container Insights
AMJ Cloud implemented CloudWatch Container Insights on AWS EKS for an e-commerce client, enabling real-time performance monitoring and log analysis for a web application using CloudWatch Agent, Fluentd, and AWS Load Balancer Controller integration.
Monitoring EKS using CloudWatch Container Insights for a Client
AMJ Cloud deployed CloudWatch Container Insights on Amazon Elastic Kubernetes Service (EKS) for an e-commerce client, enabling real-time monitoring of a web application (sample-nginx). This solution tracked performance metrics and logs to ensure stability during variable traffic, such as flash sales. By integrating CloudWatch Agent and Fluentd as DaemonSets, AWS Load Balancer Controller for ALB Ingress, and External DNS for Route 53, the application was accessible at app.clienteks.com. The implementation improved issue detection by 70% and optimized infrastructure costs by 15%.
Introduction to CloudWatch Container Insights
CloudWatch Container Insights provides automated dashboards and logs for monitoring Kubernetes clusters, offering insights into performance and application health.
- What is CloudWatch?: An AWS service for collecting metrics, logs, and events to monitor resource performance.
- What is CloudWatch Container Insights?: A CloudWatch feature that aggregates and visualizes EKS metrics and logs, including CPU, memory, and container restarts.
- What are CloudWatch Agent and Fluentd?: CloudWatch Agent collects performance metrics, while Fluentd forwards container logs to CloudWatch for analysis.
Use Case: The client’s web application supports product browsing and transactions. Container Insights ensures real-time visibility into performance and errors during traffic spikes.
Monitored Metrics
The following table summarizes key metrics tracked in the CloudWatch dashboard:
| Metric | Type | Description |
|---|---|---|
| Node CPU Utilization | Bar | Average CPU usage by node |
| Container Restarts | Table | Average restarts by pod |
| Cluster Node Failures | Table | Count of failed nodes |
| CPU Usage by Container | Bar | Median CPU usage by container |
| Pods Requested vs Running | Bar | Difference between requested and running pods |
| Application Log Errors | Bar | Error counts by container |
Project Overview
The client required robust monitoring for its e-commerce web application to ensure performance and reliability. AMJ Cloud implemented CloudWatch Container Insights on EKS to:
- Monitor CPU, memory, and container health for the sample-nginx deployment.
- Collect and analyze logs using CloudWatch Agent and Fluentd.
- Provide secure access via ALB Ingress and Route 53 at app.clienteks.com.
The solution enabled proactive issue resolution and cost optimization through detailed performance insights.
Technical Implementation
Associate CloudWatch Policy
- Navigated to EC2 -> Worker Node EC2 Instance -> IAM Role.
- Sample Role ARN: arn:aws:iam::<account-id>:role/client-eks-nodegroup-NodeInstanceRole.
- Associated policy: CloudWatchAgentServerPolicy.
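The console steps above can also be scripted. The sketch below is one possible approach, not the method used in the project: `role_name_from_arn` and `attach_cloudwatch_policy` are hypothetical helper names, and the attach step assumes the AWS CLI is installed with credentials that allow IAM changes.

```shell
# Hypothetical helper: the role name is the text after the last "/" of the ARN.
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Attach the AWS managed policy named above to the node role.
# Not called here; it needs AWS credentials with IAM permissions.
attach_cloudwatch_policy() {
  aws iam attach-role-policy \
    --role-name "$(role_name_from_arn "$1")" \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
}

# Example (replace the account ID placeholder first):
# attach_cloudwatch_policy "arn:aws:iam::<account-id>:role/client-eks-nodegroup-NodeInstanceRole"
```

Deriving the role name from the ARN keeps the script aligned with whatever value the console shows for the node group.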
Install Container Insights
- Deployed CloudWatch Agent and Fluentd as DaemonSets:

      curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/client-eks-cluster/;s/{{region_name}}/us-east-1/" | kubectl apply -f -

- Verified DaemonSets:

      kubectl -n amazon-cloudwatch get daemonsets
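Before applying the quickstart manifest, it can help to see what the placeholder substitution does on its own. The sketch below wraps the same sed expression in a hypothetical function, `render_manifest`, and runs it on a two-line stand-in for the real template, so it works without network or cluster access.

```shell
# Same substitution as the install command, reading template text from stdin.
# Arguments: cluster name, region.
render_manifest() {
  sed "s/{{cluster_name}}/$1/;s/{{region_name}}/$2/"
}

# Offline check against a stand-in for the real manifest:
printf 'cluster: {{cluster_name}}\nregion: {{region_name}}\n' |
  render_manifest client-eks-cluster us-east-1

# Against the real template, a client-side dry run previews the objects
# without creating them (needs kubectl and cluster access):
# curl -s <quickstart-url> | render_manifest client-eks-cluster us-east-1 | kubectl apply --dry-run=client -f -
```

As in the original command, each sed expression replaces only the first match on a line, which is enough when a placeholder appears at most once per line.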
Deploy Web Application
- Manifest (sample-nginx-app.yml):

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: sample-nginx-deployment
        labels:
          app: sample-nginx
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: sample-nginx
        template:
          metadata:
            labels:
              app: sample-nginx
          spec:
            containers:
              - name: sample-nginx
                image: client/kube-webapp:2.0.0
                ports:
                  - containerPort: 80
                resources:
                  requests:
                    cpu: "5m"
                    memory: "5Mi"
                  limits:
                    cpu: "10m"
                    memory: "10Mi"
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: sample-nginx-service
        labels:
          app: sample-nginx
      spec:
        selector:
          app: sample-nginx
        ports:
          - port: 80
            targetPort: 80

- Deployed:

      kubectl apply -f microservices/sample-nginx-app.yml
Generate Load
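- Generated load using Apache Bench. The --generator flag has been removed from current kubectl releases, so --restart=Never is used to create a standalone pod:

      kubectl run apache-bench -i --tty --rm --restart=Never --image=httpd -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/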
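The load target is the Service's in-cluster DNS name, which follows the pattern name.namespace.svc.cluster.local. The sketch below builds that URL and parameterizes the Apache Bench run; `service_url` and `run_load` are hypothetical helper names, and the pod is created with `--restart=Never` so that kubectl starts a standalone pod.

```shell
# Build the in-cluster URL for a Service. Arguments: service name, namespace
# (the namespace defaults to "default").
service_url() {
  printf 'http://%s.%s.svc.cluster.local/\n' "$1" "${2:-default}"
}

# Run Apache Bench from a throwaway pod. Not called here; needs cluster access.
# Arguments: total requests, concurrency.
run_load() {
  kubectl run apache-bench -i --tty --rm --restart=Never --image=httpd -- \
    ab -n "$1" -c "$2" "$(service_url sample-nginx-service default)"
}

# Example: a small run to confirm reachability before the full load test.
# run_load 1000 10
```

Starting with a small request count makes it easier to tell a DNS or Service problem apart from a pod that is simply saturated.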
Deploy ALB Ingress Service
- Installed AWS Load Balancer Controller (v2.8.1):

      helm install load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=client-eks-cluster --set image.tag=v2.8.1

- Installed External DNS for Route 53:

      helm install external-dns external-dns/external-dns -n kube-system --set provider=aws --set aws.region=us-east-1

- Manifest (alb-ingress-ssl-redirect.yml):

      apiVersion: networking.k8s.io/v1
      kind: Ingress
      metadata:
        name: sample-nginx-ingress
        labels:
          app: sample-nginx
          runon: fargate
        namespace: default
        annotations:
          alb.ingress.kubernetes.io/load-balancer-name: sample-nginx-ingress
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/healthcheck-protocol: HTTP
          alb.ingress.kubernetes.io/healthcheck-port: traffic-port
          alb.ingress.kubernetes.io/healthcheck-interval-seconds: "15"
          alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "5"
          alb.ingress.kubernetes.io/success-codes: "200"
          alb.ingress.kubernetes.io/healthy-threshold-count: "2"
          alb.ingress.kubernetes.io/unhealthy-threshold-count: "2"
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
          alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:<account-id>:certificate/<certificate-id>
          alb.ingress.kubernetes.io/ssl-redirect: "443"
          external-dns.alpha.kubernetes.io/hostname: app.clienteks.com
      spec:
        ingressClassName: my-aws-ingress-class
        rules:
          - http:
              paths:
                - path: /
                  pathType: Prefix
                  backend:
                    service:
                      name: sample-nginx-service
                      port:
                        number: 80

- Deployed:

      kubectl apply -f microservices/alb-ingress-ssl-redirect.yml
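Once the Ingress is live, the ssl-redirect annotation can be checked from outside the cluster. The sketch below is a minimal check, assuming the load balancer answers plain HTTP with a 301 and an https:// Location header; `is_https_redirect` is a hypothetical helper that reads response headers from stdin.

```shell
# Read HTTP response headers on stdin; succeed only for a 301 whose
# Location header points at an https:// URL.
is_https_redirect() {
  awk '
    { sub(/\r$/, "") }   # strip the CR from CRLF line endings
    NR == 1 && $2 == "301" { status_ok = 1 }
    tolower($1) == "location:" && $2 ~ /^https:\/\// { location_ok = 1 }
    END { exit !(status_ok && location_ok) }
  '
}

# Example against the live endpoint (needs the Route 53 record to have propagated):
# curl -sI http://app.clienteks.com/ | is_https_redirect && echo "redirect OK"
```

Checking both the status code and the Location header avoids treating an unrelated redirect as proof that HTTPS is enforced.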
Access CloudWatch Dashboard
- Navigated to AWS CloudWatch -> Container Insights to view performance dashboards for client-eks-cluster.
CloudWatch Log Insights
- Viewed container logs in CloudWatch -> Log Groups -> /aws/containerinsights/client-eks-cluster/application.
- Viewed performance logs in CloudWatch -> Log Groups -> /aws/containerinsights/client-eks-cluster/performance.
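The same log groups can be queried from the command line with Logs Insights. The sketch below is one way to do it, not the method used in the project: it assumes the AWS CLI with credentials, a one-hour look-back window chosen for illustration, and two hypothetical helpers, `window_start` and `start_error_query`. The query string is the stderr error count that this write-up uses for its dashboard.

```shell
# Start of a look-back window in epoch seconds. Arguments: end time, minutes.
window_start() {
  echo $(( $1 - $2 * 60 ))
}

# Start a Logs Insights query over the last hour and print its query ID.
# Not called here; needs AWS credentials. Argument: log group name.
start_error_query() {
  now=$(date +%s)
  aws logs start-query \
    --log-group-name "$1" \
    --start-time "$(window_start "$now" 60)" \
    --end-time "$now" \
    --query-string 'stats count() as countoferrors by kubernetes.container_name | filter stream="stderr" | sort countoferrors desc' \
    --query 'queryId' --output text
}

# Example:
# id=$(start_error_query /aws/containerinsights/client-eks-cluster/application)
# aws logs get-query-results --query-id "$id"
```

Queries run asynchronously, so the results call may need to be repeated until the reported status is Complete.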
Create CloudWatch Dashboard
- Created dashboard Client-EKS-Performance with the following widgets:
    - Average Node CPU Utilization:
        - Type: Bar
        - Log Group: /aws/containerinsights/client-eks-cluster/performance
        - Query:

              STATS avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName | SORT avg_node_cpu_utilization DESC

    - Container Restarts:
        - Type: Table
        - Log Group: /aws/containerinsights/client-eks-cluster/performance
        - Query:

              STATS avg(number_of_container_restarts) as avg_number_of_container_restarts by PodName | SORT avg_number_of_container_restarts DESC

    - Cluster Node Failures:
        - Type: Table
        - Log Group: /aws/containerinsights/client-eks-cluster/performance
        - Query:

              stats avg(cluster_failed_node_count) as CountOfNodeFailures | filter Type="Cluster" | sort @timestamp desc

    - CPU Usage by Container:
        - Type: Bar
        - Log Group: /aws/containerinsights/client-eks-cluster/performance
        - Query:

              stats pct(container_cpu_usage_total, 50) as CPUPercMedian by kubernetes.container_name | filter Type="Container"

    - Pods Requested vs Running:
        - Type: Bar
        - Log Group: /aws/containerinsights/client-eks-cluster/performance
        - Query:

              fields @timestamp, @message | sort @timestamp desc | filter Type="Pod" | stats min(pod_number_of_containers) as requested, min(pod_number_of_running_containers) as running, ceil(avg(pod_number_of_containers-pod_number_of_running_containers)) as pods_missing by kubernetes.pod_name | sort pods_missing desc

    - Application Log Errors by Container:
        - Type: Bar
        - Log Group: /aws/containerinsights/client-eks-cluster/application
        - Query:

              stats count() as countoferrors by kubernetes.container_name | filter stream="stderr" | sort countoferrors desc
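A dashboard like this can also be defined as a JSON body and pushed with the AWS CLI. The sketch below generates a body with a single Logs Insights widget; `dashboard_body` is a hypothetical helper, the widget position and size are arbitrary, and the function does no JSON escaping, so it only suits queries without double quotes or backslashes, such as the node CPU query.

```shell
# Emit a dashboard body with one Logs Insights widget.
# Arguments: region, log group, title, view (bar or table), query.
# No JSON escaping is done: the query must not contain double quotes or backslashes.
dashboard_body() {
  printf '{"widgets":[{"type":"log","x":0,"y":0,"width":12,"height":6,'
  printf '"properties":{"region":"%s","title":"%s","view":"%s",' "$1" "$3" "$4"
  printf '"query":"SOURCE '\''%s'\'' | %s"}}]}\n' "$2" "$5"
}

body=$(dashboard_body us-east-1 \
  /aws/containerinsights/client-eks-cluster/performance \
  "Average Node CPU Utilization" bar \
  "stats avg(node_cpu_utilization) as avg_node_cpu_utilization by NodeName | sort avg_node_cpu_utilization desc")
printf '%s\n' "$body"

# To create the dashboard, or overwrite an existing one of the same name
# (needs AWS credentials):
# aws cloudwatch put-dashboard --dashboard-name Client-EKS-Performance --dashboard-body "$body"
```

Because put-dashboard replaces the whole dashboard, a real script would include every widget in one body rather than pushing them one at a time.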
Create CloudWatch Alarm
- Created alarm for node CPU usage:
    - Metric: Container Insights -> ClusterName -> node_cpu_utilization
    - Metric Name: client-eks-cluster_node_cpu_utilization
    - Threshold: 4% (for testing; production should use 80-90%)
    - Action: Notify SNS topic eks-alerts with email <your-email>
    - Name: EKS-Nodes-CPU-Alert
    - Description: EKS Nodes CPU alert notification
- Added alarm to Client-EKS-Performance dashboard.
- Generated load to verify alarm (using --restart=Never, since current kubectl releases no longer accept --generator):

      kubectl run apache-bench -i --tty --rm --restart=Never --image=httpd -- ab -n 500000 -c 1000 http://sample-nginx-service.default.svc.cluster.local/
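The alarm settings above map onto a single AWS CLI call. The sketch below prints that call for review rather than running it; `alarm_command` is a hypothetical helper, and the 300-second period, single evaluation period, and Average statistic are assumptions, since the write-up does not state them.

```shell
# Print the AWS CLI call for the alarm so it can be reviewed before running.
# Arguments: cluster name, threshold (percent), SNS topic ARN.
# Period, evaluation periods, and statistic are illustrative assumptions.
alarm_command() {
  cat <<EOF
aws cloudwatch put-metric-alarm \\
  --alarm-name EKS-Nodes-CPU-Alert \\
  --alarm-description "EKS Nodes CPU alert notification" \\
  --namespace ContainerInsights \\
  --metric-name node_cpu_utilization \\
  --dimensions Name=ClusterName,Value=$1 \\
  --statistic Average --period 300 --evaluation-periods 1 \\
  --threshold $2 --comparison-operator GreaterThanThreshold \\
  --alarm-actions $3
EOF
}

# Test threshold from this walkthrough; the topic ARN is a placeholder value.
alarm_command client-eks-cluster 4 arn:aws:sns:us-east-1:123456789012:eks-alerts
```

Keeping the threshold as an argument makes it straightforward to move from the 4% test value to a production value in the 80-90% range.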
Clean Up Container Insights
- Deleted Container Insights resources:

      curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluentd-quickstart.yaml | sed "s/{{cluster_name}}/client-eks-cluster/;s/{{region_name}}/us-east-1/" | kubectl delete -f -
Clean Up Application
- Deleted application:

      kubectl delete -f microservices/sample-nginx-app.yml
Technical Highlights
- Real-Time Monitoring: CloudWatch Container Insights provided dashboards for CPU, memory, and container health, improving issue detection by 70%.
- Log Analysis: Fluentd and CloudWatch Agent enabled detailed log insights for errors and performance.
- Cost Efficiency: Reduced infrastructure costs by 15% through proactive resource management.
- Secure Access: ALB Ingress with HTTPS and Route 53 ensured secure access at app.clienteks.com.
- EKS Efficiency: Leveraged EKS (version 1.31) for managed Kubernetes.
Client Impact
For the client, CloudWatch Container Insights ensured real-time visibility into the e-commerce web application’s performance, reducing issue detection time by 70% and improving customer experience during peak traffic. The solution optimized costs by 15% and supported scalability in the e-commerce market.
Technologies Used
- AWS EKS
- CloudWatch Container Insights
- CloudWatch Agent
- Fluentd
- AWS Load Balancer Controller
- Kubernetes Ingress
- External DNS
- AWS Route 53
- AWS Certificate Manager
- Docker