Custom Telegram Notifications for EC2 CPU Utilization

I am a DevOps engineer working on an application for a US-based construction management company. After I presented the cost and operational benefits of AWS services to the product owner, most of our workloads now run on AWS-hosted servers and RDS database instances.

My team lead asked whether I could set up a system to send custom alerts to a Telegram channel, so that our whole dev team would be notified whenever an EC2 instance's CPU utilization exceeded a specific threshold. I said I could, since I had hands-on experience with this. I also wanted the notifications to be user-friendly and to include details such as the instance ID, the current CPU percentage, and the time of the alert.

Our Requirements:

  • Trigger an alert when CPU usage exceeds 80%.
  • Send the alert to a specified Telegram channel.
  • Customize the alert message to display clear, user-readable information.

To achieve these requirements, I utilized Amazon CloudWatch to monitor the EC2 instance’s CPU utilization, AWS Lambda for processing and sending custom alerts, and the Telegram Bot API for sending notifications to a Telegram group.

Step 1: Setting Up CloudWatch Alarm for CPU Utilization

The first step was to set up a CloudWatch Alarm that would trigger when the CPU utilization of the EC2 instance exceeded 80%.

Metric: I used the CPUUtilization metric from the AWS/EC2 namespace.

Threshold: I set the threshold to 80%.

Alarm Actions: I set up an alarm action to trigger an AWS Lambda function whenever the CPU usage crossed the threshold.

Here’s the configuration for the CloudWatch alarm:

Metric         | Condition                      | Threshold
CPUUtilization | Greater than 80% for 5 minutes | 80%
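
For readers who prefer to script the alarm rather than click through the console, the configuration above could be expressed roughly as follows. This is a minimal sketch built around boto3's `put_metric_alarm`; the helper names, the alarm naming scheme, the instance ID, and the Lambda ARN are placeholders, and using the Average statistic over a single 5-minute period is an assumption about how "greater than 80% for 5 minutes" is evaluated.

```python
def build_alarm_params(instance_id, lambda_arn, threshold=80.0):
    """Return keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",        # assumption: average over the period
        "Period": 300,                 # one 5-minute period, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [lambda_arn],  # invoke the Lambda on the ALARM state
    }


def create_alarm(params):
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Placeholder identifiers; replace them before calling create_alarm().
    params = build_alarm_params(
        "i-0123456789abcdef0",
        "arn:aws:lambda:us-east-1:123456789012:function:telegram-alert",
    )
    print(params["AlarmName"], params["Threshold"])
```

Keeping the parameters in a plain dictionary makes them easy to review before anything is created. The Lambda function also needs a resource-based policy that allows CloudWatch alarms to invoke it, which this sketch does not set up.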

Step 2: Creating the AWS Lambda Function

The Lambda function was responsible for sending custom alerts to the Telegram channel. Here’s how I set it up:

Trigger: The Lambda function is triggered by the CloudWatch alarm.

Telegram API: I used the Telegram Bot API to send messages. I created a bot on Telegram and obtained the bot token and chat ID of the Telegram group where I wanted to send alerts.
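
One way to look up the chat ID is to add the bot to the group, post a message there, and call the Bot API's `getUpdates` method. The sketch below assumes that approach, which may differ from how the ID was actually obtained for this setup; `fetch_updates` and `extract_chat_ids` are illustrative helper names, and the sample IDs are made up.

```python
import json
import urllib.request


def fetch_updates(token):
    """Call the Bot API getUpdates method and return the decoded JSON."""
    url = f"https://api.telegram.org/bot{token}/getUpdates"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def extract_chat_ids(updates):
    """Map chat ID -> chat title (or type) for every chat seen in the updates."""
    chats = {}
    for update in updates.get("result", []):
        # Group messages arrive under "message", channel posts under "channel_post".
        msg = update.get("message") or update.get("channel_post") or {}
        chat = msg.get("chat")
        if chat:
            chats[chat["id"]] = chat.get("title") or chat.get("type", "")
    return chats


if __name__ == "__main__":
    # Sample payload in the shape getUpdates returns.
    sample = {"ok": True, "result": [
        {"update_id": 1, "message": {"chat": {
            "id": -1001234567890, "title": "Dev Alerts", "type": "supergroup"}}},
    ]}
    print(extract_chat_ids(sample))
```

Against a real bot, `extract_chat_ids(fetch_updates(token))` lists every chat the bot has recently seen, and the group's ID can be copied into the CHAT_ID setting.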

Here’s the Python code I used for the Lambda function:

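import json
import os

import requests  # third-party: not in the Lambda Python runtime, so package it with the function or add a layer

TELEGRAM_TOKEN = os.environ['TELEGRAM_TOKEN']
CHAT_ID = os.environ['CHAT_ID']


def lambda_handler(event, context):
    # The alarm payload sits under 'detail' when the event arrives through
    # EventBridge and under 'alarmData' when the alarm invokes Lambda directly.
    detail = event.get('detail') or event['alarmData']
    alarm_name = detail['alarmName']
    state = detail['state']['value']

    # Dimensions are a mapping such as {"InstanceId": "i-0abc..."}.
    metric = detail['configuration']['metrics'][0]['metricStat']['metric']
    instance_id = metric['dimensions'].get('InstanceId', 'unknown')

    # reasonData is a JSON string; recentDatapoints holds the values that
    # tripped the alarm, so report the highest one.
    reason_data = json.loads(detail['state'].get('reasonData') or '{}')
    datapoints = reason_data.get('recentDatapoints') or []
    cpu_usage = f"{max(datapoints):.1f}%" if datapoints else "unavailable"

    message = f"Alert: EC2 Instance {instance_id} has high CPU usage!\n"
    message += f"Alarm: {alarm_name} ({state})\n"
    message += f"Current CPU Usage: {cpu_usage}\n"
    message += f"Time: {event['time']}"

    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    data = {
        'chat_id': CHAT_ID,
        'text': message
    }

    response = requests.post(url, data=data, timeout=10)
    response.raise_for_status()  # surface Telegram errors instead of reporting success
    return {
        'statusCode': 200,
        'body': json.dumps('Notification sent!')
    }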

Environment Variables: I stored the Telegram bot token and chat ID as Lambda environment variables so that they stay out of the source code.

Parameter      | Value
TELEGRAM_TOKEN | Bot token from Telegram
CHAT_ID        | ID of the Telegram group

Step 3: Customizing the Alert Notification

To make the notifications more user-friendly, I customized the message sent by the Lambda function. The notification includes the following details:

  1. Instance ID: So users know which EC2 instance is facing high CPU usage.
  2. CPU Usage Percentage: The exact CPU percentage at the time of the alert.
  3. Time of Alert: The timestamp when the CPU exceeded the threshold.
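
The three details listed above could be assembled into a message along these lines. This is a minimal sketch: `format_alert` is an illustrative helper name, and converting the ISO timestamp into a "YYYY-MM-DD HH:MM:SS UTC" string is an assumption about what reads best in a chat, not necessarily the format used in production.

```python
from datetime import datetime, timezone


def format_alert(instance_id, cpu_percent, iso_time, alarm_name=None):
    """Build the alert text from the instance ID, CPU value, and timestamp."""
    # Event timestamps look like 2024-05-01T12:30:00Z; rewrite the trailing Z
    # so datetime.fromisoformat accepts it on older Python versions too.
    when = datetime.fromisoformat(iso_time.replace("Z", "+00:00"))
    when_text = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")

    lines = [f"Alert: EC2 Instance {instance_id} has high CPU usage!"]
    if alarm_name:
        lines.append(f"Alarm: {alarm_name}")
    lines.append(f"Current CPU Usage: {cpu_percent:.1f}%")
    lines.append(f"Time: {when_text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_alert("i-0123456789abcdef0", 91.3, "2024-05-01T12:30:00Z"))
```

Keeping the formatting in its own function means the wording can be adjusted, or unit-tested, without touching the code that talks to Telegram.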

[Screenshot: the Telegram message received during testing]

This message is easy to understand and helps our developer team take immediate action.

Step 4: Deploying the Lambda Function and Testing

  • Deploy Lambda: I deployed the Lambda function and linked it with the CloudWatch alarm.
  • Test: I tested the setup by generating high CPU usage on the EC2 instance and verifying that the alert was sent to the Telegram channel.
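
The load for such a test can come from any CPU-bound job. As one possibility, the sketch below keeps every core busy for a fixed time using only the Python standard library; it is an illustration rather than the tool actually used for this test, and `burn` and `generate_load` are hypothetical helper names. Because the alarm evaluates a 5-minute window, the load has to run for longer than that.

```python
import multiprocessing
import time


def burn(seconds):
    """Spin in a tight loop for roughly `seconds`; returns the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def generate_load(seconds, workers=None):
    """Run one busy loop per CPU core so overall utilization approaches 100%."""
    workers = workers or multiprocessing.cpu_count()
    with multiprocessing.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)
```

Calling `generate_load(7 * 60)` from a script on the instance, under an `if __name__ == "__main__":` guard, should hold CPU utilization high for about seven minutes, which leaves some margin over the 5-minute window.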

Step 5: Monitoring and Optimizing

After deployment, I set up AWS CloudWatch Logs to monitor the execution of the Lambda function. This allowed me to ensure that the notifications were being sent correctly and to debug any issues.

I also added more logic to the Lambda function to handle retries in case of network failures when sending messages to Telegram.
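
The retry logic can take many forms; the sketch below shows one, assuming exponential backoff and only the standard library (`urllib` instead of `requests`). The helper names, the three attempts, and the one-second base delay are illustrative choices, not the values used in production.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request


def post_to_telegram(token, chat_id, text, timeout=10):
    """Send one message through the Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    with urllib.request.urlopen(url, data=body, timeout=timeout) as resp:
        return json.load(resp)


def send_with_retries(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as exc:
            # Client errors (bad token, wrong chat ID) will not succeed on
            # retry; only rate limiting (429) and server errors are retried.
            if exc.code < 500 and exc.code != 429:
                raise
            error = exc
        except OSError as exc:  # URLError, timeouts, connection resets
            error = exc
        if attempt == attempts:
            raise error  # out of attempts: let Lambda record the failure
        delay = base_delay * 2 ** (attempt - 1)
        print(f"Telegram send failed ({error}); retrying in {delay:.0f}s")
        sleep(delay)
```

Inside the handler this would be used as `send_with_retries(lambda: post_to_telegram(TELEGRAM_TOKEN, CHAT_ID, message))`. The function's timeout has to cover the worst case of all attempts plus the delays between them, which is well above Lambda's 3-second default.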

This solution provided real-time, user-readable alerts whenever the CPU utilization exceeded the defined threshold. By sending these alerts to a Telegram group, the team could respond quickly to high CPU usage events.

Metric               | Before Implementation | After Implementation
Time to Alert        | 15 minutes (manual)   | Immediate (real-time)
Notification Detail  | Minimal               | Detailed, Custom
CPU Usage Monitoring | Manual                | Automated

By combining AWS CloudWatch, Lambda, and the Telegram Bot API, I was able to create a robust, real-time monitoring system for EC2 instances. The custom Telegram alerts allowed for faster response times to CPU-related issues, improving overall infrastructure stability.

