Automated CPU Stress Testing and Scaling on AWS EC2: A Robust Solution

Idea: This walkthrough builds an automated CPU stress-testing setup for Amazon EC2 instances. A CloudWatch alarm monitors CPU utilization and, when utilization crosses a defined threshold, publishes to an SNS topic. The topic sends an email notification and triggers a Lambda function that starts any stopped instances from a configured list.

Setup:

1. Create 3 EC2 Instances:

Launch Instance:
  Click "Launch Instance".
  Choose an AMI.
  Select the desired instance type (e.g., t2.micro).
  Configure instance details (number of instances, network settings).
  Add storage (optional).
  Configure the security group (allow SSH).
  Review and Launch.
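
The same launch can be scripted with boto3 instead of the console. The following is a minimal sketch, not the exact setup used here: the AMI ID and security group ID are placeholders you must replace, and the helper names build_launch_request and launch_instances are made up for illustration.

```python
def build_launch_request(ami_id, security_group_id, count=3, instance_type="t2.micro"):
    """Return keyword arguments for the EC2 run_instances call."""
    return {
        "ImageId": ami_id,                        # placeholder: pick an AMI in your region
        "InstanceType": instance_type,
        "MinCount": count,                        # launch exactly `count` instances
        "MaxCount": count,
        "SecurityGroupIds": [security_group_id],  # placeholder: a group that allows SSH
    }


def launch_instances(ec2_client, request):
    """Launch the instances and return their instance IDs."""
    response = ec2_client.run_instances(**request)
    return [instance["InstanceId"] for instance in response["Instances"]]


# Usage (requires AWS credentials; the IDs below are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="ap-south-1")
#   ids = launch_instances(ec2, build_launch_request("ami-xxxxxxxx", "sg-xxxxxxxx"))
```

Keeping the request builder separate from the API call makes it easy to inspect what would be sent before anything is launched.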

2. Create an Amazon Simple Notification Service (SNS) topic to be triggered by the CloudWatch alarm:

Create a Topic:
  Click on "Topics" in the left-hand menu.
  Click on the "Create topic" button.
  Enter a name and display name for your topic and click "Create topic".

Subscribe an Email Endpoint:
  Select the newly created topic.
  Click on "Create subscription".
  Choose "Email" as the protocol.
  Enter the email address you want to receive notifications and click "Create subscription".
  Confirm the subscription by clicking the confirmation link sent to the specified email address.
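
For reference, the topic and email subscription can also be created with boto3. This is a rough sketch under assumptions: the function name create_topic_with_email, the topic name, and the email address are illustrative only. The subscription still stays pending until the confirmation link in the email is clicked.

```python
def create_topic_with_email(sns_client, topic_name, email_address):
    """Create an SNS topic, subscribe an email endpoint, and return the topic ARN."""
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    # The subscription remains unconfirmed until the recipient clicks the emailed link.
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email_address)
    return topic_arn


# Usage (requires AWS credentials; name and address are placeholders):
#   import boto3
#   sns = boto3.client("sns", region_name="ap-south-1")
#   topic_arn = create_topic_with_email(sns, "cpu-alerts", "you@example.com")
```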

3. Create a CloudWatch alarm for the EC2 instance named “CloudWatch”:

Click the button to add a CloudWatch alarm for that instance.

In the CloudWatch dashboard, click on "Alarms" in the left-hand menu.
  Click on the "Create alarm" button.
  Choose the metric you want to monitor (e.g., CPU utilization).
  Define the conditions for the alarm (e.g., "CPU utilization > 40% for 5 minutes").
  Click "Next".

Set up Actions:
  Under "Select an SNS topic", choose an existing topic or create a new one.
  Click "Next".

Configure Alarm Details:
  Enter a name and description for your alarm.
  Configure any additional settings as needed.
  Click "Next".

Preview and Create:
  Review the alarm configuration.
  Click "Create alarm" to create the CloudWatch alarm.
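
The console steps above map onto a single CloudWatch put_metric_alarm call. Here is a sketch of the parameters, assuming the 40% threshold and 5-minute period from the example condition. The helper name build_cpu_alarm, the alarm name, and the ARN and instance ID in the usage comment are placeholders.

```python
def build_cpu_alarm(alarm_name, instance_id, topic_arn, threshold=40.0, period_seconds=300):
    """Return keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": alarm_name,
        "AlarmDescription": "CPU utilization above threshold",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,       # 300 seconds = one 5-minute datapoint
        "EvaluationPeriods": 1,         # alarm after one period above the threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],    # SNS topic notified when the alarm fires
    }


# Usage (requires AWS credentials; all identifiers are placeholders):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="ap-south-1")
#   cloudwatch.put_metric_alarm(**build_cpu_alarm(
#       "cpu-high", "i-xxxxxxxxxxxxxxxxx", "arn:aws:sns:ap-south-1:123456789012:cpu-alerts"))
```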

4. Lambda Setup

Create Function: StartStopEC2
  Runtime → Python
  Architecture → x86_64
  Add Trigger (SNS : Threshold Reached) and Destination (Optional: For now SQS)
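
Adding the SNS trigger in the console does two things behind the scenes: it lets SNS invoke the function, and it subscribes the function to the topic. A hedged sketch of the equivalent boto3 calls follows. The function name connect_topic_to_lambda and the StatementId value are illustrative, and the ARNs come from your own account.

```python
def connect_topic_to_lambda(sns_client, lambda_client, topic_arn, function_arn):
    """Allow the SNS topic to invoke the function, then subscribe the function to it."""
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowSNSInvoke",       # any ID that is unique within the function policy
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,                # restrict the permission to this one topic
    )
    response = sns_client.subscribe(
        TopicArn=topic_arn, Protocol="lambda", Endpoint=function_arn
    )
    return response["SubscriptionArn"]


# Usage (requires AWS credentials; ARNs are placeholders):
#   import boto3
#   sns = boto3.client("sns", region_name="ap-south-1")
#   lam = boto3.client("lambda", region_name="ap-south-1")
#   connect_topic_to_lambda(sns, lam, topic_arn, function_arn)
```

Separately, the function's execution role needs the ec2:DescribeInstances and ec2:StartInstances permissions, otherwise the EC2 calls in the handler will be denied.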

Add the Lambda function code:

import boto3

region = 'ap-south-1'
# IDs of the instances to bring online when the alarm fires
instances = ['i-0146965ad89d57c3e', 'i-0d147c4f1a7b14edd']
ec2 = boto3.client('ec2', region_name=region)


def toggle_instance_state(instance_ids):
    """Start any stopped instance in instance_ids; leave running ones alone."""
    response = ec2.describe_instances(InstanceIds=instance_ids)
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            current_state = instance['State']['Name']
            if current_state == 'running':
                # To stop running instances instead, uncomment the next line:
                # ec2.stop_instances(InstanceIds=[instance_id])
                print('Instance is already running:', instance_id)
            elif current_state == 'stopped':
                ec2.start_instances(InstanceIds=[instance_id])
                print('Started instance:', instance_id)
            else:
                # e.g. pending, stopping, shutting-down, terminated
                print('Instance', instance_id, 'is in an unexpected state:', current_state)


def lambda_handler(event, context):
    # Invoked by the SNS topic when the CloudWatch alarm fires
    toggle_instance_state(instances)

Deploy the code and test it.
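
Before running the real stress test, it can help to invoke the function with a sample event shaped like the one SNS delivers. The sketch below builds such an event and reads the alarm state out of it. The fields shown are a small subset of what CloudWatch sends, and the helper names make_sample_sns_event and alarm_states are made up for this example.

```python
import json


def make_sample_sns_event(alarm_name, new_state="ALARM"):
    """Build a minimal SNS-style Lambda event carrying a CloudWatch alarm message."""
    message = {
        "AlarmName": alarm_name,
        "NewStateValue": new_state,
        "NewStateReason": "Threshold Crossed (sample event)",
    }
    return {
        "Records": [
            {
                "EventSource": "aws:sns",
                # SNS delivers the alarm payload as a JSON string, not a nested object
                "Sns": {"Subject": "ALARM: " + alarm_name, "Message": json.dumps(message)},
            }
        ]
    }


def alarm_states(event):
    """Return (alarm name, new state) pairs found in an SNS-triggered event."""
    states = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        states.append((message.get("AlarmName"), message.get("NewStateValue")))
    return states


event = make_sample_sns_event("cpu-high")
print(alarm_states(event))
```

The output of json.dumps(event) can be pasted into a Lambda console test event. The handler in this post ignores the event contents, so any payload will exercise it; alarm_states is only useful if you later want the handler to act on specific alarms or states.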

Now the real test:

Install the stress-ng tool on the EC2 instance named “CloudWatch” and run it:

sudo apt update
sudo apt install stress-ng
stress-ng --cpu $(nproc) --timeout 5m

This command will stress all available CPU cores for 5 minutes.

Now observe the alarm. Once CPU utilization stays above 40% for the 5-minute period, the alarm triggers SNS, and SNS in turn triggers the Lambda function, which checks the state of the listed instances and starts any that are stopped.
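
Instead of refreshing the console, the alarm state can be polled from a script. This is a minimal sketch that assumes an alarm named "cpu-high"; the function name wait_for_alarm and the timeout values are arbitrary choices for illustration.

```python
import time


def wait_for_alarm(cloudwatch_client, alarm_name, timeout_seconds=900,
                   poll_seconds=30, sleep=time.sleep):
    """Poll describe_alarms until the alarm is in the ALARM state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = cloudwatch_client.describe_alarms(AlarmNames=[alarm_name])
        alarms = response.get("MetricAlarms", [])
        state = alarms[0]["StateValue"] if alarms else None
        print("Alarm state:", state)
        if state == "ALARM":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_seconds)


# Usage (requires AWS credentials; the alarm name is a placeholder):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="ap-south-1")
#   fired = wait_for_alarm(cloudwatch, "cpu-high")
```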

Check your email too!

With this setup in place, a CPU spike on the monitored instance raises an alarm, notifies you by email, and automatically starts the stopped instances listed in the Lambda function, so you can stress test your EC2 infrastructure and watch it respond to changing demand.

#AWS #EC2 #CloudWatch #SNS #Lambda #StressTesting #AutoScaling #DevOps #InfrastructureAsCode

Thanks Reader!

Contact: abhi.sri784@gmail.com for 1:1 conversation

Repo: github.com/thunderpycode