Recurring Schedule Auto Scaling with EC2

In my previous post, I described how to get started with auto scaling on EC2 using CloudWatch metrics to scale automatically.

Auto scaling on a recurring schedule is another method of scaling: you set a minimum and maximum number of instances at a specified time. For example, you can set a maximum number of instances to run during business hours, then reduce it outside of business hours when there is less web traffic.

Another good use of recurring schedule auto scaling is processing one-time jobs. You launch instances to run a script at a specified time, then when the job has completed, the instance terminates. This is the case I’m going to cover today.

Prerequisites

Install and configure the awscli as explained in this post.

We’ll be creating our auto scaling group within EC2 classic.

Since we’ll be using a user data script, there is no need to build your own AMI, as you can install and configure packages in the script.

IAM Role

As the instance needs to be able to terminate itself, create an IAM instance profile so the instance can authenticate, use the CLI and execute the terminate instance command:

Note the IAM ARN which is returned. We’ll need it later.

Next, put the following JSON statement in a file (/tmp/role.json) to allow access to the EC2 service:

Then attach it when creating the role:
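The original statement is not reproduced here. As a minimal sketch, assuming /tmp/role.json is the trust policy that lets the EC2 service assume the role, the file could be written like this:

```shell
# Write a trust policy allowing the EC2 service to assume the role.
# Assumption: this is the statement the text refers to.
cat > /tmp/role.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```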

Finally add the role to the instance profile:
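The IAM steps above might look like the following sketch. The names job-profile, job-role and terminate-self are hypothetical, and the inline permissions policy is an assumption (the role needs some policy that allows ec2:TerminateInstances). The commands are wrapped in a function so nothing runs until you call setup_iam.

```shell
# Hypothetical names; substitute your own.
PROFILE_NAME="${PROFILE_NAME:-job-profile}"
ROLE_NAME="${ROLE_NAME:-job-role}"

# Assumed permissions policy: allow the instance to terminate instances.
# Resource "*" is broad; narrow it for real use.
cat > /tmp/terminate-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "ec2:TerminateInstances", "Resource": "*" }
  ]
}
EOF

setup_iam() {
  # 1. Create the instance profile (note the ARN it returns).
  aws iam create-instance-profile --instance-profile-name "$PROFILE_NAME"
  # 2. Create the role using the trust policy in /tmp/role.json.
  aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document file:///tmp/role.json
  # 3. Attach the permissions policy to the role.
  aws iam put-role-policy --role-name "$ROLE_NAME" \
    --policy-name terminate-self \
    --policy-document file:///tmp/terminate-policy.json
  # 4. Add the role to the instance profile.
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$PROFILE_NAME" --role-name "$ROLE_NAME"
}
```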

User data script

The user data script is passed on to the EC2 instance and executes when it boots the first time.

This is where you install packages and execute your job, after which the instance terminates itself. The awscli package must be installed.

ec2metadata is an Ubuntu utility which returns EC2 information about the running instance.
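A minimal sketch of such a user data script, written to /tmp/userdata.sh. It assumes an Ubuntu AMI where ec2metadata is available and the awscli package can be installed with apt-get; /usr/local/bin/run-job.sh is a hypothetical placeholder for your job.

```shell
cat > /tmp/userdata.sh <<'EOF'
#!/bin/bash
# Install the CLI (assumes an Ubuntu AMI with an awscli package).
apt-get update
apt-get install -y awscli

# Run the job; this path is a hypothetical placeholder.
/usr/local/bin/run-job.sh

# Look up this instance's ID and region, then terminate the instance.
INSTANCE_ID=$(ec2metadata --instance-id)
AZ=$(ec2metadata --availability-zone)
REGION=${AZ%?}   # drop the trailing zone letter to get the region
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" --region "$REGION"
EOF
```

The credentials for the final command come from the instance profile, so no keys need to be stored in the script.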

Launch Configuration

Create the launch configuration specifying the user data script previously created plus the AMI, ARN, ssh key, security group and instance type:
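As a sketch, the launch configuration could be created as follows. Every value below is a placeholder (the AMI ID, key name, security group, instance type and profile ARN are not from the original post), and the function does nothing until you call it.

```shell
# Placeholder values; replace with your own.
LC_NAME="${LC_NAME:-job-lc}"
AMI_ID="${AMI_ID:-ami-00000000}"
INSTANCE_TYPE="${INSTANCE_TYPE:-t1.micro}"
KEY_NAME="${KEY_NAME:-my-key}"
SECURITY_GROUP="${SECURITY_GROUP:-my-security-group}"
PROFILE_ARN="${PROFILE_ARN:-arn:aws:iam::123456789012:instance-profile/job-profile}"

create_launch_config() {
  aws autoscaling create-launch-configuration \
    --launch-configuration-name "$LC_NAME" \
    --image-id "$AMI_ID" \
    --instance-type "$INSTANCE_TYPE" \
    --key-name "$KEY_NAME" \
    --security-groups "$SECURITY_GROUP" \
    --iam-instance-profile "$PROFILE_ARN" \
    --user-data file:///tmp/userdata.sh
}
```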

Auto Scaling Group

Create the auto scaling group. Note that both the minimum and maximum size must be zero, as we’re not launching any instances immediately:

By default the auto scaling group will replace any unhealthy (terminated) instances, so disable that behaviour:
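A minimal sketch of the group creation, assuming hypothetical names (job-asg, job-lc) and a placeholder availability zone. Call create_group to run it.

```shell
# Placeholder values; replace with your own.
ASG_NAME="${ASG_NAME:-job-asg}"
LC_NAME="${LC_NAME:-job-lc}"
ZONE="${ZONE:-us-east-1a}"

create_group() {
  # Min and max of 0: nothing is launched until a schedule changes the sizes.
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name "$ASG_NAME" \
    --launch-configuration-name "$LC_NAME" \
    --availability-zones "$ZONE" \
    --min-size 0 --max-size 0
}
```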
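One way to do this, sketched under the assumption that suspending the ReplaceUnhealthy process is what is wanted here (job-asg is a hypothetical group name):

```shell
ASG_NAME="${ASG_NAME:-job-asg}"   # hypothetical group name

disable_replacement() {
  # Stop the group from replacing instances it considers unhealthy.
  aws autoscaling suspend-processes \
    --auto-scaling-group-name "$ASG_NAME" \
    --scaling-processes ReplaceUnhealthy
}
```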

Recurring Schedule

Now specify the schedule for launching the instance(s) using the Unix crontab recurrence format (it must be in UTC).
This scheduled group action launches one instance every day at 1pm UTC:

Then at 1pm UTC, the instance will launch, run the user data script and finally terminate itself!
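A sketch of that scheduled action, with hypothetical group and action names. The recurrence "0 13 * * *" means minute 0 of hour 13 every day; setting both sizes to 1 forces exactly one instance to launch.

```shell
ASG_NAME="${ASG_NAME:-job-asg}"   # hypothetical group name

schedule_daily_job() {
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "$ASG_NAME" \
    --scheduled-action-name daily-job-start \
    --recurrence "0 13 * * *" \
    --min-size 1 --max-size 1
}
```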

Check the scaling activities with:

Even though the instance terminates itself when the job has finished, the auto scaling group will still show min-size and max-size 1, and as a consequence it will not launch an instance at the next recurrence. So we must reset the sizes to 0 at a time by which the job is likely to have finished.

This scheduled action makes sure all instances are terminated and the auto scaling group reset, every day at midnight UTC time:
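A sketch of the reset action, again with hypothetical names. Setting the maximum to 0 also terminates any instance that is still running.

```shell
ASG_NAME="${ASG_NAME:-job-asg}"   # hypothetical group name

schedule_daily_reset() {
  # "0 0 * * *" = every day at midnight UTC.
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "$ASG_NAME" \
    --scheduled-action-name daily-job-reset \
    --recurrence "0 0 * * *" \
    --min-size 0 --max-size 0
}
```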

Recurring Schedule + CloudWatch metrics Auto Scaling

All the recurring schedule does is adjust the min and max size of the auto scaling group. You can combine scheduled actions with traditional auto scaling using CloudWatch metrics!

Cleanup

Delete the auto scaling group and launch configuration:
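For example, assuming the hypothetical names job-asg and job-lc, the cleanup could be sketched as:

```shell
ASG_NAME="${ASG_NAME:-job-asg}"   # hypothetical group name
LC_NAME="${LC_NAME:-job-lc}"      # hypothetical launch configuration name

cleanup() {
  # --force-delete removes the group even if instances are still running.
  aws autoscaling delete-auto-scaling-group \
    --auto-scaling-group-name "$ASG_NAME" --force-delete
  aws autoscaling delete-launch-configuration \
    --launch-configuration-name "$LC_NAME"
}
```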

Notes

Having the instance terminate itself is optional: you could skip the IAM part, just issue a shutdown command in the user data script and wait for the cleanup recurring schedule to terminate the instance.

The Auto Scaling support recently added to the AWS Management Console does not allow you to define recurring schedules.

Auto Scaling with Amazon EC2

Auto scaling is the Amazon Web Services feature which can automatically launch additional EC2 instances (or terminate them) depending on, for example, the amount of web traffic.

A typical scenario in a web environment would be the following: you have a minimum of 2 web servers up and running 24 hours a day across two availability zones (for high availability), and you get an unexpected increase in traffic when you launch a new product or service. The web servers may struggle to keep up with the increase in traffic and start to slow down.

The solution is to provision additional servers (EC2 instances) and distribute the incoming web requests across the group of web servers.

Later, say at night, when the traffic decreases, some EC2 instances can be removed as they are no longer needed, and you’d be back to running the website on the minimum of 2 servers again.

Auto scaling also helps to lower costs of running servers as you only pay for what you use, per hour.

Prerequisites

This guide describes how to achieve basic auto scaling. In this example, we’ll be configuring auto scaling within a Virtual Private Cloud (VPC), where each of the two availability zones (here ap-southeast-2a and ap-southeast-2b) is configured with a public subnet which can be reached from the internet. We’re assuming that the VPC, its internet gateway and the subnets have already been created.

We’ll be using the new AWS command line interface. To install it:

Then we need to configure the CLI with the AWS credentials and default region:

Run complete to populate the available commands when you press tab:

The AWS CLI reference guide is accessible at http://docs.aws.amazon.com/cli/latest/reference/

We also need the Elastic Load Balancer API, which isn’t yet covered by the CLI:

Export the Java and ELB home directories plus your credentials and default ELB region URL (or place them in the .bashrc file in your home directory):

To achieve auto scaling, we’ll complete the following steps in order:

  1. Creating an Amazon Machine Image (AMI)
  2. Creating an Elastic Load Balancer (ELB)
  3. Creating a Simple Notification Service (SNS) topic
  4. Creating Auto Scaling configurations and policies
  5. Creating CloudWatch metric alarms

Amazon Machine Image Creation

We need to build our own custom AMI which is configured with the web server (apache2, nginx etc…) and contains the website code.

Create the image while the instance is running or stopped, providing a name and description:
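A sketch of the image creation, where the instance ID, image name and description are all placeholders:

```shell
# Placeholder values; replace with your own.
INSTANCE_ID="${INSTANCE_ID:-i-00000000}"
IMAGE_NAME="${IMAGE_NAME:-web-server}"

create_ami() {
  # Returns the new AMI identifier.
  aws ec2 create-image \
    --instance-id "$INSTANCE_ID" \
    --name "$IMAGE_NAME" \
    --description "Web server with website code"
}
```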

An AMI identifier is returned which we’ll need later.

Elastic Load Balancer Creation

Create the load balancer which will forward http traffic to the instances on the 2 public subnets. Specify a security group for the ELB which will allow http protocol traffic on port 80 for both ingress and egress:

The DNS_NAME is returned which is the A record endpoint of the website (you can then add an Alias to the A record in Route53 DNS for www.yourdomain.com).

Note the name of the load balancer you created which we’ll need later.

Simple Notification Service

It’s good to get notifications by email whenever an auto scaling event has been triggered. This is achievable by creating an SNS topic:

It returns an Amazon Resource Name (ARN) which we need to subscribe to next with an email address:
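The two SNS steps could be sketched as below. The topic name and email address are hypothetical, and using --query to extract the TopicArn is one option among several for capturing the returned ARN.

```shell
# Placeholder values; replace with your own.
TOPIC_NAME="${TOPIC_NAME:-autoscaling-events}"
EMAIL="${EMAIL:-you@example.com}"

create_topic_and_subscribe() {
  # Create the topic and keep the ARN it returns.
  TOPIC_ARN=$(aws sns create-topic --name "$TOPIC_NAME" \
    --query TopicArn --output text)
  # Subscribe an email address; SNS sends a confirmation email.
  aws sns subscribe --topic-arn "$TOPIC_ARN" \
    --protocol email --notification-endpoint "$EMAIL"
  echo "$TOPIC_ARN"
}
```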

Check the inbox and confirm the subscription.

Note the ARN which we’ll need later as well.

Auto Scaling Creation

There are several steps for creating and configuring the auto scaling.

Launch Configuration

First we need to configure a launch configuration where you specify the AMI (created previously), the key pair name, security group(s) (which allows incoming traffic on port 80) and finally the instance type:

Auto scaling group

Next we create the auto scaling group, where we specify the minimum number of EC2 instances we want running at any time, the maximum number of EC2 instances to run, how many we wish to start with, the load balancer name (created previously), the two availability zones, the two subnets, some ELB settings and finally a tag for the instances. Here we’ll be starting with a minimum of 2 instances, which will also be the desired capacity, and we’ll be allowing a maximum of 8 instances to be launched when there’s a lot of load on the servers:

The health check type option specifies that the ELB determines whether an instance is healthy/online, with a 60 second grace period after the instance has been launched.
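A sketch of such a group using the sizes from the text (2 minimum, 2 desired, 8 maximum). The group, launch configuration, load balancer and subnet identifiers are placeholders, and the Name=web tag is only an illustration.

```shell
# Placeholder values; replace with your own.
ASG_NAME="${ASG_NAME:-web-asg}"
LC_NAME="${LC_NAME:-web-lc}"
ELB_NAME="${ELB_NAME:-web-elb}"
SUBNETS="${SUBNETS:-subnet-aaaaaaaa,subnet-bbbbbbbb}"

create_web_group() {
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name "$ASG_NAME" \
    --launch-configuration-name "$LC_NAME" \
    --min-size 2 --max-size 8 --desired-capacity 2 \
    --load-balancer-names "$ELB_NAME" \
    --availability-zones ap-southeast-2a ap-southeast-2b \
    --vpc-zone-identifier "$SUBNETS" \
    --health-check-type ELB --health-check-grace-period 60 \
    --tags Key=Name,Value=web,PropagateAtLaunch=true
}
```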

As soon as the auto scaling group has been created, the desired capacity number of instances are immediately launched into the two availability zones/subnets. You can check what auto scaling actions have been executed by running:

Auto scaling notifications

We need to tell the auto scaling group to which ARN a notification must be sent whenever a scale up/down event has happened, using the ARN previously created:
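As a sketch, assuming a hypothetical group name and a placeholder topic ARN, and choosing to be notified on launch and terminate events only:

```shell
# Placeholder values; replace with your own.
ASG_NAME="${ASG_NAME:-web-asg}"
TOPIC_ARN="${TOPIC_ARN:-arn:aws:sns:ap-southeast-2:123456789012:autoscaling-events}"

enable_notifications() {
  aws autoscaling put-notification-configuration \
    --auto-scaling-group-name "$ASG_NAME" \
    --topic-arn "$TOPIC_ARN" \
    --notification-types \
      autoscaling:EC2_INSTANCE_LAUNCH \
      autoscaling:EC2_INSTANCE_TERMINATE
}
```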

Auto scaling policies

We have 2 instances running in the group, set by the desired capacity option. We need to create two policies which will be executed when we want to scale up (scaling adjustment 1) and down (-1):
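The two policies could be sketched as follows. The policy names scale-up and scale-down and the group name are hypothetical; the 300 second cooldown is the value discussed in this section. Each call returns a policy ARN.

```shell
ASG_NAME="${ASG_NAME:-web-asg}"   # hypothetical group name

create_policies() {
  # Add one instance when triggered.
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "$ASG_NAME" \
    --policy-name scale-up \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment=1 --cooldown 300
  # Remove one instance when triggered.
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "$ASG_NAME" \
    --policy-name scale-down \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment=-1 --cooldown 300
}
```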

Note the two ARNs which we’ll need in the next part.

The cooldown setting instructs the auto scaling group not to perform any scaling operations for 300 seconds after one is triggered. This prevents many scaling activities from being executed within a short timeframe.

CloudWatch Alarms Creation

The final part is to create some alarms which will trigger the scale up and scale down auto scaling policies. CloudWatch provides several metrics such as CPU utilisation, disk utilisation, network in/out etc…

Here we’ll be using the CPU utilisation metric, which is commonly used for auto scaling; a high percentage of utilisation means the instances are overloaded and need to have load taken off.

Using the policy ARN created earlier for scaling adjustment 1, create the metric alarm which will fire when the average CPU utilisation is greater than 80% twice over a period of 5 minutes:

Then create the metric alarm for scaling adjustment -1, which will fire when the CPU utilisation is less than 80%:

Note: the CloudWatch metrics used for auto scaling are averages across all instances in the auto scaling group; they are not instance-specific metrics (which can be viewed separately).
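Both alarms could be sketched with one helper. This reads "twice over a period of 5 minutes" as two consecutive 300 second evaluation periods, which is an assumption; the alarm names, group name and policy ARNs are placeholders.

```shell
# Placeholder values; use the policy ARNs noted earlier.
ASG_NAME="${ASG_NAME:-web-asg}"
SCALE_UP_ARN="${SCALE_UP_ARN:-arn:aws:autoscaling:example:scale-up}"
SCALE_DOWN_ARN="${SCALE_DOWN_ARN:-arn:aws:autoscaling:example:scale-down}"

create_cpu_alarm() {
  # $1 = alarm name, $2 = comparison operator, $3 = scaling policy ARN
  aws cloudwatch put-metric-alarm \
    --alarm-name "$1" \
    --namespace AWS/EC2 --metric-name CPUUtilization \
    --statistic Average --period 300 --evaluation-periods 2 \
    --threshold 80 --comparison-operator "$2" \
    --dimensions "Name=AutoScalingGroupName,Value=$ASG_NAME" \
    --alarm-actions "$3"
}

create_cpu_alarms() {
  create_cpu_alarm web-cpu-high GreaterThanThreshold "$SCALE_UP_ARN"
  create_cpu_alarm web-cpu-low LessThanThreshold "$SCALE_DOWN_ARN"
}
```

The AutoScalingGroupName dimension is what makes the alarm evaluate the group-wide average rather than a single instance.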

Testing the auto scaling

Now that we have configured auto scaling, generate some traffic on the website using the load balancer A record (or alias) and watch the magic happen!

You will be notified by email when auto scaling events are triggered, or you can run aws autoscaling describe-scaling-activities.

Browse to the CloudWatch interface on the console and watch the CPU alarm states changing between ALARM and OK for both scale up and scale down events.

Note that by default the metrics are refreshed every 5 minutes (this can be changed to 1 minute intervals) and that during the cooldown period of 300 seconds any state changes after an auto scaling event are ignored.

A good way to generate traffic is to use Bees with Machine Guns, which I’ve described how to use here: /load-testing-on-ec2-using-bees-with-machine-guns/

Cleanup

Attempting to terminate instances directly will not stop the auto scaling. Instead you need to change the min and max size to 0 in the auto scaling group, which terminates any running instances:
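A sketch of that resize, assuming the hypothetical group name web-asg:

```shell
ASG_NAME="${ASG_NAME:-web-asg}"   # hypothetical group name

scale_to_zero() {
  # With a maximum of 0 the group terminates all of its instances.
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$ASG_NAME" \
    --min-size 0 --max-size 0
}
```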

Then remove the auto scaling group and launch configurations:

Check that they have all been deleted:

The scaling policies and CloudWatch metric alarms get deleted automatically.

Conclusion

There are many other options available to configure auto scaling; here we’ve shown the basics using web servers. Auto scaling can be used for any kind of server, such as application servers running inside a private VPC and using an internal load balancer to distribute the traffic from the web servers.

There are many metrics to choose from to create the policy alarms and you can also create your own.

Auto scaling can also be configured using a crontab-style schedule: instead of having metrics launch extra instances, you can run additional instances at a certain time, then terminate them after they have executed a batch job, for example.

Finally, use CloudFormation templates to simplify auto scaling deployments.

Update

For those who aren’t very comfortable using the API or CLI, auto scaling support has now been added to the AWS Management Console.

It is very easy to use and configure. See the official blog post at http://aws.typepad.com/aws/2013/12/aws-management-console-auto-scaling-support.html