Auto Scaling is the Amazon Web Services (AWS) service that automatically launches or terminates EC2 instances depending on, for example, the amount of web traffic.
A typical scenario in a web environment: you have a minimum of 2 web servers running 24 hours a day across two availability zones (for high availability), and you get an unexpected increase in traffic when you launch a new product or service. The web servers may struggle to keep up with the increase and start to slow down.
The solution is to provision additional servers (EC2 instances) and distribute the incoming web requests across the group of web servers.
Later, say at night, when the traffic decreases, some EC2 instances can be removed as they would no longer be needed and you’d be back to running the website on the minimum of 2 servers again.
Auto scaling also helps to lower costs of running servers as you only pay for what you use, per hour.
This guide describes how to achieve basic auto scaling. In this example, we’ll be configuring auto scaling within a Virtual Private Cloud (VPC), and each of the two availability zones (here ap-southeast-2a and ap-southeast-2b) is configured with a subnet which can be reached from the internet (a public subnet). We’re assuming that the VPC, connected to an internet gateway, and the subnets have already been created.
We’ll be using the new AWS command line interface. To install it:
Then we need to configure the CLI with the AWS credentials and default region by running aws configure:
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: ap-southeast-2
Default output format [None]: json
Run complete to populate the available commands when you press tab:
complete -C aws_completer aws
The AWS CLI reference guide is accessible at http://docs.aws.amazon.com/cli/latest/reference/
We also need the Elastic Load Balancer API, which isn’t yet covered by the CLI:
sudo unzip ElasticLoadBalancing.zip -d /opt
Export the Java and ELB home directories plus your credentials and default ELB region URL (or place them in the .bashrc file in your home directory):
To achieve auto scaling, we’ll complete the following steps in order:
- Creating an Amazon Machine Image (AMI)
- Creating an Elastic Load Balancer (ELB)
- Creating a Simple Notification Service (SNS) topic
- Creating Auto Scaling configurations and policies
- Creating CloudWatch metric alarms
Amazon Machine Image Creation
We need to build our own custom AMI which is configured with the web server (apache2, nginx etc…) and contains the website code.
Create the image while the instance is running or stopped, providing a name and description:
aws ec2 create-image --instance-id
An AMI identifier is returned which we’ll need later.
Elastic Load Balancer Creation
Create the load balancer, which will forward HTTP traffic to the instances on the 2 public subnets. Specify a security group for the ELB that allows HTTP traffic on port 80 for both ingress and egress:
--listener "protocol=http, lb-port=80, instance-port=80"
The command returns the DNS_NAME, which is the endpoint of the website (you can then create an Alias A record in Route 53 DNS for www.yourdomain.com that points to it).
Note the name of the load balancer you created, as we’ll need it later.
Simple Notification Service
It’s good to get notifications by email whenever an auto scaling event has been triggered; this is achievable by creating an SNS topic:
aws sns create-topic --name sns-as-bluemalkin
It returns the topic’s Amazon Resource Name (ARN), which we subscribe to next with an email address:
aws sns subscribe --topic-arn "arn:aws:sns:ap-southeast-2:990839841794:sns-as-bluemalkin"
"SubscriptionArn": "pending confirmation"
Check the inbox and confirm the subscription.
Note the ARN which we’ll need later as well.
Auto Scaling Creation
There are several steps to creating and configuring auto scaling.
First we need to create a launch configuration, where we specify the AMI (created previously), the key pair name, the security group(s) (which must allow incoming traffic on port 80) and finally the instance type:
aws autoscaling create-launch-configuration --launch-configuration-name lc-bluemalkin
Auto scaling group
Next we create the auto scaling group, where we specify the minimum number of EC2 instances we want running at any time, the maximum number of EC2 instances to run, how many we wish to start with, the load balancer name (created previously), the two availability zones, the two subnets, some ELB settings and finally a tag for the instances. Here we’ll be starting with a minimum of 2 instances, which will also be the desired capacity, and we’ll be allowing a maximum of 8 instances to be launched when there’s a lot of load on the servers:
aws autoscaling create-auto-scaling-group --auto-scaling-group-name ag-bluemalkin
--availability-zones ap-southeast-2a ap-southeast-2b
The health check type option specifies that the ELB determines whether an instance is healthy/online, using a 60 second wait (grace) period after the instance has been launched.
As soon as the auto scaling group has been created, the desired number of instances is immediately launched into the two availability zones/subnets. You can check what auto scaling actions have been executed by running:
aws autoscaling describe-scaling-activities
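To make the launch behaviour concrete, here is a minimal Python sketch of how a desired capacity could be spread over the availability zones. It assumes a perfectly even split; the function name spread_across_zones and the rule for which zone receives any leftover instance are illustrative assumptions, not the actual Auto Scaling placement algorithm.

```python
def spread_across_zones(desired_capacity, zones):
    """Split desired_capacity as evenly as possible over the given zones."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    base, remainder = divmod(desired_capacity, len(zones))
    # Assumption for illustration: the first `remainder` zones get one extra instance.
    return {
        zone: base + (1 if index < remainder else 0)
        for index, zone in enumerate(zones)
    }


zones = ["ap-southeast-2a", "ap-southeast-2b"]
print(spread_across_zones(2, zones))  # the desired capacity used in this guide
print(spread_across_zones(5, zones))  # an odd number leaves one zone with an extra instance
```

With the desired capacity of 2 used here, each zone ends up with one instance, which is what gives the group its high availability.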
Auto scaling notifications
We need to tell the auto scaling group which ARN to notify whenever a scale up/down event happens, using the SNS topic ARN created previously:
aws autoscaling put-notification-configuration --auto-scaling-group-name ag-bluemalkin
Auto scaling policies
We have 2 instances running in the group, set by the desired capacity option. We need to create two policies which will be executed when we want to scale up (scaling adjustment 1) and down (-1):
aws autoscaling put-scaling-policy --policy-name sp-up-bluemalkin
aws autoscaling put-scaling-policy --policy-name sp-down-bluemalkin
Note the two ARNs which we’ll need in the next part.
The cooldown setting instructs the auto scaling group not to perform any scaling operations for 300 seconds after one is triggered. This prevents many scaling activities from being executed within a short timeframe.
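The interaction between the +1/-1 scaling adjustments, the group limits (minimum 2, maximum 8) and the 300 second cooldown can be sketched in a few lines of Python. This is a simplified model under stated assumptions: simulate_scaling is a hypothetical helper, the trigger times are made up, and the cooldown is counted from the moment a scaling activity is triggered rather than from when it completes.

```python
def simulate_scaling(events, start_capacity=2, min_size=2, max_size=8, cooldown=300):
    """Replay (timestamp_seconds, adjustment) triggers and return the capacity history.

    A trigger is ignored if it arrives less than `cooldown` seconds after the
    last scaling activity that was actually carried out.
    """
    capacity = start_capacity
    last_activity = None
    history = []
    for timestamp, adjustment in sorted(events):
        if last_activity is not None and timestamp - last_activity < cooldown:
            history.append((timestamp, capacity, "ignored (cooldown)"))
            continue
        # Keep the new capacity within the group's minimum and maximum size.
        new_capacity = max(min_size, min(max_size, capacity + adjustment))
        if new_capacity == capacity:
            history.append((timestamp, capacity, "no change (at limit)"))
            continue
        capacity = new_capacity
        last_activity = timestamp
        history.append((timestamp, capacity, "scaled"))
    return history


# Hypothetical alarm triggers: three scale-up requests, then three scale-down requests.
triggers = [(0, 1), (120, 1), (300, 1), (900, -1), (1300, -1), (1700, -1)]
for entry in simulate_scaling(triggers):
    print(entry)
```

In this run the trigger at 120 seconds is dropped because it falls inside the cooldown, and the last scale-down request has no effect because the group is already at its minimum of 2 instances.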
CloudWatch Alarms Creation
The final part is to create the alarms which will trigger the scale up and scale down auto scaling policies. CloudWatch provides several metrics such as CPU utilisation, disk I/O, network in/out etc…
Here we’ll be using the CPU utilisation metric, which is commonly used for auto scaling; a high percentage of utilisation means the instance is overloaded and needs to have load taken off.
Using the policy ARN created earlier for scaling adjustment 1, create the metric alarm which will fire when the average CPU utilisation is greater than 80% twice over a period of 5 minutes:
aws cloudwatch put-metric-alarm --alarm-name cw-up-bluemalkin
Then create the metric alarm for scaling adjustment -1, which will fire when the CPU utilisation is less than 80%:
aws cloudwatch put-metric-alarm --alarm-name cw-down-bluemalkin
Note: the CloudWatch metrics used for auto scaling are averages across all instances in the auto scaling group; they are not instance specific metrics (which can be viewed separately).
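The following Python sketch illustrates how such an alarm could be evaluated: CPU readings are first averaged over the whole group for each period, and the alarm fires only when the required number of consecutive periods breach the threshold. The helper names, the instance identifiers and the sample readings are assumptions for illustration; the 80% threshold and the two evaluation periods follow the description above.

```python
def group_average(cpu_by_instance):
    """Average CPU utilisation (%) over all instances in the group for one period."""
    return sum(cpu_by_instance.values()) / len(cpu_by_instance)


def alarm_state(period_averages, threshold, evaluation_periods, comparison):
    """Return "ALARM" if the most recent `evaluation_periods` datapoints all breach."""
    if len(period_averages) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = period_averages[-evaluation_periods:]
    if comparison == "GreaterThanThreshold":
        breaching = all(value > threshold for value in recent)
    elif comparison == "LessThanThreshold":
        breaching = all(value < threshold for value in recent)
    else:
        raise ValueError("unsupported comparison: " + comparison)
    return "ALARM" if breaching else "OK"


# Hypothetical readings for two instances over three consecutive periods.
periods = [
    {"i-a": 70, "i-b": 60},
    {"i-a": 95, "i-b": 85},
    {"i-a": 90, "i-b": 80},
]
averages = [group_average(period) for period in periods]
print(averages)
print(alarm_state(averages, 80, 2, "GreaterThanThreshold"))
```

Note that in the last period one instance sits at exactly 80%, yet the alarm still fires because the group average is what gets compared with the threshold.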
Testing the auto scaling
Now that we have configured auto scaling, generate some traffic on the website using the load balancer DNS name (or alias), and watch the magic happen!
You will be notified by email when auto scaling events are triggered, or you can run aws autoscaling describe-scaling-activities.
Browse to the CloudWatch interface on the console and watch the CPU alarm states changing between ALARM and OK for both scale up and down events.
Note that by default the metrics are refreshed every 5 minutes (this can be changed to one-minute intervals) and that, during the cooldown period of 300 seconds, any state changes after an auto scaling event are ignored.
A good way to generate traffic is to use Bees with Machine Guns, which I’ve described here: /load-testing-on-ec2-using-bees-with-machine-guns/
Attempting to terminate instances directly will not stop auto scaling, because the group launches replacements to maintain its minimum size. Instead, change the min and max size to 0 in the auto scaling group; any running instances will then be terminated:
aws autoscaling update-auto-scaling-group --auto-scaling-group-name ag-bluemalkin --min-size 0 --max-size 0
Then remove the auto scaling group and launch configuration:
aws autoscaling delete-auto-scaling-group --auto-scaling-group-name ag-bluemalkin
aws autoscaling delete-launch-configuration --launch-configuration-name lc-bluemalkin
Check that they have all been deleted:
aws autoscaling describe-launch-configurations
aws autoscaling describe-auto-scaling-groups
The scaling policies and CloudWatch metric alarms get deleted automatically.
There are many other options available to configure auto scaling; here we’ve shown the basics using web servers. Auto scaling can be used for any kind of servers, such as application servers running inside a private VPC and using an internal load balancer to distribute the traffic from the web servers.
There are many metrics to choose from to create the policy alarms, and you can also create your own.
Auto scaling can also be configured on a schedule, crontab style: instead of having metrics launch extra instances, you can run additional instances at a certain time and then terminate them after they have executed a batch job, for example.
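As a rough illustration of schedule-based scaling, the Python sketch below picks a desired capacity from time windows instead of from metrics. The function scheduled_capacity, the window format and the 01:00 to 03:00 batch window with 6 instances are all hypothetical; they only model the idea and are not the Auto Scaling scheduled actions API.

```python
def scheduled_capacity(hour, schedule, default_capacity=2):
    """Return the desired capacity for an hour of the day (0-23).

    `schedule` is a list of (start_hour, end_hour, capacity) windows,
    where end_hour is exclusive. Outside every window the default applies.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    for start_hour, end_hour, capacity in schedule:
        if start_hour <= hour < end_hour:
            return capacity
    return default_capacity


# Hypothetical nightly batch window: 6 instances between 01:00 and 03:00.
batch_window = [(1, 3, 6)]
print([scheduled_capacity(hour, batch_window) for hour in (0, 1, 2, 3)])
```

Outside the window the group falls back to the default of 2 instances, mirroring the minimum used throughout this guide.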
Finally, use CloudFormation templates to simplify auto scaling deployments.
For those who aren’t very comfortable using the API or CLI, auto scaling support has now been added to the AWS Management Console.
It is very easy to use and configure. See the official blog post at http://aws.typepad.com/aws/2013/12/aws-management-console-auto-scaling-support.html