Auto Scaling with Amazon EC2

Auto Scaling is the Amazon Web Services feature that automatically launches (or terminates) EC2 instances depending on, for example, the amount of web traffic.

A typical scenario in a web environment: you run a minimum of 2 web servers 24 hours a day across two availability zones (for high availability), and you get an unexpected increase in traffic when you launch a new product or service. The web servers may struggle to keep up with the increase in traffic and start to slow down.

The solution is to provision additional servers (EC2 instances) and distribute the incoming web requests across the group of web servers.

Later, say at night, when the traffic decreases, some EC2 instances can be removed as they are no longer needed, and you’d be back to running the website on the minimum of 2 servers.

Auto scaling also helps to lower costs of running servers as you only pay for what you use, per hour.


This guide describes how to achieve basic auto scaling. In this example, we’ll be configuring auto scaling within a Virtual Private Cloud (VPC), where each of the two availability zones (here ap-southeast-2a and ap-southeast-2b) is configured with a subnet that can be reached from the internet (a public subnet). We’re assuming that the VPC, connected to an internet gateway, and the subnets have already been created.

We’ll be using the new AWS command line interface (CLI). To install it:

Then we need to configure the CLI with the AWS credentials and default region:

Run complete to populate the available commands when you press tab:

The AWS CLI reference guide is accessible at

We also need the Elastic Load Balancer API, which isn’t yet covered by the CLI:

Export the Java and ELB home directories plus your credentials and default ELB region URL (or place them in the .bashrc file in your home directory):

To achieve auto scaling, we’ll complete the following steps in order:

  1. Creating an Amazon Machine Image (AMI)
  2. Creating an Elastic Load Balancer (ELB)
  3. Creating a Simple Notification Service (SNS) topic
  4. Creating Auto Scaling configurations and policies
  5. Creating CloudWatch metric alarms

Amazon Machine Image Creation

We need to build our own custom AMI which is configured with the web server (apache2, nginx etc…) and contains the website code.

Create the image while the instance is either running or stopped, providing a name and description:

An AMI identifier is returned which we’ll need later.

Elastic Load Balancer Creation

Create the load balancer which will forward HTTP traffic to the instances on the 2 public subnets. Specify a security group for the ELB which allows HTTP traffic on port 80 for both ingress and egress:

The DNS_NAME is returned which is the A record endpoint of the website (you can then add an Alias to the A record in Route53 DNS for

Note the name of the load balancer you created which we’ll need later.

Simple Notification Service

It’s good to get notifications by email whenever an auto scaling event has been triggered. This is achievable by creating an SNS topic:

It returns an Amazon Resource Name (ARN) which we need to subscribe to next with an email address:

Check the inbox and confirm the subscription.

Note the ARN which we’ll need later as well.

Auto Scaling Creation

There are several steps for creating and configuring the auto scaling.

Launch Configuration

First we need to configure a launch configuration where you specify the AMI (created previously), the key pair name, security group(s) (which allows incoming traffic on port 80) and finally the instance type:

Auto scaling group

Next we create the auto scaling group, where we specify the minimum number of EC2 instances we want running at any time, the maximum number of EC2 instances to run, how many we wish to start with, the load balancer name (created previously), the two availability zones, the two subnets, some ELB settings and finally a tag for the instances. Here we’ll be starting with a minimum of 2 instances, which will also be the desired capacity, and we’ll be allowing a maximum of 8 instances to be launched when there’s a lot of load on the servers:

The health check type option specifies that the ELB will determine whether an instance is healthy/online, using a 60 second grace period after the instance has been launched.
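As a rough illustration of the settings described above, the following Python sketch assembles the arguments for `aws autoscaling create-auto-scaling-group`. The group, launch configuration, load balancer and subnet names are placeholders rather than values from this guide, and the tag and availability zone options are left out for brevity (the subnets already determine the zones).

```python
import shlex


def auto_scaling_group_command(name, launch_configuration, load_balancer, subnet_ids,
                               min_size=2, max_size=8, desired_capacity=2,
                               grace_period_seconds=60):
    """Build the argv for `aws autoscaling create-auto-scaling-group`."""
    # The desired capacity has to sit between the minimum and the maximum.
    if not 0 <= min_size <= desired_capacity <= max_size:
        raise ValueError("need 0 <= min_size <= desired_capacity <= max_size")
    return [
        "aws", "autoscaling", "create-auto-scaling-group",
        "--auto-scaling-group-name", name,
        "--launch-configuration-name", launch_configuration,
        "--min-size", str(min_size),
        "--max-size", str(max_size),
        "--desired-capacity", str(desired_capacity),
        "--load-balancer-names", load_balancer,
        # One subnet per availability zone, comma separated.
        "--vpc-zone-identifier", ",".join(subnet_ids),
        # Let the load balancer decide whether an instance is healthy.
        "--health-check-type", "ELB",
        "--health-check-grace-period", str(grace_period_seconds),
    ]


# Placeholder names and subnet IDs, for illustration only.
argv = auto_scaling_group_command(
    name="web-asg",
    launch_configuration="web-lc",
    load_balancer="web-elb",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
)
print(shlex.join(argv))
```

Building the command as a list keeps each option next to its value, and the list can be handed to `subprocess.run` once the placeholders are replaced with real identifiers.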

As soon as the auto scaling group has been created, the desired capacity number of instances are immediately launched into the two availability zones/subnets. You can check what auto scaling actions have been executed by running:

Auto scaling notifications

We need to tell the auto scaling group to which ARN a notification must be sent whenever a scale up/down event has happened, using the ARN previously created:
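A minimal sketch of that step, again assembling the CLI arguments in Python. It assumes the `aws autoscaling put-notification-configuration` command and subscribes to the four launch and terminate notification types; the group name and topic ARN are made-up placeholders.

```python
import shlex

# Notification types covering successful and failed launches and terminations.
NOTIFICATION_TYPES = (
    "autoscaling:EC2_INSTANCE_LAUNCH",
    "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
    "autoscaling:EC2_INSTANCE_TERMINATE",
    "autoscaling:EC2_INSTANCE_TERMINATE_ERROR",
)


def notification_command(group_name, topic_arn, types=NOTIFICATION_TYPES):
    """Build the argv for `aws autoscaling put-notification-configuration`."""
    if not topic_arn.startswith("arn:aws:sns:"):
        raise ValueError("expected an SNS topic ARN")
    return [
        "aws", "autoscaling", "put-notification-configuration",
        "--auto-scaling-group-name", group_name,
        "--topic-arn", topic_arn,
        "--notification-types", *types,
    ]


# Placeholder group name and ARN, for illustration only.
argv = notification_command(
    "web-asg", "arn:aws:sns:ap-southeast-2:123456789012:asg-events"
)
print(shlex.join(argv))
```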

Auto scaling policies

We have 2 instances running in the group, set by the desired capacity option. We need to create two policies which will be executed when we want to scale up (scaling adjustment 1) and down (-1):

Note the two ARNs which we’ll need in the next part.

The cooldown setting instructs the auto scaling group not to perform any scaling operations for 300 seconds after one is triggered. This prevents many scaling activities from being executed within a short timeframe.
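To make the interaction between the two policies, the size limits and the cooldown more concrete, here is a small Python model. It is a simplified sketch of the behaviour described above, not the service's actual algorithm: the class name and the event timeline are invented for illustration, and the limits (2 to 8 instances, 300 second cooldown) are the ones used in this guide.

```python
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    min_size: int = 2
    max_size: int = 8
    capacity: int = 2
    cooldown_seconds: int = 300
    last_scaled_at: float = float("-inf")

    def apply_policy(self, adjustment, now):
        """Apply a +1/-1 style adjustment; return True if the capacity changed."""
        # Ignore the request while the previous scaling activity is cooling down.
        if now - self.last_scaled_at < self.cooldown_seconds:
            return False
        # Never go below the minimum or above the maximum size.
        new_capacity = max(self.min_size, min(self.max_size, self.capacity + adjustment))
        if new_capacity == self.capacity:
            return False
        self.capacity = new_capacity
        self.last_scaled_at = now
        return True


group = ScalingGroup()
# (time in seconds, scaling adjustment requested by an alarm)
events = [(0, +1), (120, +1), (400, +1), (800, -1), (1200, -1), (1600, -1)]
history = [(t, group.apply_policy(adj, t), group.capacity) for t, adj in events]
for t, changed, capacity in history:
    print(f"t={t:>4}s changed={changed} capacity={capacity}")
```

In this timeline the request at 120 seconds is ignored because it falls inside the cooldown, and the last scale down request is ignored because the group is already at its minimum of 2 instances.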

CloudWatch Alarms Creation

The final part is to create some alarm events which will trigger the scale up and scale down auto scaling policies. CloudWatch provides several metrics such as CPU utilisation, disk utilisation, network in/out etc…

Here we’ll be using the CPU utilisation metric, which is commonly used for auto scaling; a high percentage of utilisation means the instance is overloaded and needs to have load taken off.

Using the policy ARN created earlier for scaling adjustment 1, create the metric alarm which will fire when the average CPU utilisation is greater than 80% twice over a period of 5 minutes:

Then create the metric alarm for scaling adjustment -1, which will fire when the CPU utilisation is less than 80%:

Note: the CloudWatch metrics used for auto scaling are global averages of all instances in the auto scaling group. They are not instance-specific metrics (which can be viewed separately).
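The following Python sketch illustrates how such an alarm can be reasoned about: each period's value is the average across all instances in the group, and the alarm fires only when the last two period averages are both above the threshold. The function names and sample CPU figures are invented, and the real CloudWatch evaluation has more options than this simplified model.

```python
from statistics import mean


def group_average(per_instance_cpu):
    """Average CPU utilisation across all instances in the group for one period."""
    return mean(per_instance_cpu)


def alarm_state(period_averages, threshold=80.0, evaluation_periods=2):
    """Return ALARM if the most recent periods all exceed the threshold."""
    recent = period_averages[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Three consecutive periods, two instances each (made-up CPU percentages).
samples = ([35.0, 45.0], [95.0, 75.0], [90.0, 88.0])
periods = [group_average(s) for s in samples]

print(periods)
print(alarm_state(periods[:1]), alarm_state(periods[:2]), alarm_state(periods))
```

Note that in the second period one instance is below 80% while the group average is above it, which is why a single busy or idle instance does not decide the outcome on its own.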

Testing the auto scaling

Now that we have configured auto scaling, generate some traffic on the website using the load balancer A record (or alias) and watch the magic happen!

You will be notified by email when auto scaling events are triggered. Or you can run aws autoscaling describe-scaling-activities

Browse to the CloudWatch interface on the console and watch the CPU alarm states changing between ALARM and OK for both scale up and scale down events.

Note that by default the metrics are refreshed every 5 minutes (this can be changed to one-minute intervals) and that, during the cooldown period of 300 seconds after an auto scaling event, any state changes are ignored.

A good way to generate traffic is to use Bees with Machine Guns, which I’ve described here: /load-testing-on-ec2-using-bees-with-machine-guns/


Attempting to terminate instances directly will not stop the auto scaling. Instead, you need to change the min and max size to 0 in the auto scaling group, which terminates any running instances:

Then remove the auto scaling group and launch configurations:
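The teardown order can be sketched as a list of CLI invocations built in Python. The group and launch configuration names are placeholders, and the sketch only assembles the commands; in practice you would wait for the instances to finish terminating between the first and second command.

```python
import shlex


def teardown_commands(group_name, launch_configuration):
    """Return the CLI invocations for tearing down the auto scaling setup, in order."""
    return [
        # 1. Shrink the group to zero so that all running instances are terminated.
        ["aws", "autoscaling", "update-auto-scaling-group",
         "--auto-scaling-group-name", group_name,
         "--min-size", "0", "--max-size", "0"],
        # 2. Delete the now empty group.
        ["aws", "autoscaling", "delete-auto-scaling-group",
         "--auto-scaling-group-name", group_name],
        # 3. Delete the launch configuration the group was using.
        ["aws", "autoscaling", "delete-launch-configuration",
         "--launch-configuration-name", launch_configuration],
    ]


# Placeholder names, for illustration only.
commands = teardown_commands("web-asg", "web-lc")
for argv in commands:
    print(shlex.join(argv))
```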

Check that they have all been deleted:

The scaling policies and CloudWatch metric alarms get deleted automatically.


There are many other options available to configure auto scaling; here we’ve shown the basics using web servers. Auto scaling can be used for any kind of server, such as application servers running inside a private VPC and using an internal load balancer to distribute the traffic from the web servers.

There are many metrics to choose from to create the policy alarms, and you can also create your own.

Auto scaling can also be configured with a crontab-style schedule: instead of having metrics launch extra instances, you can run additional instances at a certain time and then terminate them after they have executed a batch job, for example.

Finally, use CloudFormation templates to simplify auto scaling deployments.


For those who aren’t very comfortable using the API or CLI, auto scaling support has now been added to the AWS Management Console.

It is very easy to use and configure. See the official blog post at

Multiple IPs and ENIs on EC2 in a VPC

Back in 2012, Amazon Web Services launched support for multiple private IP addresses for EC2 instances within a VPC.

This is particularly useful if you host several SSL websites on a single EC2 instance, as each SSL certificate must be hosted on its own (private) IP address. You can then associate the private IP address with an Elastic IP address to make the SSL website accessible from the internet.

Multiple IPs and Limits

This AWS blog entry briefly describes the management of multiple IPs:

When you create a VPC, you are by default limited to 5 Elastic IP addresses. However, it is easy to request an increase by completing this form

Note that a single Elastic Network Interface (ENI) can have multiple secondary IP addresses. For example, on an m1.small instance type you can have up to 4 IPs, which in Linux would be the eth0, eth0:0, eth0:1 and eth0:2 interfaces.
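The naming pattern above is easy to generate programmatically, for instance when templating an interfaces file. A tiny Python sketch, where the helper name is made up for illustration:

```python
def interface_labels(device, ip_count):
    """Return the Linux interface labels for `ip_count` addresses on one ENI."""
    if ip_count < 1:
        raise ValueError("an interface has at least one IP address")
    # The primary address uses the device name; secondary ones use aliases :0, :1, ...
    return [device] + [f"{device}:{index}" for index in range(ip_count - 1)]


print(interface_labels("eth0", 4))  # ['eth0', 'eth0:0', 'eth0:1', 'eth0:2']
```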

There is also a limit on the number of ENIs and IPs for each instance type, see the documentation at:

Asymmetric Routing

When you add a second ENI, the AWS documentation is missing a fundamental note on how to configure the instance OS to handle the network routes.

If you attach the second ENI, associate it with an Elastic IP and bring it up (with ifup) in Linux after adding it to /etc/network/interfaces, your network will very likely be performing asymmetric routing. If you try to ping the Elastic IP of eth1, you get no response. This is because the response packets leaving the instance do not get sent out via the correct gateway.

Asymmetric routing is explained in depth in this article

Route configuration with additional ENIs

The fix is to add additional routes for the new ENIs. This guide assumes that so far you have followed this documentation for adding a second ENI

We’re assuming the instance has an interface eth0 with the private address from a subnet and we want to add an ENI using a different subnet with an IP address of

The /etc/network/interfaces file should look like this after adding eth1:

Then bring up eth1 interface:

Let’s check the route:

There is one default gateway at (which is bound to the VPC internet gateway) and will route any traffic from eth0. However, any traffic from eth1 with a destination outside of will be dropped, so we need to re-configure the routing to the default gateway for the subnet.

Firstly, add an entry “2 eth1_rt” to the routing tables file (/etc/iproute2/rt_tables):

Next we need to add a default route to the gateway for eth1:

Verify that the route is added:

Finally we need to add a rule which will tell the route table to route traffic with a source of via the eth1_rt table:

Verify that the rule is added:

Now from your machine, try to ping the Elastic IP associated with eth1. It should now work: asymmetric routing has been fixed!
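The three routing steps can be summarised with a short Python sketch that generates the shell commands for a given interface. The addresses used here (10.0.1.25 in 10.0.1.0/24) are hypothetical and are not the values from this guide; the sketch also assumes the subnet gateway is the first address after the network address, which is where a VPC places its router, and that the rule matches the single source address of the ENI.

```python
import ipaddress


def vpc_gateway(subnet_cidr):
    """Return the VPC router address of a subnet (network address + 1)."""
    network = ipaddress.ip_network(subnet_cidr)
    return str(network.network_address + 1)


def policy_routing_commands(device, address, subnet_cidr,
                            table_id=2, table_name="eth1_rt"):
    """Return the shell commands that route replies from `address` out via `device`."""
    if ipaddress.ip_address(address) not in ipaddress.ip_network(subnet_cidr):
        raise ValueError(f"{address} is not inside {subnet_cidr}")
    gateway = vpc_gateway(subnet_cidr)
    return [
        # Register a named routing table for the second interface.
        f"echo '{table_id} {table_name}' >> /etc/iproute2/rt_tables",
        # Give that table its own default route through the subnet gateway.
        f"ip route add default via {gateway} dev {device} table {table_name}",
        # Send traffic originating from the ENI address through that table.
        f"ip rule add from {address}/32 table {table_name}",
    ]


# Hypothetical address and subnet, for illustration only.
commands = policy_routing_commands("eth1", "10.0.1.25", "10.0.1.0/24")
print("\n".join(commands))
```

Generating the commands from the subnet keeps the gateway and the rule consistent with each other, which is the part that is easy to get wrong when typing them by hand.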

To make the route changes permanent so that they can survive a reboot, add them to the interfaces file:

If you wish to associate a private IP from the subnet to eth1 (same subnet as eth0 network), just replace the gateway and subnet values to and respectively.

Provisioning a custom Amazon VPC NAT instance with Puppet

If your VPC has a private subnet with instances which need to access the internet, then you need a NAT instance.

You may be using the default Amazon-built AMI ami-vpc-nat, which by default allows all traffic from the private instances to go out via the NAT.

But chances are you want to have greater control over the NATing rules. For example, you may want to only allow the private instances to access the repositories for OS upgrades. You may also want to use your favorite Linux OS instead of the default Amazon Linux OS.

NATing can be configured using iptables or ufw, but there’s a great Puppetlabs firewall module which makes NATing easy to configure.

Initial VPC setup

Create the VPC, Security Group and Route Table as explained in the Amazon documentation at:

Launch Instance

Launch an instance using your favorite Linux OS. In this example we’ll be using Ubuntu 12.04 LTS.

Launch it into your public subnet. Assign an Elastic IP address. Then update the Route Table as explained in the documentation link.

Important: remember to disable the source/destination check on the instance (the Change Source / Dest Check option).

Enable IP Forwarding

To enable NATing you need to enable IP forwarding on the OS. This Puppet manifest snippet will enable it:

To make the change permanent, change the setting in sysctl.conf.
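As an illustration of the end state the manifest should produce (this is a Python sketch, not the Puppet resource itself), the function below rewrites the text of a sysctl.conf so that `net.ipv4.ip_forward` is set to 1 exactly once. The function name and the sample file content are invented for the example.

```python
def ensure_ip_forward(sysctl_conf_text):
    """Return sysctl.conf content with net.ipv4.ip_forward set to 1 exactly once."""
    key = "net.ipv4.ip_forward"
    kept = []
    for line in sysctl_conf_text.splitlines():
        # Drop any existing setting of the key, whether active or commented out.
        candidate = line.lstrip("#; ").strip()
        if candidate.split("=")[0].strip() == key:
            continue
        kept.append(line)
    kept.append(f"{key} = 1")
    return "\n".join(kept) + "\n"


# Made-up file content, for illustration only.
before = (
    "# Uncomment to enable forwarding\n"
    "#net.ipv4.ip_forward=1\n"
    "net.ipv4.ip_forward = 0\n"
    "vm.swappiness = 10\n"
)
after = ensure_ip_forward(before)
print(after, end="")
```

Running the function on its own output returns the same text, which mirrors the idempotent behaviour expected from a configuration management resource.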

You can put those two manifest resources in a module called sysctl.

Install Puppet plugin

On the Puppetmaster server, install the module:

You must enable plugin sync on all Puppet agents and the master. Add the following to puppet.conf:

To clear any unmanaged firewall rules on the instance, add the following to your site.pp or any similar top-scope file:

NATing Manifest

Here’s what the NATing manifest would look like if you want to allow NATing only for the Ubuntu apt repositories:

Remember to include the sysctl ip_forward settings and the firewall module which are required for each firewall resource.
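To show what a restricted NAT rule set boils down to at the iptables level, here is a Python sketch that generates one MASQUERADE rule per allowed destination and port. It is not the Puppet manifest: the private subnet and the repository addresses below are hypothetical documentation addresses, and only the nat table is covered, so any forwarding filter rules would need to be handled separately.

```python
import ipaddress
import shlex


def nat_rules(private_cidr, allowed_destinations, out_device="eth0", ports=(80,)):
    """Return iptables argv lists that NAT only traffic to the allowed destinations."""
    ipaddress.ip_network(private_cidr)  # raises ValueError on a malformed subnet
    rules = []
    for destination in allowed_destinations:
        ipaddress.ip_network(destination)  # accepts a single host or a CIDR block
        for port in ports:
            rules.append([
                "iptables", "-t", "nat", "-A", "POSTROUTING",
                "-s", private_cidr,
                "-d", destination,
                "-o", out_device,
                "-p", "tcp", "--dport", str(port),
                "-j", "MASQUERADE",
            ])
    return rules


# Hypothetical private subnet and repository addresses, for illustration only.
rules = nat_rules("10.0.1.0/24", ["192.0.2.10", "198.51.100.0/24"])
for argv in rules:
    print(shlex.join(argv))
```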


The Puppetlabs firewall module makes it quick and easy to add new firewall rules and ensure greater security with refined NATing rules.

If your NAT instance is critical in allowing outgoing traffic for your production systems, consider implementing NAT high availability described in this blog entry: /aws-high-availability-on-nat-instances/