Provisioning a custom Amazon VPC NAT instance with Puppet

If your VPC has a private subnet with instances that need to access the internet, then you need a NAT instance.

You may be using the default Amazon-built AMI ami-vpc-nat, which by default allows all traffic from the private instances to go out through the NAT.

But chances are you want greater control over the NATing rules. For example, you may want to allow the private instances to access only the repositories needed for OS upgrades.
You may also want to use your favorite Linux OS instead of the default Amazon Linux.

NATing can be configured with iptables or ufw, but the Puppetlabs firewall module makes it much easier to configure: http://forge.puppetlabs.com/puppetlabs/firewall

Initial VPC setup

Create the VPC, Security Group and Route Table as explained in the Amazon documentation at:
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html

Launch Instance

Launch an instance running your favorite Linux OS; in this example we'll use Ubuntu 12.04 LTS.

Launch it into your public subnet and assign it an Elastic IP address. Then update the private subnet's route table so that its 0.0.0.0/0 route points to the instance, as explained in the documentation link.

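Important: remember to disable the source/destination check on the instance (the Change Source/Dest. Check option). A NAT instance forwards traffic that is neither sent from nor addressed to itself, and EC2 drops such traffic while the check is enabled.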

Enable IP Forwarding

NATing requires IP forwarding to be enabled in the OS. In Puppet, you can switch it on at runtime with an exec resource that runs sysctl -w net.ipv4.ip_forward=1.
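A minimal sketch of that runtime resource, assuming sysctl and grep live in the standard system paths (as they do on Ubuntu 12.04); the resource title enable-ip-forward is illustrative:

```puppet
# Turn on IPv4 forwarding in the running kernel.
exec { 'enable-ip-forward':
  command => 'sysctl -w net.ipv4.ip_forward=1',
  path    => ['/sbin', '/bin', '/usr/sbin', '/usr/bin'],
  # Skip the command when forwarding is already on, so the resource
  # does nothing on subsequent Puppet runs.
  unless  => 'grep -qx 1 /proc/sys/net/ipv4/ip_forward',
}
```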

The runtime setting is lost on reboot, so to make it persistent also set net.ipv4.ip_forward = 1 in /etc/sysctl.conf.
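One way to do this, sketched with Puppet's built-in augeas type and assuming the Augeas Ruby bindings are installed on the agent (the resource title is illustrative):

```puppet
# Persist IPv4 forwarding in /etc/sysctl.conf so it survives a reboot.
# Augeas only writes the file when the value actually differs.
augeas { 'sysctl-ip-forward':
  context => '/files/etc/sysctl.conf',
  changes => 'set net.ipv4.ip_forward 1',
}
```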

You can put those two manifest resources in a module called sysctl.

Install the Puppet firewall module

On the Puppet master, install the module with puppet module install puppetlabs-firewall.

The module ships its firewall type and providers as plugins, so you must enable plugin sync on the master and all Puppet agents by setting pluginsync = true in the [main] section of puppet.conf.

To clear any unmanaged firewall rules on the instance, declare a resources purge for the firewall type in your site.pp or any similar top-scope file.
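Following the pattern in the module's README, a sketch of that declaration, assuming puppetlabs-firewall is already installed on the master:

```puppet
# Remove every iptables rule that is not managed by a Puppet firewall resource.
resources { 'firewall':
  purge => true,
}
```

Any rule that isn't declared in Puppet is removed on the next agent run, so make sure every rule you still need is managed as a firewall resource.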

NATing Manifest

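The NATing manifest then only needs rules that let the private instances reach the Ubuntu apt repositories: a masquerade rule in the nat table, plus FORWARD chain rules that accept that traffic and drop everything else.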
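A minimal sketch, assuming the classic puppetlabs-firewall parameter names (action, state; newer releases of the module rename some parameters, so check the README of the version you install), a private subnet of 10.0.1.0/24, a single eth0 interface, and repositories reached over plain HTTP on port 80. The class name nat_apt is hypothetical. This version allows any HTTP destination; you could narrow it with the destination parameter if you pin specific mirror addresses.

```puppet
# Hypothetical class: NAT only outbound HTTP (apt) traffic from the private subnet.
class nat_apt (
  $private_subnet = '10.0.1.0/24',  # assumption: your private subnet CIDR
  $public_iface   = 'eth0'          # assumption: the instance's only interface
) {
  # The sysctl module holding the ip_forward settings.
  include sysctl

  # Rewrite the source address of apt traffic leaving the instance.
  firewall { '100 masquerade apt traffic':
    table    => 'nat',
    chain    => 'POSTROUTING',
    proto    => 'tcp',
    dport    => '80',
    source   => $private_subnet,
    outiface => $public_iface,
    jump     => 'MASQUERADE',
  }

  # Let replies on already-forwarded connections back through.
  firewall { '200 forward established':
    chain  => 'FORWARD',
    proto  => 'all',
    state  => ['RELATED', 'ESTABLISHED'],
    action => 'accept',
  }

  # Forward new HTTP connections from the private subnet only.
  firewall { '210 forward apt http':
    chain  => 'FORWARD',
    proto  => 'tcp',
    dport  => '80',
    source => $private_subnet,
    action => 'accept',
  }

  # Drop everything else that tries to pass through the NAT.
  firewall { '999 drop other forwarded traffic':
    chain  => 'FORWARD',
    proto  => 'all',
    action => 'drop',
  }
}
```

The numeric prefixes in the rule names determine the order in which the module inserts the rules into each chain, which is why the catch-all drop uses 999.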

Remember to include the sysctl module with the ip_forward settings, and make sure the firewall module is installed, since every firewall resource depends on it.

Conclusion

The Puppetlabs firewall module makes it quick and easy to add new firewall rules and tighten security with refined NATing rules.

If your NAT instance is critical for outgoing traffic from your production systems, consider implementing NAT high availability as described in this blog entry: /aws-high-availability-on-nat-instances/

AWS High Availability on NAT instances

If your Amazon Web Services VPC has instances in a private subnet that need to access the internet, you may be using a NAT instance with the VPC route table (containing a route for 0.0.0.0/0 pointing to the NAT instance). Have you considered that this creates a single point of failure?

Consider the following scenario: all your instances are mirrored for high availability across two availability zones. Even with a NAT instance in each availability zone, a route table's 0.0.0.0/0 route can point to only one NAT instance, so if that instance fails, every private subnet using that route table loses outbound internet access.

So how can you reduce this risk? If most of your outgoing traffic consists of simple HTTP requests, such as upgrading operating system packages, one solution is to use proxy servers. Run Squid (or Tinyproxy) on an instance in the public subnet of each availability zone and put an internal Elastic Load Balancer in front of them; because the load balancer only sends requests to healthy proxies, your outgoing traffic survives the loss of one proxy or availability zone.
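A rough sketch of both sides, assuming Ubuntu 12.04 package and service names (squid3), Squid's default port 3128, and an internal ELB listening on that same port. The class names and the ELB DNS name are hypothetical.

```puppet
# Hypothetical class for the proxy instances in each public subnet.
class outbound_proxy {
  package { 'squid3':
    ensure => installed,
  }

  # squid.conf still has to allow requests from your private subnets;
  # manage it with a file resource once you have written it.
  service { 'squid3':
    ensure  => running,
    enable  => true,
    require => Package['squid3'],
  }
}

# Hypothetical class for the private instances: send apt's HTTP traffic
# through the internal Elastic Load Balancer in front of the proxies.
class apt_via_proxy (
  $proxy_url = 'http://internal-proxy-elb.example.internal:3128'  # assumption
) {
  file { '/etc/apt/apt.conf.d/01proxy':
    ensure  => file,
    content => "Acquire::http::Proxy \"${proxy_url}\";\n",
  }
}
```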

AWS NAT High Availability with Heartbeat

However, if you need NATing for production-critical applications where a proxy server isn't enough, AWS has published a simple shell script that provides NAT high availability. The two NAT instances (one in each availability zone) ping each other. If one stops responding, the healthy NAT instance takes over the 0.0.0.0/0 route in the availability zone where the NAT has failed, then attempts to reboot the unresponsive instance. The script uses the EC2 API tools and IAM roles, and requires each availability zone to use its own route table for its private subnets.

A high-level diagram of the setup:

[Diagram: aws_nat]

The steps are described in the article at http://aws.amazon.com/articles/2781451301784570

Some Improvement tips

Make sure you run the script with bash rather than sh; otherwise its bash-specific conditional expressions will fail (on Ubuntu, /bin/sh is dash).

It is preferable to run the script as an Upstart job rather than a cron job, so you can easily stop the service when you need to perform maintenance and reboot a NAT instance. Puppet also works better with scripts running as a service, because it can ensure they are always running.
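A minimal sketch of that setup, assuming the AWS script has been saved in a hypothetical nat_ha module as files/nat_monitor.sh; the job name and paths are illustrative. The Upstart job invokes the script explicitly through bash, which also covers the previous tip.

```puppet
# Hypothetical class: run the AWS NAT monitor script as an Upstart service.
class nat_ha {
  file { '/usr/local/bin/nat_monitor.sh':
    ensure => file,
    source => 'puppet:///modules/nat_ha/nat_monitor.sh',
    mode   => '0755',
  }

  # Upstart job definition; calling /bin/bash explicitly avoids running
  # the script under /bin/sh (dash on Ubuntu).
  file { '/etc/init/nat_monitor.conf':
    ensure  => file,
    content => "description \"NAT instance monitor\"\nstart on runlevel [2345]\nstop on runlevel [!2345]\nrespawn\nexec /bin/bash /usr/local/bin/nat_monitor.sh\n",
  }

  # Keep the monitor running and restart it when the script or job changes.
  service { 'nat_monitor':
    ensure    => running,
    enable    => true,
    provider  => 'upstart',
    subscribe => File['/usr/local/bin/nat_monitor.sh', '/etc/init/nat_monitor.conf'],
  }
}
```

For maintenance, stop nat_monitor pauses the monitor; disable the Puppet agent as well (puppet agent --disable), or it will start the service again on its next run.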

Conclusion

There are many tweaks that can be made to improve the script. AWS will reportedly launch a highly available NAT service in the future, but for now this script does the job.