Autoscaling like a Pro on AWS

The Amazon cloud (AWS) is a great place to deploy your application to. It's flexible, cheap, you pay by-the-hour, and it offers great additional services to make your life easier. AWS also offers a great API, with properly written tools, that can act as a complete replacement for the AWS console. If you can manage something from within the console, you can do the same thing through the API. I'll cover the AWS tools later on.

When it comes to handling a massive amount of traffic, we need to look at a number of components, and then at how they translate to AWS. A few key pointers to dynamically scale your application (I consider a website to be an application) :

  • Nodes need to be self-contained and stateless. All nodes should be considered equal, so no master or slave nodes. Nodes can also come and go when they please in an auto scaling setup, so if you store data on it, make sure the node is not the only source of data.

  • If you manage relational data, split up the reads from the writes. Read-only copies of relational databases on AWS are cheap, easy to set up, and, even more important, scale well. All reads go to the slaves (which are read-only), all writes go to the master. As a bonus, RDS slaves can span multiple availability zones (CHECK : cross region ?).

    If possible, make the write action a service. Requests to services can be queued, and you don't have all nodes hammering the master database. It also means you can replace how it works without breaking all the nodes. AWS also offers queues as a service, where queue endpoints can be all over the AWS cloud.

    If possible, store data in a NoSQL database. If data is read-only, it's usually a good candidate. It also means you don't have to deal with ORM layers, which can be a plus (ever seen the queries Hibernate generates ?).

  • Cache things. All of it. Really. If it can be cached, cache it. Memory is cheap and fast, disk access is far slower, and network access is probably slower still. If you can keep things in memory for even a few minutes, it will keep the response times fast. Also, a lookup from memory can't go wrong, which is not something you can say of disk or network access.

  • Use APIs for everything, and always assume something can fail. If it fails, handle it well. Generating a decent error message, and notifying OPS is also considered “handling well”.

Elastic Beanstalk

One of those services is Elastic Beanstalk, which allows you to deploy applications with the push of a single button on the AWS console. Elastic Beanstalk is basically a top-level front-end for a number of AWS services : Elastic Load Balancing, Auto Scaling, CloudWatch and CloudFront.

It basically gives you a load balancer, an auto scaling policy, monitoring and a chef-based infrastructure deployment tooling all combined into one service.

Since it's a generic solution, it is limited to a number of platforms :

  • IIS
  • Node.js
  • PHP
  • Python
  • Ruby
  • Tomcat
  • Docker

You have limited control over the environment you get; customization is possible with the use of .ebextensions. It doesn't give you full control, but it will be sufficient for most cases.

Since we use the Play Framework, the options are limited to either Tomcat or Docker. The Tomcat option doesn't work with our setup (we need Apache in front), so it's down to just one option : Docker.

Docker on Elastic Beanstalk runs on an instance, with the application inside a container. So, you have the instance on a private subnet, and the application in the Docker container runs inside a NAT subnet within that instance. That's not convenient if you need to handle incoming traffic that doesn't come through the load balancer.

Rolling code updates

Another show-stopper for me was the fact that Elastic Beanstalk doesn't support rolling code updates. Ideally, you would deploy a new version, it would boot up instances with the new version, wait until they accept traffic, and when they do, terminate the old instances with the old software version. That way, you would have zero downtime. Unfortunately, Elastic Beanstalk doesn't support that. So, when switching to a new version, it removes all instances with the old version.

I haven't been able to get this in a workable state. You might end up with a different situation, where it is usable. If it isn't : What's next ?

Auto scaling

Luckily for us, AWS has auto scaling. In short, you group a number of instances (the auto scaling group), and the number of instances in that group is determined by metrics collected from the instances in that group. The metrics don't dictate an absolute number of instances; rather, the actions triggered by the metrics add or remove instances, and the group size is the result.

Sounds.. Well. Vague. Let me give you an example.

Let's start a group with a minimum of 3 instances, and a maximum of 10. Those are absolute values : The group can't drop below 3, and can't go above 10, no matter what happens. You need to provide triggers (alarms in AWS), and an action. Those two things together are called a scaling policy. To make it usable at all, you need to create a policy for scaling up, and a policy for scaling down. So, you start with 3 instances, it gets busy, and the group scales up by one instance. Then you have 4. If that's not enough, it will add another one, etc. You can add multiple triggers, so that you can handle traffic peaks.

Scaling policies

First, you should determine what the trigger is. On EC2 instances you can pick CPU utilization, disk I/O operations (Disk Ops), disk bytes read/written and network bytes in/out. I usually pick CPU utilization, because that gives me a direct relation with how busy a node is. The higher the CPU utilization, the busier it is, and the other way around.

So, you can pick for example 50% average utilization as a trigger, and when that triggers, add 1 instance to the auto scaling group. If CPU utilization drops below 30% average, you remove 1 instance from the group.
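To make the interplay between the two policies and the group limits concrete, here is a toy model in shell. It does not talk to AWS at all; the function name next_capacity is made up for this sketch, and the numbers are simply the example values used above (50% and 30% CPU, 3 to 10 instances). Whether the real alarm fires at exactly 50% or only above it depends on how you define the alarm, so treat the comparison operators as an assumption.

```shell
#!/usr/bin/env bash
# Toy model of a scale-up and a scale-down policy (no AWS calls involved).
SCALE_UP_CPU=50     # assumed: add 1 instance at or above this average CPU %
SCALE_DOWN_CPU=30   # assumed: remove 1 instance below this average CPU %
MIN_SIZE=3          # the group never drops below this
MAX_SIZE=10         # the group never grows beyond this

# next_capacity CURRENT AVG_CPU
# Prints the instance count the group would have after one evaluation.
next_capacity() {
    local current=$1 cpu=$2 desired=$1
    if [ "$cpu" -ge "$SCALE_UP_CPU" ]; then
        desired=$((current + 1))
    elif [ "$cpu" -lt "$SCALE_DOWN_CPU" ]; then
        desired=$((current - 1))
    fi
    # Clamp the result to the absolute group limits.
    if [ "$desired" -lt "$MIN_SIZE" ]; then desired=$MIN_SIZE; fi
    if [ "$desired" -gt "$MAX_SIZE" ]; then desired=$MAX_SIZE; fi
    echo "$desired"
}
```

Calling next_capacity 3 60 yields 4, while next_capacity 3 10 stays at 3 because the minimum wins over the scale-down policy.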

Scaling is a delicate operation : You don't want to add instances too fast (it will cost money), or remove instances too fast, which will result in a higher load, which can in turn result in a scaling policy kicking in. If you create a bad scaling policy, you will have a constant flow of scaling operations, which is a bad thing. To prevent this, AWS has a cooldown period. If a scaling activity occurs, it will wait the number of seconds assigned to the cooldown period before another scaling activity will begin. Effectively, it will suspend scaling activity during that period. During the cooldown, the new instances have the time to settle down. You need to pick that number carefully, which I will cover later on.
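The cooldown itself boils down to a simple time comparison. The sketch below only illustrates that idea; cooldown_active is a hypothetical helper, not something AWS provides, and the real service tracks this for you.

```shell
#!/usr/bin/env bash
# cooldown_active LAST_ACTIVITY NOW COOLDOWN
# All arguments are in seconds (for example epoch timestamps from `date +%s`).
# Succeeds (exit 0) while a new scaling activity should still be held back.
cooldown_active() {
    local last=$1 now=$2 cooldown=$3
    [ $((now - last)) -lt "$cooldown" ]
}
```

With a cooldown of 300 seconds, an activity that happened 100 seconds ago still blocks scaling, one that happened 300 seconds ago no longer does.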

What happens if a scaling activity occurs ? It means launching an instance from a predefined AMI. Which AMI, and which instance type, is determined by the launch configuration. Every scaling group has one launch configuration attached. Normally, when the instance is declared “Healthy”, the auto scaling group will add it to the ELB, and it will start to accept traffic. Good, assuming your instance is capable of handling traffic directly after it has booted. In many cases, it's not. Your application may still be starting up, or Tomcat may still be booting : all factors that basically mean you can't handle traffic directly after the instance has booted.

If your instance needs time to “warm up”, you need to tell the ELB this. This means that the ELB will do periodic checks, and when those checks succeed, it will add the instance to the load balancer. As long as the check fails, it will not send traffic to that instance. You can tell the auto scaling group to use the ELB's health check instead of looking at instance healthiness. This will allow your instances to warm up, and accept traffic when they are ready. The auto scaling group will wait a certain amount of time before it starts checking, the “Health check grace period”. After that, the checks begin. There is one catch : If your instance isn't ready to accept traffic after the grace period, the auto scaling group will consider the instance unhealthy, and replace it.

Let me rephrase that : If your instance isn't healthy after the grace period + (Unhealthy Threshold * interval), your instance will be axed and replaced. So, pick this value carefully. Very carefully.
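As a quick worked example of that formula, the helper below just does the arithmetic. The name replacement_deadline and the sample numbers (a grace period of 300 seconds, an unhealthy threshold of 2 and a check interval of 30 seconds) are made up for illustration; plug in your own settings.

```shell
#!/usr/bin/env bash
# replacement_deadline GRACE THRESHOLD INTERVAL
# Prints the number of seconds after launch at which a still-unhealthy
# instance would be replaced: grace period + (unhealthy threshold * interval).
replacement_deadline() {
    local grace=$1 threshold=$2 interval=$3
    echo $((grace + threshold * interval))
}

replacement_deadline 300 2 30   # prints 360
```

So with these sample values, your application has to pass the health check within 6 minutes of the instance launching.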

So, at this point we have autoscaling. Next, we need to be able to roll out code without downtime.

Code rollouts

You usually want to roll out a new version of your application without bringing it down. You also want to control how this is done, instead of an automated system doing this for you. In my case, I've decided to use some nice AWS properties to accomplish this :

  • If you attach an ELB to an autoscaling group, the ELB will direct traffic to all healthy instances in the autoscaling group
  • You can tell the autoscaling group to use the ELB health check, instead of the EC2 health check
  • You can attach an ELB to multiple autoscaling groups

Combining all the above facts, it roughly functions as follows :

Every application has two autoscaling groups : One holds the current version, with a number of running instances; the other is the target for the new version (with a new launch configuration), and has no running instances.

In pseudo code, it ends up with :

check_ami_id $ami_id
check_ami_tags $ami_id
determine_app_version $ami_id
find_as_group_name $appname
check_elb_health $asg_current
copy_current_lc $appname $ami_id $lc_old
assign_lc_to_asg $asg_target $lc_new
suspend_asg $asg_current
scale_asg $asg_current $asg_target
wait_until_instance_cnt_reached $asg_target $asg_current
wait_until_healthy $asg_target $instance_cnt
resume_asg $asg_current
scale_asg_down $asg_current
wait_until_terminated $asg_current

Hey !! This looks like Bash. Well, it actually is. Actual implementation is done in Bash 4, and with some minor work, should also be able to support Bash 3.

I've done this for a number of reasons :

  • I wanted to upscale my Bash knowledge
  • It's installed by default on all AWS instances we run, and in a recent enough version
  • No external dependencies to worry about : We use tools that are installed by default

I'll briefly describe each function :

check_ami_id : This checks if the AMI id is correct. This is a check on the name only.

check_ami_tags : This checks if the AMI has the correct tag set. It wants a
tag named 'autodeploy', with the application name as its value. This
prevents the wrong application ending up on production, because someone
picked the wrong AMI by accident.
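A tag check along those lines could look roughly like the sketch below. This is not the actual script: the function name check_ami_tag and the AWS_CLI and REGION variables are assumptions made for this example (AWS_CLI exists only so the command can be replaced by a stub when testing). It relies on the describe-tags command with the --filters and --query options of the AWS cli.

```shell
#!/usr/bin/env bash
# Sketch of an AMI tag check. AWS_CLI and REGION are hypothetical knobs.
AWS_CLI=${AWS_CLI:-aws}
REGION=${REGION:-eu-west-1}

# check_ami_tag AMI_ID APPNAME
# Succeeds only if the AMI carries a tag 'autodeploy' whose value is APPNAME.
check_ami_tag() {
    local ami_id=$1 appname=$2 value
    # Ask only for the value of the first matching tag. If nothing matches,
    # the text output is expected to be the literal string "None".
    value=$("$AWS_CLI" ec2 describe-tags --region "$REGION" \
        --filters "Name=resource-id,Values=$ami_id" "Name=key,Values=autodeploy" \
        --query 'Tags[0].Value' --output text) || return 1
    [ "$value" = "$appname" ]
}
```

A deploy script would call something like check_ami_tag "$ami_id" "$appname" and abort when it fails.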

determine_app_version: This picks the version tag from the AMI, and displays it.

find_as_group_name: This finds the two auto scaling groups. It needs exactly
2, anything else is considered an error. It finds the two groups by looking
at their tags, where it expects to find a tag named autodeploy, with the name
of the application as value. How the groups are actually named isn't
relevant. It then fills asg_current and asg_target by looking at how many
instances are running. If both have running instances, it is considered a
fatal error.
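The decision about which group is current and which is the target can be sketched as follows. The function name pick_groups and its argument order are invented for this example, and it assumes the instance counts have already been looked up. Treating "no running instances in either group" as an error is also an assumption; the description above only covers the case where both groups are running.

```shell
#!/usr/bin/env bash
# pick_groups NAME_A COUNT_A NAME_B COUNT_B
# Prints "<current> <target>", or fails if the roles cannot be determined.
pick_groups() {
    local name_a=$1 count_a=$2 name_b=$3 count_b=$4
    if [ "$count_a" -gt 0 ] && [ "$count_b" -gt 0 ]; then
        echo "both groups have running instances" >&2
        return 1
    elif [ "$count_a" -gt 0 ]; then
        echo "$name_a $name_b"
    elif [ "$count_b" -gt 0 ]; then
        echo "$name_b $name_a"
    else
        # Assumption: nothing running means there is nothing to roll over from.
        echo "no group has running instances" >&2
        return 1
    fi
}
```

The output can be read straight into two variables, for example with read -r asg_current asg_target.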

check_elb_health: This checks the ELB attached to the auto scaling group.
All instances registered on the ELB need to be healthy. If this is not the
case, this is considered a fatal error. We do this to prevent the number of
instances from being determined wrongly, since that number is needed later on
to see if the new auto scale group has scaled up.
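One way to express that check is to look at the list of instance states the ELB reports. The sketch below is a hypothetical helper (all_in_service is not part of the real script) that reads whitespace-separated states from standard input, such as the text output of aws elb describe-instance-health with a query that selects only the State field. Treating an ELB without any registered instance as a failure is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# all_in_service
# Reads whitespace-separated ELB instance states on stdin and succeeds only
# if there is at least one state and every state is "InService".
all_in_service() {
    local state seen=0
    for state in $(cat); do
        seen=1
        if [ "$state" != "InService" ]; then
            return 1
        fi
    done
    [ "$seen" -eq 1 ]
}
```

For example, printf 'InService\tOutOfService\n' | all_in_service fails, which would stop the rollout before anything is changed.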

copy_current_lc: This copies the launch configuration of the running auto
scaling group to a new one, with the new AMI attached. You can't edit a
launch configuration, so the only option is to create a new one.

assign_lc_to_asg: Assign the new launch configuration to the target auto
scale group.

suspend_asg: This prevents the auto scaling policies from kicking in. The
scripts depend on the number of instances staying constant during run time.

scale_asg: Scales up the asg_target to match the asg_current

wait_until_instance_cnt_reached: Waits until the auto scale group reaches a
certain instance count.

wait_until_healthy: Waits until at least instance_cnt in the auto scale group
are healthy (as determined by the ELB)

resume_asg: Resumes the auto scaling policies

scale_asg_down: Scales down the auto scale group to zero

wait_until_terminated: Waits until all instances have terminated
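The wait_* functions above all follow the same pattern : poll a condition until it holds, and give up after a while. A generic version of that pattern might look like the sketch below. The name wait_until, the timeout handling and the argument order are assumptions for this example; the description above doesn't say whether the real functions have a timeout at all.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Runs COMMAND every INTERVAL seconds until it succeeds.
# Fails if COMMAND still has not succeeded after roughly TIMEOUT seconds.
wait_until() {
    local timeout=$1 interval=$2 waited=0
    shift 2
    while ! "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

A call such as wait_until 600 15 group_is_healthy "$asg_target" would then poll a (hypothetical) group_is_healthy function every 15 seconds for about 10 minutes.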

The scripts use the AWS cli exclusively, together with some standard *NIX tools : readlink, wget, basename, grep, awk, sed. The AWS cli is a one-size-fits-all tool : You can use it to control all aspects of AWS. Most commands offer filtering, and the output format can be switched to either JSON, a table or text.

text is a tab-delimited format, which is ideal as input for bash, since bash splits on whitespace (tabs included) by default. While I know that JSON can also be handled from within bash, that depends on external tooling, or some really ugly workarounds. Since it gives no advantage in this case, I've chosen the text format.


First, make sure you have the AWS cli configured. You need to set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before you can use it. To test, run

aws ec2 describe-regions --region=eu-west-1

If you get a list of regions : OK, if not, you need to check your setup. The output will look like

    {
        "Regions": [
            {
                "Endpoint": "",
                "RegionName": "eu-central-1"
            },
            {
                "Endpoint": "",
                "RegionName": "ap-southeast-1"
            }
        ]
    }

To use the tab separated format use

aws ec2 describe-regions --region=eu-west-1 --output=text

and it will get you

REGIONS	eu-central-1
REGIONS	ap-southeast-1

The same data, different format. The AWS tools contain two important features :

  • Filtering. Limit the output to match the filter specified
  • Query : Limit the output to selected parts

Let me explain both of them to you. Take for example

aws ec2 describe-tags --region=eu-west-1 --filters \
    "Name=resource-id,Values=ami-bcad1234" --output text

This will display the tags of AMI ami-bcad1234 if it exists and has any; otherwise the output will be empty. You can combine multiple filters :

aws ec2 describe-tags --region=eu-west-1 --filters \
    "Name=resource-id,Values=ami-bcad1234" "Name=key,Values=autodeploy" \
    --output text

This will display the tags of AMI ami-bcad1234 only if it has a tag with a key named autodeploy. Matches are only displayed if both conditions are met.

Some commands give you tons of output. In most cases, you're not interested in all of it. Getting all of the output makes finding the desired information harder, and if fields get added or removed in the future, your script might break on them.

Assume we want to query the load balancer attached to an autoscaling group. We are not interested in the rest of the information, just the name of the load balancer. To see what you get by default, run

aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names \
    "name of auto scaling group" --region=eu-west-1

Pick the right region, and the right name. You'll get a lot of output, but we just want to get the load balancer name from that output. Here is where query comes in. It gives you a way to tell the CLI what you want, and that you are not interested in the rest.

To just get the ELB name, use

aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names \
    "name of auto scaling group" --query \
    "AutoScalingGroups[0].LoadBalancerNames[0]" --region=eu-west-1

So, what does this do ? It takes the first member of the AutoScalingGroups array, and from that the first entry of its LoadBalancerNames array. You can also ask for multiple values :

aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names \
    "name of auto scaling group" --region=eu-west-1 --query \
    "AutoScalingGroups[*].Instances[*].{HS:HealthStatus,Id:InstanceId,LS:LifecycleState}" \
    --output text

For every member of the AutoScalingGroups array, this walks the Instances member and returns the HealthStatus, InstanceId and LifecycleState of each instance. The CLI requires you to give each field a label (here HS, Id and LS).
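To show why the text format is convenient, here is a small sketch that consumes the output of the query above. It assumes one line per instance with the columns HS, Id and LS in that order; verify the column order against your own CLI version before relying on it. The function name count_ready is made up for this example.

```shell
#!/usr/bin/env bash
# count_ready
# Reads lines of "<HealthStatus> <InstanceId> <LifecycleState>" on stdin
# (tab or space separated) and prints how many instances are both
# Healthy and InService.
count_ready() {
    local count=0 health id state
    # read splits each line on whitespace, so tab-delimited input just works.
    while read -r health id state; do
        if [ "$health" = "Healthy" ] && [ "$state" = "InService" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Piping the command above into count_ready gives a single number that a wait loop can compare with the expected instance count.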

Wrapping it up

Now that we have the autoscaling setup and deployment in place, that leaves us with one thing : The AMI. Or better, we need a way to create them. Traditionally, we launch an existing AMI, make the changes we need, and create a new AMI from the modified instance.

While this works, it's also very time consuming, not to mention error-prone. Luckily, we have a solution for this : Packer. It's an image creation tool on steroids.

You start out with a template (a JSON file), and let the builder create an AMI from the template, using one or more provisioners. I used two : a list of assets that get uploaded to the instance, and a bunch of shell scripts to do the actual work. While it can use fancy stuff like Ansible, Chef, Puppet and Salt, I personally am way more comfortable with bash, and it's also easier for colleagues to understand.

In general, I advise using two templates : A base one, with for example a JDK and all needed tools, and one for the application. That saves time since we don't have to install the JDK with every build, and the contents of the base AMI are pretty much fixed.

The application AMI handles installation of the application, with the base AMI mentioned above as source. This way, you can build your AMI in a short amount of time, and with a consistent result : You can't forget stuff. It builds it the same way over and over again.

While we have a complete set now, on my personal TODO list is a git hook, so it builds a new AMI with every tag that gets pushed.

Questions / comments ?

Have any questions / comments ? My contact data is on the bottom of each page. I'm also pretty easy to find using Google. Since I'm not doing posts frequently, I welcome feedback.