Building Erlang R13B02-1

This is a quick note in case anyone is having the same issue.

When building Erlang R13B02-1 on a 64-bit non-SMP machine (not sure if that matters), “make -j 2” somehow resulted in an error that I could not work around. Reverting to a plain make (without -j 2) and restarting compilation from the very beginning fixed it.
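
In other words, a minimal sketch of the workaround:

# start the build over, this time without -j
make clean
make
make install   # may need root for --prefix=/usr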

Also, after the final make install, I could not start erl - it complained about “start.boot not found”. The solution is to symlink the boot files like this:

cd /usr/lib/erlang/bin
ln -s /usr/lib/erlang/releases/R13B02/start.boot .
ln -s /usr/lib/erlang/releases/R13B02/start_clean.boot .
ln -s /usr/lib/erlang/releases/R13B02/start_sasl.boot .

I configured it with “./configure --prefix=/usr --disable-x --enable-threads --enable-kernel-poll --disable-hipe”.

Categories: erlang | linux |

Standalone Web Front Door a Must in EC2?

Most of you have probably heard about a recent outage at BitBucket. In a nutshell, their systems hosted at AWS came under a UDP flood DDoS attack, which led to significantly increased traffic, which led to saturation of their local network interface, which led to their being unable to connect to their data stored on EBS, which led to their application becoming unresponsive.

This outage shed more light on some internal designs of EC2 itself, as described here. It might have also showcased our over-confidence in EC2’s ability to detect and defeat certain types of network attacks. But this post is about something else.

BitBucket was running their web front door and their backend application on the same instance. The front door is the part of the system that faces the Internet; its task is to accept connections from clients. For obvious reasons, the front door runs on the service's discoverable IP address - whether they used an Elastic IP or not, bitbucket.org resolved to that IP. Note that a front door (usually) doesn't need EBS.

The backend, however, is what needs EBS for disk persistence. At the same time, the backend does not need to be publicly discoverable - as long as the front door knows where its backend worker(s) are running, the app should function just fine.

With the front door and the backend running on different instances, the UDP flood would have saturated only the former's network interface and would have had no impact on the backend and its EBS.
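
One way to express this separation is with EC2 security groups (a sketch using the EC2 API tools; the group names “frontdoor” and “backend”, the backend port and the account ID are hypothetical):

# only the front door is reachable from the Internet
ec2-authorize frontdoor -P tcp -p 80 -s 0.0.0.0/0
# the backend accepts connections solely from instances in the frontdoor group
ec2-authorize backend -P tcp -p 8080 -o frontdoor -u 111122223333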

I know that AWS reportedly fixed the flood issue, but it looks to me like separating the front door and the application backend may still be a good preventive measure - after all, it's considered good practice for a reason.

Please note that I am not trying to accuse BitBucket of running a bad architecture and causing their own outage. All I am doing is trying to learn a lesson.

Categories: cloud-computing |

Capistrano Auth Trick

This past summer, we needed to automate testing of several failure scenarios for VPN-Cubed. Having asked the LazyWeb about any frameworks that could help us and having gotten no response, our dev team had a short chat in the office. We decided that ultimately we were going to have to roll our own system based on SSH. Capistrano was the obvious choice, because it’s essentially a higher-level wrapper around the Net::SSH module (if you prefer Python, take a look at fabric or paramiko).

One obstacle was that, because we were emulating various failures, at times our local Capistrano process, which was driving the tests, had to lose SSH connectivity to its target servers. We quickly discovered that this resulted in an exception, and the cap process would die.

To work around this, I added yet another level on top of cap which uses GNU make (one of my all-time favorites). In a nutshell, the user controls the testing process via make, and make starts cap. In this setup, it’s OK for the cap process to exit occasionally.

But then - and we are finally getting to the point of this post - another issue came up: I didn’t want to keep typing the password into cap each time it was started by make. Here is what I ended up doing to avoid re-typing the password.

# in Makefile
# prompt for the password once (silently), then export it to child processes
USER_PASS := $(shell read -s -p "[make] user's password: " P; echo $$P )
export USER_PASS

all: set_password
# do something here

set_password:
	@test "$(USER_PASS)"   # fail early if no password was entered

# in Capfile
# use the password exported by make when present; otherwise prompt as usual
set :password, lambda { ENV['USER_PASS'] ||
  CLI.password_prompt("[cap] #{user}'s password: ") }
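
With this in place, make prompts for the password once (silently, thanks to read -s) and exports it into the environment; every cap process started by make then picks it up from ENV['USER_PASS'] instead of prompting, while running cap by itself still falls back to the usual prompt.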

Categories: ruby |

Security Groups - Most Underappreciated Feature of Amazon EC2

Having been developing software to run on Amazon EC2 for over a year now, I find security groups to be among its least understood and least appreciated features.

Basic Usage

In short, an EC2 security group (SG) is a set of ACCEPT firewall rules for incoming packets, applicable to TCP, UDP or ICMP. When an instance is launched with a given SG, the rules from this group are activated for this instance in EC2’s internal distributed firewall (it’s not the same as iptables on your instance!).
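
For example, creating a group and adding a rule with the EC2 API tools looks roughly like this (a sketch; the group name, description and port are placeholders):

# create a group, then open tcp/80 to the entire Internet
ec2-add-group webapp -d "front-end web servers"
ec2-authorize webapp -P tcp -p 80 -s 0.0.0.0/0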

A common misconception is that SG rules can apply only to traffic from the Internet into EC2 - this is incorrect: SGs apply to all traffic coming to your instance.

An SG can be thought of as a security profile or a security role - it promotes the good practice of managing a firewall by role, not by machine. For example, you could say that servers with the “webapp” role must be able to connect to servers with the “mysql” role on port 3306. Going further with the security profile analogy, an instance can be launched with multiple SGs - similar to a server with multiple roles. Because all rules in an SG are ACCEPT rules, it’s trivially easy to combine them (more on this in my future features wishlist below).
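
The webapp/mysql example would look something like this (a sketch; the account ID is a placeholder - -o names the source group and -u the account that owns it):

# let instances in "webapp" connect to port 3306 on instances in "mysql"
ec2-authorize mysql -P tcp -p 3306 -o webapp -u 111122223333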

Each rule in an SG (called a “permission”) must specify the source of packets to be allowed. It can be either a subnet anywhere on the Internet (in CIDR notation, with 0.0.0.0/0 being the entire Internet) or another security group, which once again promotes managing the firewall by role. Interestingly, in the latter case, the source SG does not necessarily have to belong to your AWS account - it can be anyone’s. This makes it easy to grant selective access to your instances from instances run by your friends, partners and vendors. It works only if their instances are running in the same EC2 region (US or EU), because this functionality relies on EC2 private IP addresses.

Specifying rules with other SGs as the source helps you deal with dynamic IP addressing in EC2. Without this feature, each time a new instance launched, you would have to adjust your SGs. This could become a mess if the application you are running in EC2 is very dynamic (scales up or down frequently). In general, if you are using an IP address instead of an SG name in a rule that allows communications to your instance from another EC2 instance in the same region, you are doing it wrong (you should be using the source instance’s SG, not its IP).

To allow traffic from any EC2 instance in the same region, create a rule with 10.0.0.0/8 as the source (all private IPs in EC2 so far are from this block, so the rule will not affect public IP traffic). To allow traffic from another region, you can easily find the public IPs of EC2 US and EC2 EU by launching an instance and looking up its IP in the ARIN or RIPE whois databases (note there may be multiple blocks of public IPs in use by each region).
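
For instance (a sketch; the group name and port are placeholders):

# allow tcp/5432 (e.g. PostgreSQL) from any EC2 instance in the same region
ec2-authorize mygroup -P tcp -p 5432 -s 10.0.0.0/8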

A list of the security groups with which your instance is currently running is available from inside the instance via the EC2 meta-data service (ec2-metadata -s). You can use this to do some on-boot customization based on which roles the instance has or doesn’t have. Be sure to run such on-boot scripts after networking has been set up and the eth0 interface is up.
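
A sketch of such an on-boot check (the “mysql” role name is hypothetical; the same list is available with a plain HTTP request to the meta-data service):

# fetch the list of this instance's security groups
SGS=$(curl -s http://169.254.169.254/latest/meta-data/security-groups)
case "$SGS" in
  *mysql*) echo "configuring the mysql role" ;;   # role-specific setup goes here
esac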

Advanced Usage

I know that many folks are used to running their datacenter-based servers without local firewalls, relying on the protection of the network perimeter. There is no out-of-the-box perimeter in Amazon EC2 (shameless plug - third-party solutions are available). I personally highly recommend using a local firewall in conjunction with SGs, because SGs can’t do everything (see my wishlist items #1 and #3 below). Two levels of protection, instead of one, won’t hurt and should reduce the probability of an operator error in one of the layers leading to drastic consequences.

An SG can be modified at any time using the API, and modifications take immediate effect on all instances that are running with this SG. This works great for connectivity that is required only occasionally. For example, you probably don’t need to have SSH open on your instances at all times. When you are about to SSH in, you can open tcp/22, and when you are done - close it. This trivially easy method will keep your instances more secure.
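
For example (a sketch; “default” is the group and the source address is a placeholder - you could narrow it to your own IP):

# open SSH just before connecting...
ec2-authorize default -P tcp -p 22 -s 203.0.113.10/32
# ...and close it again when done
ec2-revoke default -P tcp -p 22 -s 203.0.113.10/32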

Additionally, note that you don’t need any access to your instance to adjust SGs - all SG operations are performed against the EC2 API endpoint, not through your instance. There is absolutely no way to irreversibly lock yourself out of your instance - a hugely positive side effect for anyone who has ever cut off their own access while trying to fix a problem.

A common task is to allow certain functionality to be invoked only by instances running in a specific security group. For example, check out this thread on the EC2 forum. Short of re-writing the app in question to be EC2-aware, SGs offer an elegant solution: enable the requested functionality on a dedicated network port and allow only instances from the specific security group to connect to that port (the same pattern as the webapp/mysql sketch above). Problem solved!

My Future Features Wishlist

There are several things that I would like SGs to do that they currently don’t:

  • As of today, all outgoing and "related" packets are implicitly allowed. I hope SGs will provide some control over these in the future.
  • As of today, you can't attach or detach an SG to/from a running instance - the list of SGs is set at launch time and remains unchanged until the instance is shut down (you can add or remove rules in a group at any time, but you can't change SG membership for your instances once they are launched). I hope this can be added in the future.
  • As of today, all rules in SGs are ACCEPT rules. Being able to use REJECT or DROP rules would be nice. Yes, I realize that combining multiple SGs would become tricky (because order would matter), but I think this difficulty could be addressed similarly to the Order directive in Apache HTTPD.
  • As of today, if a packet gets dropped due to an SG, there is no way to find out about it - I hope something can be done about logging this information and making it available via some new API call, possibly something like A6.
  • Interesting things could be done if the EC2 meta-data service provided more information about other members of the SGs that the current instance has - I hope this could be added for easier discovery.

Conclusion

A firewall is an important subsystem of an Infrastructure as a Service cloud. With the bar set this high by Amazon EC2, I am looking forward to what other IaaS cloud implementations will deliver.

There are more posts about security groups here - check them out!

Categories: cloud-computing |

On Cloud Lock-In

I left this comment on today's post by Randy Bias titled VMWare vs Amazon... ROUND ONE... FIGHT!:

Functionality is more important, imho. As a hypothetical example, say there exists an EC2-like cloud where security groups span all regions (in EC2, as we all know, security groups are confined to a single region). Switching between EC2 and this new cloud and back for operations (start, stop, status) would be relatively easy with the help of abstraction libraries; but once you set up your architecture to use global security groups and rely on this fact when writing your app, it won’t be as easy to switch back and forth.

In other words, cloud lock-in via functionality is harder to overcome than cloud lock-in via API.

Categories: cloud-computing |
