Disclaimer 1: Despite its possibly ominous name, this is NOT a network vulnerability or an attack that could lead to unauthorized access. UDP hole punching requires cooperation between two hosts, and hence can't be easily used as an attack by itself (in other words, in order to run it, you most likely must already have gained access to the hosts).
Disclaimer 2: The conclusions at the end of this post are my educated guesses and may turn out to be wrong. They are based on my observations, not on actual knowledge of how EC2 internals are designed or implemented.
I was once working on a setup in Amazon EC2 and came across an oddity which, coupled with my interest in the EC2 security group mechanism, turned into this post.
UDP hole punching, in a nutshell, is a technique that allows two cooperating hosts, potentially located behind NAT and/or firewalls, to establish a direct peer-to-peer UDP communication channel. It’s used by Skype, for example; you can read more about it in a Wikipedia article. If two hosts start sending UDP packets to each other on pre-agreed ports, each host's outgoing packets make the NAT device or firewall in front of it treat the other host's incoming packets as replies, so the bi-directional flow looks like an established communication channel.
EC2 allows a lighter form of this technique because EC2 NAT never rewrites the source port of an outgoing packet (recall that in EC2, NAT is always 1-to-1, so port rewriting isn’t necessary). We can therefore be certain that a packet sent with a given source port X will be seen by the remote instance with the same source port X.
I wrote a small Python tool (available at http://gist.github.com/224795) to test UDP hole punching and set out to discover whether it would work in EC2. My expectation was that it should. Unless explicitly noted, I used ports above 45,000, and none of the security groups explicitly allowed UDP traffic on these ports.
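For illustration, here is a minimal sketch of the idea (this is not the tool from the gist above). The function name punch and its parameters are hypothetical. It assumes both hosts run it at roughly the same time with mirrored ports, and it relies on the source port being preserved on the way out, as described above.

```python
import socket
import time


def punch(local_port, peer_host, peer_port, attempts=10, interval=0.5):
    """Return True if a UDP packet from the peer arrives on local_port."""
    peer = (socket.gethostbyname(peer_host), peer_port)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Send from the pre-agreed port so the peer's firewall sees
        # the source port it expects.
        sock.bind(("", local_port))
        sock.settimeout(interval)
        for _ in range(attempts):
            # Every outgoing packet (re)creates the "established flow"
            # state on stateful devices in front of this host.
            sock.sendto(b"punch", peer)
            try:
                _, addr = sock.recvfrom(1024)
            except socket.timeout:
                continue
            except ConnectionError:
                # Some platforms surface ICMP "port unreachable" here.
                time.sleep(interval)
                continue
            if addr == peer:
                # We heard the peer; send a few more packets so that
                # the peer gets a chance to hear us as well.
                for _ in range(3):
                    sock.sendto(b"punch", peer)
                return True
        return False
    finally:
        sock.close()
```

On one host you would call something like punch(45001, peer_ip, 45002) and on the other punch(45002, peer_ip, 45001), where peer_ip is the other host's address. A return value of True on both sides means the hole was punched.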
I was able to easily punch UDP holes between any two instances using each instance's public IP address, in line with my expectation. But I hit a major snag when using the private IP addresses of two instances in the same region (I used EC2-US): I couldn't get it to work no matter what I tried, whether same availability zone, different availability zones, same security groups, different security groups, same AWS account, or different AWS accounts. I even tried punching a hole over port 53 (all EC2 instances support DNS name resolution, which happens over this port without an explicit corresponding rule in security groups), with no luck (EC2 DNS servers are not located on 10.0.0.0/8, where all instances reside).
The only way I could get it to work using private IPs was to allow my UDP port in the security groups of at least one of the instances. When I did this, both hosts reported success.
These observations lead to several thoughts that might help uncover some aspects of the EC2 firewall’s internal design (these are all more or less educated guesses):
- You can punch a UDP hole between any two instances using their public IPs, even if your security groups do not allow such communication.
- Private IP traffic is treated totally differently than traffic over public IPs.
- You can punch a UDP hole on port X using the private IP addresses of two instances in the same region only if at least one of the instances allows port X in its security groups (this can be used as a test of whether a port is allowed if you don't have access to query the EC2 API endpoint).
- The EC2 firewall somehow implements more logic than "all outgoing packets are allowed" when dealing with traffic over private IPs (if that were not the case, hole punching should have worked; see below).
- If we assume that security group rules are applied at an instance's dom0 (which makes at least some sense, and which this research implies), I now suspect that all dom0 hosts have a complete view of all security groups in the region and receive real-time updates when a rule is added or deleted (modification of rules is currently not supported). This was in fact contrary to my expectation: initially I thought each dom0 "subscribes" to updates only for those security groups that correspond to instances running on that dom0, and I thought this was the reason why dynamic group membership changes were not possible (say, moving an instance from the "db" security group to the "webapp" security group).
To clarify: under the above assumption, in order for hole punching to NOT work, an outgoing packet from instance A must not reach the dom0 of instance B. The only way this is possible under an “all outgoing packets are allowed” policy is if the dom0 of instance A knows that the dom0 of instance B will block this packet and somehow takes this into consideration, which in the general case can only happen if all dom0 hosts have a complete view of all security groups and permissions in the region.
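To make this argument concrete, here is a toy simulation of the two designs. It is purely a thought experiment, not a description of how EC2 actually works. The names simulate, "local_only" and "global_view" are my own labels, and I assume that each dom0 keeps ordinary connection-tracking state for flows it has let through.

```python
def simulate(policy, a_allows, b_allows, rounds=3):
    """Return (a_heard, b_heard) after A and B alternate sending on port X.

    policy == "local_only":  each dom0 knows only its own instance's rules
                             and lets every outgoing packet through.
    policy == "global_view": the sender's dom0 also knows the receiver's
                             rules and drops packets the receiver would
                             reject (hypothetical design).
    """
    allows = {"A": a_allows, "B": b_allows}  # inbound rule for port X
    state = {"A": False, "B": False}         # does this dom0 track the flow?
    heard = {"A": False, "B": False}

    def send(src, dst):
        # Egress check at the sender's dom0.
        if policy == "global_view" and not (state[src] or allows[dst]):
            return  # dropped before leaving; no state is created
        state[src] = True
        # Ingress check at the receiver's dom0.
        if state[dst] or allows[dst]:
            state[dst] = True
            heard[dst] = True

    for _ in range(rounds):
        send("A", "B")
        send("B", "A")
    return heard["A"], heard["B"]


if __name__ == "__main__":
    for policy in ("local_only", "global_view"):
        for a_allows, b_allows in ((False, False), (True, False)):
            result = simulate(policy, a_allows, b_allows)
            print(policy, "A allows:", a_allows, "B allows:", b_allows, "->", result)
```

In this model, "local_only" lets the hole be punched even when neither instance allows the port, which is not what I saw over private IPs. "global_view" fails when neither side allows the port and succeeds for both hosts as soon as one side does, which matches the observations above.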
I would love to hear your thoughts on what could possibly explain this behavior; please let me know in the comments below.
If you liked this post, you may also be interested in Probing Ports in Remote Security Groups in EC2.