Parallelize Your EC2 API Calls with Python, Boto and Threading

I started a small new project on GitHub - http://github.com/somic/ec2-multiregion. It collects several small tools that facilitate EC2 API operations involving multiple regions at the same time.

Querying each endpoint one after another, I quickly discovered, would take too long. So I wrote a small helper class called BotoWorkerPool (in lib/boto_worker_pool.py) that wraps Python's standard threading module around calls to boto. This buys a useful amount of parallelism without the complexity of dealing with and sharing data among multiple processes, and it leaves the door open to a later move to the processing or multiprocessing libraries, which offer a threading-like interface on top of a multi-process model.
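The actual BotoWorkerPool lives in the repository; the fragment below is only a minimal sketch of the underlying idea, assuming boto 2's boto.ec2.regions() and one connection per thread (the helper name for_each_region is mine, not the project's):

```python
# A minimal sketch of the idea behind BotoWorkerPool (not the actual code
# from lib/boto_worker_pool.py): run one boto call per region in its own
# thread and collect the results. Assumes boto 2 and configured credentials.
import threading

import boto.ec2


def for_each_region(func):
    """Call func(connection) once per EC2 region, in parallel threads.

    Returns a dict mapping region name to func's return value.
    """
    results = {}
    lock = threading.Lock()

    def worker(region):
        conn = region.connect()  # one EC2Connection per thread
        value = func(conn)
        with lock:
            results[region.name] = value

    threads = [threading.Thread(target=worker, args=(r,))
               for r in boto.ec2.regions()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Example: count EBS volumes in every region at once.
if __name__ == '__main__':
    print(for_each_region(lambda conn: len(conn.get_all_volumes())))
```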

There are two tools at the moment.

onesnapshot.py creates new snapshots for all volumes that already have a snapshot marked with the "__onesnapshot__" token. The rationale for this tool came in part from the following statement about durability on the main AWS page for EBS:

The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot.
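To make the workflow concrete, here is a rough single-region sketch of the onesnapshot.py idea, assuming the "__onesnapshot__" token is stored in the snapshot description (the tool in the repo may mark snapshots differently, and it fans out across regions via the worker pool):

```python
# Rough single-region sketch of the onesnapshot.py idea (assumption: the
# "__onesnapshot__" token lives in the snapshot description; the real tool
# may mark snapshots differently). Assumes boto 2 and configured credentials.
import boto.ec2

TOKEN = '__onesnapshot__'


def snapshot_marked_volumes(conn):
    """Snapshot every volume that already has a snapshot carrying TOKEN."""
    volume_ids = set()
    for snap in conn.get_all_snapshots(owner='self'):
        if TOKEN in (snap.description or ''):
            volume_ids.add(snap.volume_id)
    for volume_id in volume_ids:
        conn.create_snapshot(volume_id, description=TOKEN)


if __name__ == '__main__':
    for region in boto.ec2.regions():
        snapshot_marked_volumes(region.connect())
```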

imageequiv.py takes an AMI ID, kernel ID, or ramdisk ID and finds the equivalent IDs in all regions by matching on image name or manifest file location. This tool is a response to the following tweet of mine:

wanted - equivalence lists for kernel and ramdisk images (aki-, ari-) across all ec2 regions
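Here is a rough sketch of the matching idea, assuming boto 2's get_all_images and its name/location attributes; the helper name and the example IDs below are placeholders, not the project's actual code:

```python
# Rough sketch of the imageequiv.py idea: look the image up in its home
# region, then scan every region for an image with the same name (or, if
# the image has no name, the same manifest location).
import boto.ec2


def find_equivalents(image_id, home_region_name):
    regions = dict((r.name, r) for r in boto.ec2.regions())
    home_conn = regions[home_region_name].connect()
    original = home_conn.get_all_images(image_ids=[image_id])[0]

    # Match on name when present, otherwise on the manifest location.
    if original.name:
        filters = {'name': original.name}
    else:
        filters = {'manifest-location': original.location}

    matches = {}
    for region_name, region in regions.items():
        for image in region.connect().get_all_images(filters=filters):
            matches[region_name] = image.id
    return matches


if __name__ == '__main__':
    # ami-12345678 / us-east-1 are placeholder values
    print(find_equivalents('ami-12345678', 'us-east-1'))
```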

Hope these are useful to someone.
