Basics of IaaS Spot Pricing

This is part 4 of my pricing in the cloud series.

Exactly one year ago - on Monday, December 14, 2009 - Amazon Web Services launched spot pricing as a new feature of Amazon EC2.

In a nutshell, spot pricing is a dynamic pricing scheme for EC2 instances. At each moment in time, the provider sets a price for each instance type. Users specify the maximum amount they are willing to pay for a spot instance (their bid). If a user's bid equals or exceeds the current spot price, the user gets to run the instance and pays the spot price for it (important - users always pay the spot price, not their bid price!). Otherwise, the user's instance is terminated, or the user doesn't get an instance at all until the spot price drops to or below the level of their bid.

Fundamentally, from the cloud provider's perspective, spot instances are a mechanism that allows the provider to sell its unused capacity at a discount, while retaining the right to reclaim it quickly if necessary. Let's continue exploring spot pricing from a cloud provider's point of view. It's trickier than it may look at first sight.

Suppose at some point in time there are N instance slots available for spot instances of some type (for example, Linux/UNIX m1.large). At the same time, there are M single-instance bids - [ B1, B2, …, BM ]; assume this array is sorted in ascending order (Bi ≤ Bi+1).

Let’s say N ≥ M. In this case, the information above is not sufficient for the provider to set the spot price - it can vary quite a bit depending on what is being optimized. If the provider wants to maximize the number of running spot instances, it will set the spot price at or below B1. On the other hand, if it wants to maximize revenue, it will set the spot price quite differently - depending on the specific values of Bi. Note that which optimization is used can change at any moment.

For example, with N=10, if bids are [ 1, 2, 5, 50 ], the revenue-maximizing spot price is 50; if bids are [ 1, 2, 5, 45, 50 ], it’s 45 (two customers paying 45 yield 90, versus a single customer paying 50).

(https://gist.github.com/739949)
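The arithmetic above can be sketched in a few lines of Python (my own sketch, not necessarily what's in the linked gist): given N available slots and a list of bids, find the spot price that maximizes revenue. A candidate price is feasible only if the number of bids at or above it fits within N slots - otherwise tied bids could not all be admitted.

```python
def revenue_maximizing_price(bids, n):
    """Find the spot price (chosen among bid values) that maximizes
    provider revenue, given n available instance slots.

    At price p, every bid >= p runs and pays p. A price is feasible
    only if the count of such bids fits within n slots.
    Returns (price, revenue); (None, 0) if no feasible price exists.
    """
    best_price, best_revenue = None, 0
    for p in sorted(set(bids)):
        running = sum(1 for b in bids if b >= p)
        if running > n:
            continue  # too many tied/higher bids for the available slots
        if p * running > best_revenue:
            best_price, best_revenue = p, p * running
    return best_price, best_revenue

# The examples from the text:
print(revenue_maximizing_price([1, 2, 5, 50], 10))      # (50, 50)
print(revenue_maximizing_price([1, 2, 5, 45, 50], 10))  # (45, 90)
print(revenue_maximizing_price([40, 40, 40], 2))        # (None, 0)
```

Note the feasibility check: it is what forces the price above a block of tied low bids, exactly as in the N < M cases discussed below.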

This thought exercise also shows that bids placed at the lower end of the spectrum have very little chance of actually getting an instance. In the above example, for a bid of 2 to run, the spot price must not exceed 2. But because all customers pay the spot price and not their bid, a provider stands to lose quite a bit of revenue if it were to do so - at a spot price of 45, revenue is 90; at a spot price of 2, revenue is only 8 (even though more customers would get their instances).

Now let’s say N < M. In this case, no matter how you slice it, bid B1 can’t be allowed to run - hence the spot price is going to be greater than B1. And this highlights another weakness of spot pricing from the provider’s point of view. What if B1 = B2 = B3 = … = Bk? None of them can be allowed to run, and the spot price must be set at a higher level.

For example, with N=3, if bids are [ 10, 20, 30, 40 ], the revenue-maximizing spot price is 20 or 30 - both yield revenue of 60. But if bids are [ 10, 10, 10, 40 ], [ 20, 20, 20, 40 ], [ 30, 30, 30, 40 ] or even [ 39, 39, 39, 40 ], the revenue-maximizing spot price is 40, with only 40 of revenue.

Hence a conclusion - a provider benefits from diversity of bids, especially at the low end, and should potentially encourage it.

Consider the following extreme case as an example. Let’s say bids are [ 40, 40, 40 ] and N = 2. The spot price can’t be set at or below 40 in this case (three equal bids can’t fit into two slots), so there will be no spot instances running at all, despite the available capacity.

How could a provider influence bids submitted by customers, to achieve the desired diversity? One way is through price anchoring. One obvious anchor is the regular price for a specific product (for example, $0.085 per hour for m1.small Linux/UNIX in us-east-1). Another anchor is the current spot price. I don’t know whether this technique is used and, if so, how effective it is in delivering diverse bids, but I am sure a provider can measure the degree of correlation between bids and the current spot price (I wish I had access to this dataset…).

Spot pricing is all about random variables. The provider starts out knowing only N and sets some initial spot price. The spot price itself is a random variable to customers - no one knows whether the spot price is going to go up or down next, exactly when it will happen, or by how much. In response to the spot price, customers submit their bids - which cumulatively are a random variable to the provider, because it can’t know in advance what they will be (this is most likely an overstatement - with a year’s worth of usage data, I am confident Amazon has already built fairly accurate models for incoming bids). Once the bids are in, the spot price is recalculated and the cycle continues.

A single spot price is set for an entire region, and each region has two or more availability zones. If a submitted bid exceeds the spot price but requests a specific zone that has no spot capacity, the provider faces a dilemma. It could raise the spot price for the entire region above this bid. It could set individual spot prices for each zone instead of each region. Or it could make an exception and accept the bid without giving it an instance until capacity becomes available.

Amazon seems to have chosen the last approach. Per-zone pricing would have been cleaner, but it would also allow users to match availability zones between AWS accounts (zone names map to different physical zones in each account), which Amazon presumably wants to avoid.

This pretty much covers the basics. Because everyone pays the same price, spot pricing could be one of the easiest-to-implement dynamic pricing schemes. Which obviously doesn’t mean that it’s easy to implement - to the best of my knowledge, even after a full year, Amazon remains the only public IaaS cloud to offer non-static pricing. Their continued hiring efforts lead me to believe that they are not done - I am looking forward to more pricing goodness from Amazon.

Read other posts on my blog tagged amazon-ec2-spot or cloud-pricing.

Categories: cloud-computing | economics |

Unexpected Similarities in EC2 Spot Price History Between Regions

If you have been following my blog for a while, you may recall a post from earlier this year where I looked at some basic statistical properties of time series behind Amazon EC2 spot price history.

Because EC2 regions are completely separate and independent, you’d think the spot price history for each region would have nothing in common. Surprise! Read on and you may change your mind.

Before I begin, however, please note that prices throughout this post are once again provided in points ($1 per hour = 1,000 points).

Instead of focusing on the time series aspect of spot price history, this time I decided to focus solely on the prices as numbers. My hypothesis was that the set of spot prices for each product (represented by a region/description/platform tuple) could be divided into 2 subsets: a base subset and an outlier subset. The base subset is where the price fluctuates most of the time, and the outlier subset is where the price jumps in response to an extraordinary event (for example, a big user of AWS infrastructure suddenly needing a lot of capacity).

I obtained the data via the API on November 29, 2010 at 4:26pm UTC - you can download the resulting dataset as JSON from https://gist.github.com/724757. Arrays with 2 items (x, y) represent ranges; arrays with 1 item (x) represent a single price.
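To illustrate the format, here's a small helper (my own, not part of the dataset's tooling) that expands those arrays into a flat sorted list of integer prices in points:

```python
def expand_prices(entries):
    """Expand dataset entries into a sorted list of prices (in points).

    [x, y] represents the inclusive range x..y; [x] is a single price.
    Assumes all prices are integers, as in the published dataset.
    """
    prices = set()
    for entry in entries:
        if len(entry) == 2:
            lo, hi = entry
            prices.update(range(lo, hi + 1))
        else:
            prices.add(entry[0])
    return sorted(prices)

print(expand_prices([[76, 84], [1000]]))  # 76..84 plus the outlier 1000
```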

I started looking and noticed something out of the ordinary in a place where I didn’t expect to see it (see the table below).

It turns out the base subsets are identical or nearly identical for all instance_type/product_description pairs across all EC2 regions except us-east. Additionally, in these regions, the pricing algorithm seems to be intentionally hitting as many prices within the base subset as possible (in many cases, the base subset is contiguous, without any holes - covering every single possible price between its min and max).

| Instance | Region | Base Subset | Outliers Subset |
|----------|--------|-------------|-----------------|
| c1.medium Linux/UNIX | ap-southeast-1 | 76-84 | |
| | eu-west-1 | 76-84 | |
| | us-west-1 | 76-84 | |
| c1.medium SUSE/Linux | ap-southeast-1 | 88-98 | |
| | eu-west-1 | 88-98 | |
| | us-west-1 | 88-98 | |
| c1.medium Windows | ap-southeast-1 | 159-175 | |
| | eu-west-1 | 159-175 | |
| | us-west-1 | 159-175 | |
| c1.xlarge Linux/UNIX | ap-southeast-1 | 304-336 | 1000 |
| | eu-west-1 | 304-336 | |
| | us-west-1 | 304-336 | |
| c1.xlarge SUSE/Linux | ap-southeast-1 | 317-350 | |
| | eu-west-1 | 316-350 | |
| | us-west-1 | 316-350 | |
| c1.xlarge Windows | ap-southeast-1 | 634-700 | 800 |
| | eu-west-1 | 634-700 | |
| | us-west-1 | 634-700 | 750 800 |
| m1.large Linux/UNIX | ap-southeast-1 | 152-168 | |
| | eu-west-1 | 152-168 | |
| | us-west-1 | 152-168 | |
| m1.large SUSE/Linux | ap-southeast-1 | 164-182 | |
| | eu-west-1 | 164-182 | |
| | us-west-1 | 164-182 | |
| m1.large Windows | ap-southeast-1 | 254-280 | 480 |
| | eu-west-1 | 254-280 | |
| | us-west-1 | 254-280 | |
| m1.small Linux/UNIX | ap-southeast-1 | 38-42 | 95 100 |
| | eu-west-1 | 38-42 | |
| | us-west-1 | 38-42 | |
| m1.small SUSE/Linux | ap-southeast-1 | 50-56 | |
| | eu-west-1 | 50-56 | |
| | us-west-1 | 50-56 | |
| m1.small Windows | ap-southeast-1 | 64-70 | |
| | eu-west-1 | 64-70 | |
| | us-west-1 | 64-70 | |
| m1.xlarge Linux/UNIX | ap-southeast-1 | 304-336 | |
| | eu-west-1 | 304-336 | |
| | us-west-1 | 304-336 | |
| m1.xlarge SUSE/Linux | ap-southeast-1 | 316-350 | |
| | eu-west-1 | 316-350 | |
| | us-west-1 | 316-350 | |
| m1.xlarge Windows | ap-southeast-1 | 506-560 | |
| | eu-west-1 | 506-560 | |
| | us-west-1 | 506-560 | |
| m2.2xlarge Linux/UNIX | ap-southeast-1 | 532-588 | |
| | eu-west-1 | 532-588 | |
| | us-west-1 | 532-588 | |
| m2.2xlarge SUSE/Linux | ap-southeast-1 | 468-517 | |
| | eu-west-1 | 468-517 | |
| | us-west-1 | 468-517 | |
| m2.2xlarge Windows | ap-southeast-1 | 696-770 | |
| | eu-west-1 | 696-770 | |
| | us-west-1 | 697-769 | |
| m2.4xlarge Linux/UNIX | ap-southeast-1 | 1064-1112 1114 1116 1118-1176 | |
| | eu-west-1 | 1064-1176 | |
| | us-west-1 | 1065-1102 1104-1127 1129-1153 1155-1176 | |
| m2.4xlarge SUSE/Linux | ap-southeast-1 | 924-946 948-965 967-977 980-987 989-1000 1002 1004 1006 1008 1010 1012 1014 1016 1018 1020 | |
| | eu-west-1 | 924-1000 1002 1004 1006 1008 1010 1012 1014 1016 1018 1020 1022 | |
| | us-west-1 | 925-982 985-1000 1002 1004 1006 1008 1010 1012 1014 1016 1018 1020 | |
| m2.4xlarge Windows | ap-southeast-1 | 1394-1430 1432-1435 1437-1442 1444-1446 1449 1451 1456-1464 1466 1468-1471 1473-1487 1489-1496 1498-1540 | |
| | eu-west-1 | 1394-1540 | |
| | us-west-1 | 1394-1444 1446-1447 1449-1491 1493-1495 1497-1535 1537-1540 | |
| m2.xlarge Linux/UNIX | ap-southeast-1 | 228-252 | |
| | eu-west-1 | 228-252 | |
| | us-west-1 | 228-252 | |
| m2.xlarge SUSE/Linux | ap-southeast-1 | 240-266 | |
| | eu-west-1 | 240-266 | |
| | us-west-1 | 240-266 | |
| m2.xlarge Windows | ap-southeast-1 | 304-336 | |
| | eu-west-1 | 304-336 | |
| | us-west-1 | 304-336 | |
| t1.micro Linux/UNIX | ap-southeast-1 | 9-12 | 20 40 |
| | eu-west-1 | 9-11 | 15 40 |
| | us-west-1 | 10 | 20-21 40 |
| t1.micro SUSE/Linux | ap-southeast-1 | 15-17 | |
| | eu-west-1 | 15-17 | |
| | us-west-1 | 15-17 | 20 |
| t1.micro Windows | ap-southeast-1 | 15-18 | 21 25-26 35 |
| | eu-west-1 | 15-18 | 20-21 25 1000 |
| | us-west-1 | 15-17 | 25 |
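One simple way to automate the base/outlier split used informally above is a gap heuristic: walk upward from the lowest observed price and cut the base subset at the first disproportionately large jump. This is a sketch of the idea with an arbitrary, hypothetical threshold - not the exact procedure I used:

```python
def split_base_outliers(prices, gap_ratio=1.2):
    """Split observed spot prices into (base, outliers).

    Walks upward from the minimum price; the base subset ends at the
    first jump where the next price exceeds the previous one by more
    than gap_ratio (a hypothetical threshold, chosen for illustration).
    """
    ps = sorted(set(prices))
    base = [ps[0]]
    for p in ps[1:]:
        if p <= base[-1] * gap_ratio:
            base.append(p)
        else:
            break
    return base, ps[len(base):]

print(split_base_outliers([76, 77, 80, 84, 1000]))
# base [76, 77, 80, 84], outlier [1000]
```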

Read other posts on my blog tagged amazon-ec2-spot.

Categories: cloud-computing |

Connecting to Eucalyptus Walrus (S3) with Boto and socket.gaierror

If you are trying to perform any bucket operations on the Eucalyptus Walrus (S3) service from boto and are getting “socket.gaierror: Errno -2 Name or service not known”, it’s because you did not specify the proper calling_format when creating your S3Connection object.

The default calling format is SubdomainCallingFormat, which will attempt to send your requests to “{bucketname}.{s3_service_host}” - in the case of Walrus, that hostname most likely won’t resolve.

What you need for Walrus is OrdinaryCallingFormat. This is actually well documented at http://open.eucalyptus.com/wiki/ToolsEcosystem_boto. You can find corresponding code here.
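For completeness, here is roughly what the connection setup looks like with boto. The host, port, path, and credentials below are placeholders - adjust them for your Eucalyptus installation:

```python
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

# All values below are placeholders for your Eucalyptus installation.
conn = S3Connection(
    aws_access_key_id='YOUR_EUCA_ACCESS_KEY',
    aws_secret_access_key='YOUR_EUCA_SECRET_KEY',
    host='walrus.example.com',               # your Walrus front-end host
    port=8773,
    path='/services/Walrus',
    is_secure=False,
    calling_format=OrdinaryCallingFormat(),  # path-style, not subdomain-style
)
bucket = conn.get_bucket('my-bucket')
```

With OrdinaryCallingFormat, the bucket name goes into the request path rather than the hostname, so no per-bucket DNS resolution is needed.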

Categories: cloud-computing | python |

Run on the Cloud

This is part 3 of my series on pricing in the cloud.

Do you think IaaS cloud prices are due for another reduction, or will they remain at current levels for now? Regardless of which answer you or I would give, ours is just a guess - we have no data to support either position (unless you work at an IaaS cloud provider, of course). What we do know for a fact, though, is that when a cloud provider contemplates a price reduction, it should definitely consider the possibility of triggering a “run” on its own cloud.

A run on the cloud can be looked at as somewhat similar to a bank run. The topic of bank runs in the context of IaaS cloud computing was first (to my knowledge) brought up by Simon Wardley in his recent blog post. I recommend that you read his post first, and then continue here.

Fundamentally, a bank run is about customers losing faith in the institution’s ability to deliver on its promises in the future. For an actual bank, this promise is that your money is safe and that you can withdraw it at any time, whenever you need it. When some customers get nervous and withdraw their deposits, others see their actions and might get nervous too. Additionally, because a bank keeps only a fraction of its deposits in reserve, the longer one waits, the less likely one is to get one’s money on first request. These two phenomena lead to a domino effect that gains momentum very rapidly and is usually quite difficult to stop.

For an IaaS cloud, in this context, the promise is that if and when you need more capacity for your application (usually in the form of more cloud instances), you will be able to provision it.

What happens when the price of cloud compute capacity is reduced? Because the demand curve usually slopes downward, the lower the price, the more units (cloud instances) will be demanded and consumed. As a result, assuming overall supply remains constant, availability of instances in the future will inevitably decrease.

Here is a hypothetical situation. Try to imagine what you’d do if you were no longer confident that you’d be able to get more instances when your application ends up on the front page of Techmeme. (Note that it’s not important what triggers the original loss of faith - maybe you hear something at a user group meeting, maybe on Twitter, maybe on forums. A run can start from a rumor that doesn’t even have to be true; it just has to be believed by a sufficient number of people.) Some might leave the cloud, which is not a good outcome for the provider, but will not lead to a run.

Much more importantly, others might start hoarding instances - launching more instances than actually needed, just in case a need comes up in the future. The more hoarding by some, the more likely others will run into the same problem and resort to hoarding themselves, and so on. The more hoarding is happening overall, the less “on demand” the cloud provider becomes.

Am I saying that a cloud run is imminent? Of course not - in fact, I am confident it’s not going to happen, because providers have usage data and can predict usage patterns with a high degree of confidence. But this once again emphasizes the importance of pricing in IaaS cloud computing. It’s also an illustration that a provider may not be in a position to set prices solely based on its costs or competitive pressure - there is a third powerful force that will not let prices drop too low.

Categories: cloud-computing | economics |

Blog Redesign

The new redesigned somic.org just went live - if you are reading this in an RSS reader and would like to see what the site looks like now, you can visit it at http://somic.org.

One change that you may notice immediately is the new theme. I dropped Cutline and switched to Carrington Text. I feel it helps me achieve better focus on content as opposed to surrounding presentation, and it’s better aligned with my goals for this site. I found the Carrington theme by searching Google for “minimalist wordpress theme” - I think you’ll agree it’s quite minimalist, which is exactly what I was after.

But much more importantly, as a part of this facelift, I replaced the underlying technology as well. I no longer run the Wordpress installation that had been powering my blog for nearly 2.5 years. Instead, I switched to Jekyll.

Jekyll is a blogging platform developed by Github for Github Pages. Its key feature is that it doesn’t use a database. All posts and pages are pregenerated from templates and can be served by any web server as a static web site.

There are many advantages to this approach, I think. The ability to edit posts in any editor (vi), to use an external revision tracking system (git), to do bulk edits if necessary (sed/awk), to find things faster (grep) - these are just some of them. Very fast page load times are another benefit. And finally, not having to worry about hack attempts is yet another.

I found Wordpress to be simply too much for my needs. The last several times I upgraded, I didn’t get any significant changes for what I was using. Wordpress is a hugely popular publishing platform for dynamic web sites. But with roughly 2 posts a month and an occasional comment here and there, it was never a good fit for me. As a result of a recent uptick in pageviews, I faced a choice of whether to add a caching layer or go the pre-generated route (which, if you think about it, is just like caching, but on the filesystem instead of in memory), and settled on the latter.

Speaking of comments: I don’t have them for now. Because I appreciate the time and effort of everyone who commented in the past, I imported existing comments into the new system. But new posts are going to have no comments for now. If I ever decide I want comments back, I will hook up Disqus or something like it. In the meantime, Twitter is an excellent way to comment on my posts and a sure way to reach me fairly quickly. Alternatively, you can email me - my address is here.

To turn Jekyll into what I wanted for my blog, I had to make several changes. Nothing drastic, only small trivial pieces. I am planning to publish them soon, if anyone is interested.

With this announcement out of the way, I will return to regular programming later this month. Stay tuned!

Categories: blogging |
