Dealing with Noisy Neighbors in the Cloud

This is part 2 of my series dedicated to pricing in the cloud.

As I have mentioned before, pricing is one of the most important aspects of cloud computing offerings. Until now, however, I have only discussed pricing from the perspective of selling services. This post is going to be different - today I hope to show you how pricing could be used to solve a purely technical problem in IaaS.

Public IaaS clouds are usually multi-tenant - your virtual machines (VMs) run alongside other customers’ VMs, potentially on the same hardware and sharing infrastructure such as network and storage. The cloud provider makes no guarantees regarding the placement of your instances, and since the underlying resources and access to them are shared among all VMs, it’s not unreasonable to assume that from time to time your VMs may end up with “noisy neighbors.” In this context, a “noisy neighbor” is a VM that requests or uses a disproportionately large share of some shared resource.
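To make the definition concrete, here is a minimal sketch of how a provider might flag noisy neighbors from per-VM usage data. Everything in it is an assumption for illustration: the hypothetical `Usage` record, the idea of measuring each VM's fraction of one shared resource on a host, the equal-split "fair share", and the factor-of-2 threshold are not how any particular cloud actually does it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Usage:
    vm_id: str
    host_id: str
    # Hypothetical metric: fraction (0..1) of one shared resource on the host,
    # e.g. disk I/O bandwidth, consumed by this VM over some time window.
    share: float


def find_noisy_neighbors(usages: List[Usage], factor: float = 2.0) -> List[str]:
    """Flag VMs using more than `factor` times an equal split of their host's resource.

    The equal split and the default factor are illustrative assumptions.
    """
    by_host: Dict[str, List[Usage]] = {}
    for u in usages:
        by_host.setdefault(u.host_id, []).append(u)

    noisy: List[str] = []
    for vms in by_host.values():
        fair_share = 1.0 / len(vms)  # every co-located VM gets an equal slice
        for u in vms:
            if u.share > factor * fair_share:
                noisy.append(u.vm_id)
    return noisy
```

With four VMs on one host, the fair share is 0.25, so only a VM consuming more than half of the resource is flagged. A VM alone on its host is never flagged, since it has no neighbors to disturb.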

While it is not impossible to address this problem in the hypervisor or in the architecture (for example, some of the noise can be reduced by allocating a dedicated network card to each VM), there could be another way.

In economics, noisy neighbors are a typical example of a negative externality. There are many interesting things about externalities, but one of the best known is the Coase theorem. It turns out that under certain circumstances, negative externalities can be bargained away - the recipient of a negative externality could pay the party causing it to stop, or the party causing it could pay the recipient to stop complaining.

There is a catch, however. Coasian bargaining can be hampered by transaction costs - when it’s difficult or impractical to get all parties around a table, the theorem doesn’t apply. In our case, transaction costs are effectively infinite - no single user of an IaaS cloud knows who her neighbors are. This would be an insurmountable challenge, unless…

Unless the cloud provider itself steps up and facilitates the bargaining. The cloud provider knows exactly whose instances are neighbors and exactly who is noisy (i.e., whose instances are consuming a lot of shared resources). If the cloud provider agrees to act as a proxy in these negotiations, it could totally work. Here is one way of doing it.

When a customer launches new instances, she could specify the amount of money for which she would be willing to have her instances terminated or moved to another location. Let’s say customer A chose $1 - that’s how much she values the inconvenience of being forced to move. Customer B’s instances are neighbors of customer A’s instances, and B would like her noisy neighbor to be quieter (the cloud provider may have to offer some sort of aggregated view into the current load on shared resources, so that B can confirm that her problem really is a noisy neighbor and not a bug in her own code).

If B is only willing to pay $0.50 to get A moved (“silenced”), nothing changes. But if B is willing to part with $2 to get A to be quiet, we have a deal. A’s instance moves to another location and she gets $1 for her trouble, B’s instance gets to enjoy more resources for $2, and the cloud provider could even take a cut of the transaction (in the simplest model, the provider could pocket the $1 difference). Everybody wins! There are several technicalities that would need to be taken care of, but the main point remains - a purely technical problem could be resolved with a pricing mechanism (and a bit of technology).
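The A/B example above can be sketched as a tiny provider-side broker. This is a minimal illustration under stated assumptions, not a real cloud API: the `Broker` and `Deal` names and their methods are hypothetical, amounts are kept in integer cents to avoid floating-point rounding, the provider keeps the whole spread (the "simplest model" from the text), and a bid exactly equal to the owner's price is treated as no deal, which is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Deal:
    noisy_owner: str        # customer whose instance gets moved (A)
    bidder: str             # customer who paid to have it moved (B)
    paid_to_noisy: int      # A's stated move price, in cents
    charged_to_bidder: int  # B's bid, in cents
    provider_cut: int       # spread kept by the provider (simplest model)


class Broker:
    """Hypothetical broker run by the cloud provider; not any real cloud's API."""

    def __init__(self) -> None:
        # instance id -> (owner, price in cents for which the owner agrees to be moved)
        self.move_prices: Dict[str, Tuple[str, int]] = {}

    def launch(self, instance_id: str, owner: str, move_price_cents: int) -> None:
        # The owner states her move price when she launches the instance.
        self.move_prices[instance_id] = (owner, move_price_cents)

    def bid_to_silence(self, instance_id: str, bidder: str, bid_cents: int) -> Optional[Deal]:
        owner, ask = self.move_prices[instance_id]
        if bid_cents <= ask:
            return None  # bid does not exceed the owner's price: nothing changes
        # The actual migration of the instance is out of scope here; the owner's
        # stated price stays registered for the instance at its new location.
        return Deal(owner, bidder, ask, bid_cents, bid_cents - ask)


if __name__ == "__main__":
    broker = Broker()
    broker.launch("i-a1", owner="A", move_price_cents=100)  # A asks $1
    print(broker.bid_to_silence("i-a1", "B", 50))   # B offers $0.50: no deal
    print(broker.bid_to_silence("i-a1", "B", 200))  # B offers $2: A gets $1, provider keeps $1
```

Because the provider is the only party that sees both the stated prices and the placement, it can run this matching without either customer learning who the other is, which is exactly the transaction-cost problem the Coasian argument runs into.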

So who is with me? Do you think we will see anything like this before the end of the year? By the end of next year? If not, why?

Categories: cloud-computing | economics

Comments (2)

William Louth // 26 Oct 2010

I think the problem is that you are still thinking in terms of VMs.

We need adaptive software runtimes that are cost aware (whatever you would like cost to be) and that predict, provision, protect, diagnose problems in, and govern resource consumption – in real time. This needs to be done at a much finer granularity (call, task, activity, request,…) so that we can determine what is inbound and map it to (adaptive/baselined) metering profiles. The whole software execution stack needs to expose metering (activities + meters).

Dmitriy // 26 Oct 2010

Fair enough, but I specifically qualified my post as being about IaaS (see first paragraph), where, as of today, the unit of capacity is a VM.