When is a Cloud Not a Cloud

As “cloud computing” gains on its rivals in the race for the hottest IT buzzword du jour, I’ve started noticing that many products that only yesterday were marketed as “hosting,” “grid” or “cluster” have become “clouds” virtually overnight.

I realized something the other day. For something to call itself a computing cloud, it is necessary, but not sufficient, that it has no computing resources permanently assigned to a particular user and offers all resources to its users from a shared pool. Permanent assignment in this context can usually be recognized by the words “provisioning” or “signup.” If a customer signup process includes, or is immediately followed by, a provisioning step (even if it happens behind the scenes), it’s not a cloud. This means that your regular hosting is not a cloud - you get a physical box, or a part of a physical box, and it’s yours until it goes down or you stop using the service. Amazon EC2, on the other hand, is definitely a cloud - your instance runs somewhere, and as long as it is running, that resource is yours. Once you shut it down, the resource goes back to the pool.

CORRECTION: Amazon EC2 satisfies this condition. But based on this information alone, technically speaking, it’s impossible to say whether it is a cloud or not, because I only provided a necessary condition.

Categories: cloud-computing |

Selecting "FIELD AS NAME" in ActiveRecord Always Returns String

If I select “field as alt_field_name”, the value always comes back as a String, not as the type the column actually is.


>> User.find(:first, :select => :created_at).created_at
=> Wed Jan 24 00:04:59 UTC 2007
>> User.find(:first, :select => "created_at").created_at
=> Wed Jan 24 00:04:59 UTC 2007
>> User.find(:first, :select => "created_at as created__at").created__at
=> "2007-01-24 00:04:59"
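
The reason appears to be that ActiveRecord typecasts attributes by looking up column metadata by name, and the alias “created__at” matches no known column, so the raw string comes through untouched. A possible workaround - just a sketch, assuming a Rails of this vintage where ActiveRecord::ConnectionAdapters::Column#type_cast is available - is to cast the value yourself using the original column’s metadata:

>> raw = User.find(:first, :select => "created_at as created__at").created__at
=> "2007-01-24 00:04:59"
>> # assuming Column#type_cast exists in this Rails version
>> User.columns_hash["created_at"].type_cast(raw)
=> Wed Jan 24 00:04:59 UTC 2007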

Categories: ruby |

Raising Money for Ivan - Please Help

Most of us often get donation solicitations in the mail. When it’s for children, these solicitations often feature the story of one child who needs help. However, if you donate, the money always goes to the organization, not to that particular child, so you never know whether the child got the needed treatment, and you never learn the results.

You now have a chance to DIRECTLY help someone - a real person, not an organization. Andrii and I attended Kiev Polytechnic Institute together, worked on lab assignments together, and played volleyball and Preferans. Andrii’s son Ivan needs your help. You can conveniently donate via PayPal at http://www.mysql.com/about/help-ivan.html - a donation of any size matters!

“My family got bad news - doctors said allogenic bone marrow transplantation is the only chance for my son Ivan.

“8 months of heavy and expensive immune suppression brought some positive results, so we hoped that recovery was just a question of time.

“Ivan is a very brave boy - not every person meets as much suffering in a whole life as Ivan has already met in his 2.5 years. But a long road to full recovery is still in front of us - we are ready to come through it.

“Ukrainian clinics have no technical capability to perform such a complex operation, so we need 150-250K EUR for an Israeli, European or US clinic. The final decision will be made considering the amount we are able to find. Perhaps my family can raise ~60% of that by selling the flat where my parents live and some other goods, but we still require external help.”

– Andrii Nikitin, MySQL Engineer

Please join us in raising money for Ivan - donate today.

Categories: uncategorized |

Operations Alerts and the Tragedy of the Commons

Today I would like to continue my never-ending quest to find parallels between IT and economics and the social sciences. I will start with a preamble, but if you are already familiar with the concept of an “operations alert” in an IT context, you can skip it.

Preamble

I have spent a big part of my career in technology operations at small, medium and huge companies, so the concept of an “operations alert” is very dear to my heart. For those who are not familiar with it, an operations alert is an automated message about something in your IT environment or infrastructure that went wrong - for example, a server crashed or an application stopped responding. Some people call these things “alarms” instead of alerts.

These messages can take many forms. When a company is small, it almost always starts with alerts sent out as email messages or SMS. Later on, as the number of alerts sent and analyzed each day grows, companies usually deploy a dedicated system that centralizes, aggregates and presents the alerts in a more manageable way. It’s usually a client-server architecture, where the clients are monitoring agents deployed on all or most machines, sending information to a central server for processing. Sometimes there are no agents, and the server regularly performs checks of network services (it sends probes - this is also sometimes called active monitoring) and generates alarms off the responses, or the lack thereof. Examples of open source solutions in this area are Big Brother (and its clones and descendants), Hyperic, OpenNMS, Nagios, Zabbix, Zenoss and many others.
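
To make “active monitoring” concrete, here is a minimal sketch in Ruby - not any particular product’s implementation; the host, the port and the plain printed alert are all illustrative assumptions:

require 'socket'
require 'timeout'

# Probe a TCP service: true if it accepts a connection within the timeout.
def probe(host, port, timeout_seconds = 5)
  Timeout.timeout(timeout_seconds) { TCPSocket.new(host, port).close }
  true
rescue StandardError, Timeout::Error
  false
end

# The "alert" here is just a printed line; a real system would send email,
# SMS or a message to a central server. Host and port are made up.
puts "ALERT: web front door is not responding" unless probe("www.example.com", 80)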

When an organization gets a ton of alerts each day, it needs to prioritize them, and so the concept of “alert severity” is born. A severity is usually one of “critical”, “major”, “minor”, “warning”, “info” and “debug”. The higher the severity, the more important an alert is and the sooner it needs to be analyzed. Usually, alerts are created by the specialized engineers responsible for a particular server or application (called SMEs - subject matter experts), while the people who receive and react to them are generalists (engineers not focused on a particular technology but with very broad expertise in system and network administration).

Who Sets Severity?

I have looked at many tools and observed how several organizations implemented operations monitoring, and I noticed a pattern - alert severity is set by the SMEs (I was such an SME up until recently). An SME analyzes the pool of alerts that his systems can ever generate, rates them by how important they are, and assigns severities accordingly. Generalists monitor the dashboard and supposedly react to alarms in order of decreasing severity.

All good, right? Wrong! Enter the Tragedy of the Commons. Generalists’ attention and time are a finite resource. To get attention for the alerts his systems send, an SME tends to inflate their severity. As a result, quite soon all your alerts are marked “critical”. All SMEs combined would be better off if every one of them assigned severity fairly, but each individual SME is better off inflating the severity of the alerts his own systems send. Niiiiice!!!

Solution

I think there might be a solution to the tragedy-of-the-commons problem in IT operations monitoring after all. It’s easy to explain but difficult to implement: your alerts should not carry severity at all. In other words, when an alert message reaches the central server, it should have no severity. Once an alert is received, its severity should be computed as a function of the real-time status of the entire environment. One minute a fan failure on your secondary DNS server is top priority (and hence a “crit”), but the next minute a network interface failure on your primary DNS server becomes a much higher priority. And of course a web front door outage half an hour later easily trumps both of these problems (provided they are not related, of course).
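
As a sketch of what “severity as a function of the environment” could look like - this is only an illustration of the idea, and the Alert structure, the dependency counts and the scoring rule are all assumptions of mine, not a worked-out design:

Alert = Struct.new(:host, :message)

# Assumed to be maintained elsewhere from live topology data: how many
# services currently depend on each host.
DEPENDENTS = {
  "dns-primary"   => 40,
  "dns-secondary" => 2,
  "web-frontdoor" => 500
}

# Severity is computed at the central server, at receive time, from the
# current state of the environment - not hardcoded by an SME in advance.
def severity(alert, open_alerts)
  score = DEPENDENTS.fetch(alert.host, 1)
  # A failure matters more when its redundant peer is already down.
  peer_down = open_alerts.any? do |a|
    a.host != alert.host && a.host.split("-").first == alert.host.split("-").first
  end
  score *= 10 if peer_down
  score
end

open_alerts = [Alert.new("dns-secondary", "fan failure")]
incoming    = Alert.new("dns-primary", "network interface down")
puts severity(incoming, open_alerts)   # => 400, far above the fan failure's 2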

I have some ideas about how this can be implemented, but I’m not ready to write them up yet. For now, when you evaluate monitoring solutions and vendors, consider that red severity field in their nice screenshots and ask yourself whether it is going to help you achieve better operational efficiency, or lead you down the path of the tragedy of the commons.

Categories: devops | economics |

Let's Prove GigaOM Wrong On Enterprises + Clouds

On Tuesday, GigaOM published 10 Reasons Enterprises Aren't Ready to Trust the Cloud.

I personally think the title is somewhat misleading. It would have been more appropriately titled “10 Reasons Enterprises Aren’t Ready to Take Their Entire In-House IT Operations to the Cloud.” The difference is huge. Enterprises can totally trust the cloud to perform certain operations. Massive data-crunching tasks that need to run only occasionally are perfect for this.

There is a single reason why we don’t hear more about enterprises adopting clouds (yet!) - it’s not that easy. First of all, if we have a massive dataset to run through a computation cycle, this dataset first needs to be transferred somewhere the cluster nodes can get to it. In the case of Amazon Web Services, S3 is where one could put it. But before we can transfer the data, we need to extract it from the source (a database, a data warehouse, etc.). This can be easier said than done.
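
For illustration, here is what the transfer step could look like in Ruby with the aws-s3 gem, assuming the dataset has already been extracted to a local file; the bucket name and file name are placeholders:

require 'rubygems'
require 'aws/s3'   # the aws-s3 gem

# Credentials come from the environment; bucket and file names are made up.
AWS::S3::Base.establish_connection!(
  :access_key_id     => ENV['AMAZON_ACCESS_KEY_ID'],
  :secret_access_key => ENV['AMAZON_SECRET_ACCESS_KEY']
)

AWS::S3::Bucket.create('my-enterprise-dataset')
AWS::S3::S3Object.store('extract.csv', open('extract.csv'), 'my-enterprise-dataset')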

Once the dataset is ready, a number of cluster nodes need to be started - you need an AMI and a private communication mechanism for your instances. You will also need discovery tools, because EC2 assigns dynamic IP addresses, and without discovery your cluster nodes will not be aware of each other. And these are only the high-level steps…
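
One common discovery trick, sketched below, is to have each node ask the EC2 instance metadata service for its dynamically assigned addresses and register them somewhere the rest of the cluster can see; the register_with_cluster call is a hypothetical placeholder for whatever shared store you use:

require 'open-uri'

METADATA = "http://169.254.169.254/latest/meta-data"

# Each EC2 instance can read its own addresses from the metadata service.
local_ip  = open("#{METADATA}/local-ipv4").read
public_ip = open("#{METADATA}/public-ipv4").read

# Hypothetical placeholder: publish the addresses to a shared store
# (S3, SimpleDB, a queue...) so peers can discover this node.
register_with_cluster(local_ip, public_ip)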

So, if you are an enterprise and you would like to show GigaOM that you do trust the cloud, are you on your own to make it all happen? I happen to know the answer. I work for a company called CohesiveFT, and our Elastic Server platform can help you in several important ways. Firstly, remember that first step of extracting the dataset from your internal system? How would you feel if I told you that you can skip it - instead, you can set up a private virtual network between your Amazon EC2 instances and your corporate datacenter so that cluster nodes can access the data source directly? If you are interested, check out our VcubeV multisourcing technique. It will also help you sort out the problem of dynamically assigned IP addresses (hint: VcubeV virtual IP addresses can be static).

Secondly, you can use the Upload Your Package feature to easily embed your home-grown software in every cluster node. You will save quite a lot of time by assembling your cluster node in a nice web GUI instead of building and bundling an AMI manually. Patch management will also be easier - compare a couple of clicks to upload a new version of your software and rebuild the server with repeating the entire bundling process from the very beginning.

And finally, Elastic Server On Demand can launch your servers in EC2 as easily as it can build a VMware image of exactly the same stack for you to test locally.

If you are an enterprise looking for help to get started in the cloud, you now know where to find us.

Categories: cloud-computing |
