Eliminating Single Points of Failure

I recently reached an interesting conclusion. When you are trying to eliminate a single point of failure from your architecture, it’s almost always beneficial to start with a 2-way redundant solution (an active-passive or active-active pair, whichever is easier to implement) and move to N-way (N > 2) only if necessary.

One huge difference between a pair and N-way (N > 2) is how difficult it is to detect partitioning (of CAP Theorem fame - you can simultaneously achieve only two of the following three properties: data Consistency, high Availability and Partition tolerance). Assuming symmetrical communications (A can talk to B if and only if B can talk to A), partitioning detection in a pair is trivial, because there is only one failure case - system A can’t talk to system B. With N > 2, however, there are many more scenarios to deal with: A can’t talk to B while both A and B can talk to C, A can’t talk to either B or C, and so on. Additionally, communications may be restored in an arbitrary order - A may first be able to talk to B again, and only some time later regain its visibility to C.
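To get a feel for how quickly the number of scenarios grows, here is a minimal sketch that enumerates them. It assumes symmetrical links, as above, and treats every distinct set of broken links as one scenario. The function names are made up for illustration, and counting scenarios this way is only a rough proxy for how hard detection is.

```python
from itertools import combinations, product


def links(nodes):
    """Every unordered pair of nodes is one symmetrical link."""
    return list(combinations(nodes, 2))


def failure_scenarios(nodes):
    """Yield each non-empty set of broken links.

    Each link is either up or down, so there are 2 ** len(links)
    combinations; the one where every link is up is not a failure.
    """
    all_links = links(nodes)
    for states in product([True, False], repeat=len(all_links)):
        down = [link for link, up in zip(all_links, states) if not up]
        if down:
            yield down


for n in (2, 3, 4, 5):
    nodes = [chr(ord("A") + i) for i in range(n)]
    count = sum(1 for _ in failure_scenarios(nodes))
    print(f"N={n}: {len(links(nodes))} links, {count} failure scenarios")
```

A pair has exactly one scenario to handle, three nodes already have seven, and the count keeps doubling with every extra link. This does not even account for the order in which links come back.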

Interestingly, and this also comes from personal experience, once you have managed to build 3-way redundancy, extending it to 4-way or even 5-way is comparatively easy.

There are also a couple of purely practical aspects that make 2-way redundancy an attractive option, even if it is only going to be an intermediate step before N-way is achieved. First, a pair can serve as a working prototype - you can observe it, learn and analyze its failure scenarios, and make sure your response to each is optimal. This can validate your approach before you sink a lot of time into partitioning detection for N-way.

Second, after you build the easier 2-way solution, you may well discover that you don’t need N-way redundancy at all. If a pair meets your goal (say, a given percentage of service availability), you save a lot of time and effort.
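As a rough back-of-the-envelope check of whether a pair is enough, the sketch below computes the availability of N redundant nodes. It rests on assumptions I am adding for illustration: nodes fail independently, failover is instantaneous and never fails itself, and the service is up as long as at least one node is up. Real systems have correlated failures, so treat the result as an upper bound, not a promise.

```python
def combined_availability(node_availability, n):
    """Availability of n redundant nodes, assuming independent failures.

    The service is down only when all n nodes are down at once,
    which happens with probability (1 - node_availability) ** n.
    """
    return 1 - (1 - node_availability) ** n


def nodes_needed(node_availability, goal, max_nodes=10):
    """Smallest n that meets the goal, or None if max_nodes is not enough."""
    for n in range(1, max_nodes + 1):
        if combined_availability(node_availability, n) >= goal:
            return n
    return None


# Hypothetical numbers: each node is up 99% of the time.
for n in (1, 2, 3):
    print(f"{n}-way: {combined_availability(0.99, n):.6f}")

print("nodes needed for 99.9%:", nodes_needed(0.99, 0.999))
```

With these made-up numbers a single node falls short of a 99.9% goal, while a pair already clears it, so the extra effort of N-way would buy nothing the goal requires.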

My advice - don’t skip two on your way from one to many.
