Category: distributed

Troubleshooting

One of the areas of tech ops that doesn’t get its fair share of discussion is troubleshooting. It’s not easy to teach troubleshooting - possibly because how successfully one can troubleshoot a given system largely depends on one’s experience with the system and on the quality of the system’s feedback loops...

Normal Accidents in Complex IT Systems

Designing a fully-automated or nearly-fully-automated computer system with many moving parts and dependencies is tricky, whether the system is distributed, hyper distributed, or otherwise. Failures happen and must be dealt with. After a while, most folks grow up from “failures are rare and can be ignored” to “failures are not...

The Concept of Hyper Distributed Application

Most folks in the industry are familiar with “distributed applications.” If app components run on multiple hosts and need to communicate with each other over the network, the app is said to be distributed. Distributed applications are known for the complexity of ensuring all components are on the same page as...

Eliminating Single Points of Failure - One, Two, Many

I recently reached an interesting conclusion. When you are trying to eliminate a single point of failure from your architecture, it’s almost always beneficial to first go with a 2-way redundant solution (active-passive or active-active pair, whichever is easiest to implement) and only then go to N-way, N > 2,...

Identification Friend or Foe (IFF) in IaaS Clouds

I was recently building a distributed system that would run in the Amazon EC2 cloud. It consisted of several instances of the same AMI that were going to communicate with each other using private IP addresses assigned by EC2. One interesting scenario popped into my head. What if, after initial...

Crash vs Connectivity Loss in Distributed Applications

Designing a distributed application to be fault tolerant is one of my favorite things that I often get to do at work. First of all, the application should never fail under normal circumstances. Don’t believe people who tell you that circumstances are never normal - if that’s the case, a fault-tolerant...
