In the wake of finger-pointing after the recent AWS outage, software developer Ben Coe writes that a lack of total redundancy can sometimes be an acceptable risk if approached responsibly.
I want to put this out there: redundancy comes at a cost. The only way to ensure close to 100% uptime is to replicate your entire infrastructure.
- Infrastructure costs will more than double (there are costs associated with the bandwidth used during replication).
- The complexity of your system will significantly increase; every shred of user data must now be replicated, and this is not trivial.
- Fail-overs in a distributed system still suck; as an example, the AWS outage of a year ago was caused by EBS volumes attempting to replicate after a catastrophic failure.
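To make the trade-off concrete, here is a rough back-of-the-envelope sketch. The dollar figures, the 99.9% single-site availability, and the function names are all hypothetical, chosen only for illustration. The availability formula also assumes the two sites fail independently and that failover works perfectly, which the last point above warns is optimistic.

```python
def replicated_cost(base_cost: float, replication_bandwidth_cost: float) -> float:
    """Cost of running two full copies of the infrastructure plus replication traffic."""
    return 2 * base_cost + replication_bandwidth_cost


def combined_availability(single_site_availability: float, copies: int = 2) -> float:
    """Chance that at least one of `copies` sites is up.

    Assumes independent failures and flawless failover (an idealization).
    """
    return 1 - (1 - single_site_availability) ** copies


# Hypothetical monthly figures, not taken from the article.
base = 10_000.0      # spend for a single site
bandwidth = 500.0    # replication traffic between sites

print("single site cost:", base)
print("replicated cost: ", replicated_cost(base, bandwidth))
print("single site availability:", 0.999)
print("replicated availability: ", combined_availability(0.999))
```

Under these assumed numbers the bill slightly more than doubles, while the theoretical availability gain depends entirely on failover actually working when it is needed.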
