Wednesday, February 25, 2009

Dynamo: Amazon’s Highly Available Key-value Store

Problem: Relational databases are difficult to replicate and parallelize, so they tend to become a single point of failure. As a relational database grows, it also becomes a bottleneck for the entire system. The very survival of Amazon’s business depends on shared software systems that are bulletproof, flexible, and scalable.

Solution: Unlike a traditional relational database, Dynamo is a distributed key-value storage system. Any node in the system can be issued a put or get request for any key. Dynamo is eventually consistent: when one node updates object A, the change takes time to propagate to the other replicas, so a read issued in the meantime may return an older version.

Physical nodes are treated as identical and organized into a ring (using consistent hashing, similar to Chord). The partitioning mechanism rebalances automatically as nodes enter and leave the system. Every object is replicated to N nodes. Because updates propagate asynchronously, the system may temporarily hold multiple copies of an object in slightly different states. These discrepancies are reconciled after a period of time, which is what makes the system eventually consistent.
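The ring placement and N-way replication described above can be sketched roughly as follows. This is a simplified illustration, not Dynamo's actual code: the `Ring` class, the node names, and the choice of one ring position per node (no virtual nodes) are all assumptions made for the example, and MD5 is used simply as a convenient hash.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Hash a node name or key to an integer position on the ring."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical ring: one position per node, no virtual nodes."""

    def __init__(self, nodes, n_replicas=3):
        self.n_replicas = n_replicas
        # Sorted (position, node) pairs represent the ring in clockwise order.
        self._points = sorted((ring_position(node), node) for node in nodes)

    def add_node(self, node):
        bisect.insort(self._points, (ring_position(node), node))

    def remove_node(self, node):
        self._points.remove((ring_position(node), node))

    def preference_list(self, key):
        """Return up to N nodes found by walking clockwise from the key."""
        if not self._points:
            return []
        positions = [pos for pos, _ in self._points]
        start = bisect.bisect_right(positions, ring_position(key))
        count = min(self.n_replicas, len(self._points))
        return [
            self._points[(start + i) % len(self._points)][1]
            for i in range(count)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], n_replicas=3)
replicas = ring.preference_list("cart:42")
```

In this sketch, removing a node that is not among a key's N replicas leaves that key's replica set unchanged, which is the property that lets partitioning adjust as nodes come and go without reshuffling every key.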

Dynamo can be tuned using just a handful of parameters to achieve different technical goals, which in turn support different business requirements. Dynamo is a storage service in a box, driven by an SLA. Different applications at Amazon use different configurations of Dynamo depending on their tolerance for delays or data discrepancies. For example, some configurations reconcile divergent replicas by timestamp, keeping the most recent write.
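To make the tuning idea concrete, here is a minimal sketch of the kind of parameters involved, together with timestamp-based reconciliation. The post does not name the parameters; the Dynamo paper calls them N (replicas), R (replicas that must answer a read), and W (replicas that must acknowledge a write). The `Version` type and the `read` and `reconcile` helpers are hypothetical names for this example, and replicas are modeled as plain dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # assumed to be comparable across nodes


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum must intersect every write quorum."""
    return r + w > n


def reconcile(versions):
    """Timestamp-based reconciliation: keep the most recent write."""
    return max(versions, key=lambda v: v.timestamp)


def read(replicas, key, r):
    """Consult the first r replicas and return the newest version seen."""
    responses = [rep[key] for rep in replicas[:r] if key in rep]
    return reconcile(responses) if responses else None


# Three replicas (N = 3); only the second has received the latest write so far.
replicas = [
    {"cart": Version("old", 1.0)},
    {"cart": Version("new", 2.0)},
    {"cart": Version("old", 1.0)},
]
```

Here `read(replicas, "cart", 2)` sees the newer version, while `read(replicas, "cart", 1)` returns the stale one. That is the trade-off an application makes when it picks a smaller R for lower latency.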

Future Influence: Dynamo is a large-scale implementation of a distributed storage system. The experience of the companies that have built such systems out of necessity (Google, Amazon, Yahoo, etc.) will prove valuable for the development of future systems.
I liked the related work section.
