Paxos Made Live - An Engineering Perspective

Back to Research-Papers

Paxos is an algorithm for achieving concensus on a single value in the face of failure due to properties of distributed systems. The basic algorithm is used to manage a single value, while in practice multi-paxos is more practical for systems to track multiple (possibly sequential) values

each instance of paxos has three phases: election of coordinator, broadcasting the value on an accept message, and sending commit messages once consensus is reached
consensus is reached if a majority of replicas acknowledge an accept message, and a majority guarantees that even if any minority of nodes crashes at a time, at least one node will hold the new committed value
paxos can possibly have multiple nodes who think that their the coordinator, and so coordinators have ordered numbers to determine who is newer and replicas must promise not to accept requests from older coordinators
during election, if a replica wants to be coordinator, it must send a propose message to which a majority must promise not
once consensus is reached, every future coordinator must select the same value to be committed to guarantee continued agreement
in the response to an agree message, replicas respond to the coordinator with an agreed upon value if consensus has been reached, and because of properties of majorities, it is guaranteed that at least one replica in the current majority was a part of the previous majority that reached consensus

An implementation of paxos is hundreds of times longer than its psudocode, due to the need of optimizations and features needed to be production ready

disk corruption on a replica could lead to a violation of promises made to other nodes
- needs checksums to verify corruption and markers to distinguish failure and corruption
master leases