Viewstamped Replication

The original paper is a bit confusing; these notes offer a more concrete explanation of the implementation details.

In the paper, a view is a set of members and a leader (called the primary). Each event within a view has a unique logical timestamp. A viewstamp is the tuple of a view id and a timestamp within that view, and it is used to detect lost information.
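A minimal sketch of a viewstamp as an ordered tuple. The names `Viewstamp`, `view_id`, `timestamp`, and `has_missed_events` are assumptions made for illustration, not identifiers from the paper. The idea shown is only that comparing viewstamps (view id first, then timestamp) lets a replica notice that another replica has seen events it has not.

```python
from typing import NamedTuple


class Viewstamp(NamedTuple):
    """Hypothetical representation: (view id, logical timestamp in that view)."""
    view_id: int
    timestamp: int


def has_missed_events(mine: Viewstamp, theirs: Viewstamp) -> bool:
    """True if `theirs` is strictly newer than `mine`.

    NamedTuple instances compare like plain tuples, so the view id is
    compared first and the timestamp only breaks ties within a view.
    """
    return theirs > mine
```

For example, `has_missed_events(Viewstamp(1, 99), Viewstamp(2, 0))` is true: any event in a later view is newer, regardless of the timestamp.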

protocol overview

It differs from Paxos in that it is presented as a replication protocol rather than a consensus protocol.


The identity of the primary is encoded in the viewstamp: it follows from the view number, which is the totally ordered part.
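One way the primary could be derived from the view number, sketched under the assumption that every replica holds the same fixed, identically ordered list of replica ids and that the primary rotates round-robin over that list. The function name `primary_for_view` and the replica ids are hypothetical.

```python
from typing import Sequence


def primary_for_view(view_number: int, replicas: Sequence[str]) -> str:
    """Pick the primary for a view by rotating through the replica list.

    Assumes `replicas` is the same ordered list on every replica, so all
    of them compute the same answer without exchanging any messages.
    """
    return replicas[view_number % len(replicas)]
```

With `replicas = ["r0", "r1", "r2"]`, view 0 is led by `r0`, view 1 by `r1`, and view 3 wraps around to `r0` again.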

without failure

Every message sent from one replica to another contains the current view number.

Each replica holds some state:

As a part of state, there's a bunch of numbers:

  1. client sends a request to the primary.
  2. primary receives the request and looks up the client in its client table. If the request number is strictly larger than the last one from that client, the request is processed.
  3. primary increments the op-number (the logical timestamp), adds the request to its log, and updates the client table. It then starts a two-phase exchange with the other replicas by sending them a PREPARE.
  4. each backup receives the PREPARE and adds the request to a priority queue. Once all preceding requests are in its log, it adds the new request to the log and replies PREPAREOK to the primary.
  5. primary waits until it has received PREPAREOK from f different backups (in a group of 2f + 1 replicas). It can then advance the commit number to the op-number of the committed request, pass the requested operation to the service, and send a reply to the client.
  6. backups learn of the commit through the next PREPARE or, if none arrives in time, through a COMMIT message. They wait until all earlier operations are finished (possibly through state transfer) and then pass the operation up to the service layer without notifying the client (the primary has already done this).
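The steps above can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: the class names `Primary` and `Backup`, the message tuple layout, and the per-operation `acks` bookkeeping are assumptions, operations are plain strings, and networking, view changes, and the service up-call are left out.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Primary:
    f: int                      # tolerated failures; group size is 2f + 1
    view_number: int = 0
    op_number: int = 0          # logical timestamp of the latest logged request
    commit_number: int = 0      # op-number of the latest committed request
    log: list = field(default_factory=list)           # log[i] holds op-number i + 1
    client_table: dict = field(default_factory=dict)  # client id -> last request number
    acks: dict = field(default_factory=dict)          # op-number -> ids of acking backups

    def on_request(self, client_id, request_number, op):
        """Steps 2-3: drop stale requests, log the op, build a PREPARE message."""
        if request_number <= self.client_table.get(client_id, 0):
            return None  # not strictly newer than this client's last request
        self.op_number += 1
        self.log.append(op)
        self.client_table[client_id] = request_number
        self.acks[self.op_number] = set()
        return ("PREPARE", self.view_number, self.op_number, op, self.commit_number)

    def on_prepare_ok(self, backup_id, op_number):
        """Step 5: commit, in order, every op acknowledged by f distinct backups."""
        if op_number in self.acks:
            self.acks[op_number].add(backup_id)
        committed = []
        while (self.commit_number < self.op_number
               and len(self.acks[self.commit_number + 1]) >= self.f):
            self.commit_number += 1
            committed.append(self.log[self.commit_number - 1])
        return committed  # ops now safe to execute and answer


@dataclass
class Backup:
    op_number: int = 0
    log: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # min-heap of (op-number, op)

    def on_prepare(self, op_number, op):
        """Step 4: buffer out-of-order PREPAREs; log and ack them in order."""
        heapq.heappush(self.pending, (op_number, op))
        acked = []
        while self.pending and self.pending[0][0] <= self.op_number + 1:
            n, queued_op = heapq.heappop(self.pending)
            if n == self.op_number + 1:  # anything lower is a duplicate; drop it
                self.log.append(queued_op)
                self.op_number = n
                acked.append(n)
        return acked  # op-numbers to answer with PREPAREOK
```

With `Primary(f=1)`, a single PREPAREOK is enough to commit, which matches a three-replica group. A backup that receives the PREPARE for op-number 2 before op-number 1 keeps it in the queue and acknowledges both once op-number 1 arrives.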

changing views in failure

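The primary should regularly send PREPARE messages to the replicas, or COMMIT messages when there is nothing to prepare. A backup that hears nothing for too long suspects that the primary has failed. Because the primary is chosen round robin, there is no leader election: all replicas already know which replica is the new primary.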
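A rough sketch of the failure detection described here, from a backup's point of view. The class name `PrimaryMonitor`, the timeout parameter, and the injectable clock are assumptions for illustration. The sketch covers only the timer and the round-robin choice of the next primary; it omits the message exchange by which replicas actually agree to move to the new view.

```python
import time


class PrimaryMonitor:
    """Tracks how long it has been since the primary was last heard from."""

    def __init__(self, replica_count, timeout_s, now=time.monotonic):
        self.replica_count = replica_count
        self.timeout_s = timeout_s
        self.now = now                  # injectable clock, handy for testing
        self.last_heard = self.now()

    def on_message_from_primary(self):
        """Call for every PREPARE or COMMIT received from the primary."""
        self.last_heard = self.now()

    def primary_suspected(self):
        """True once the primary has been silent for longer than the timeout."""
        return self.now() - self.last_heard > self.timeout_s

    def next_view(self, view_number):
        """Return (new view number, index of its primary) under round robin."""
        new_view = view_number + 1
        return new_view, new_view % self.replica_count
```

A backup would call `primary_suspected()` periodically and, when it returns true, use `next_view` to find out which replica to expect as the new primary.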

Paper's architecture