Deep Dive: Paxos Explained

The Patriarch of Provable Consensus

Paxos, conceived by Leslie Lamport in the 1980s (though published later), stands as a landmark achievement in distributed systems. It is a family of protocols for solving consensus in a network of unreliable processors (nodes). Its significance lies in being one of the first algorithms proven to be correct for achieving agreement in asynchronous systems prone to crash failures (but not Byzantine failures). Despite its reputation for being difficult to understand, its principles have influenced many subsequent consensus algorithms and distributed systems designs.

As we previously discussed, achieving agreement is non-trivial. Paxos provides a formal framework to do so.

Abstract visualization of the Paxos algorithm's complexity and structure

Core Concepts in Paxos

Paxos operates using a set of roles that processes can take on, and it proceeds in rounds. Key concepts include:

The Paxos Protocol (Simplified)

The basic Paxos protocol (often called "Single-Decree Paxos" as it decides on one value) typically involves two main phases for a given proposal round:

Phase 1: Prepare/Promise

  1. Prepare Request: A Proposer chooses a new proposal number *n* and sends a "prepare" request with *n* to a majority of Acceptors. Proposer -> Acceptors: prepare(n)
  2. Promise Response: An Acceptor receives the prepare request. If *n* is greater than any proposal number it has already responded to in a promise, it promises not to accept any more proposals numbered less than *n*. If it had already accepted a proposal, it includes that proposal's number and value in its response. Acceptor -> Proposer: promise(n, [previously_accepted_proposal_number, previously_accepted_value])
Diagram illustrating Phase 1 (Prepare/Promise) of the Paxos protocol

Phase 2: Propose/Accepted

  1. Propose Request (Accept!): If the Proposer receives promises from a majority of Acceptors:
    • It chooses a value *v*. If any Acceptors reported a previously accepted value, the Proposer must choose the value associated with the highest proposal number among those reported. Otherwise, it can choose its own proposed value.
    • It then sends an "accept!" request with the chosen proposal number *n* and value *v* to the same majority of Acceptors. Proposer -> Acceptors: accept!(n, v)
  2. Accepted Response: An Acceptor receives the accept! request. If it has not already promised to consider only proposals greater than *n*, it accepts the proposal (n,v) and notifies Learners. Acceptor -> Learners: accepted(n, v)

A value is considered chosen (consensus reached) when a majority of Acceptors have accepted it. Learners find out about the chosen value either by hearing from Acceptors or through other Learners.

Strengths of Paxos

Challenges and Complexities

Conceptual representation of the challenges and complexities associated with Paxos

Despite its complexities, Paxos laid the groundwork for many distributed systems. Its sibling algorithm, Raft, was later designed with understandability as a primary goal, addressing one of Paxos's main criticisms.

Further exploration into how systems ensure reliability can be found in topics such as Chaos Engineering, which tests the resilience Paxos aims to provide.