CAP Theorem

An excerpt from Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, 33(2):51–59, 2002. doi:10.1145/564585.564601. URL https://users.ece.cmu.edu/~adrian/731-sp04/readings/GL-cap.pdf

Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
Seth Gilbert and Nancy Lynch

Abstract
When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three. In this note, we prove this conjecture in the asynchronous network model, and then discuss solutions to this dilemma in the partially synchronous model.

1. Introduction

At PODC 2000, Brewer, in an invited talk [2], made the following conjecture: it is impossible for a web service to provide the following three guarantees:

  • Consistency
  • Availability
  • Partition-tolerance

All three of these properties are desirable — and expected — from real-world web services. In this note, we will first discuss what Brewer meant by the conjecture; next, we will formalize these concepts and prove the conjecture; finally, we will describe and attempt to formalize some real-world solutions to this practical difficulty.

Most web services today attempt to provide strongly consistent data. There has been significant research designing ACID databases, and most of the new frameworks for building distributed web services depend on these databases. Interactions with web services are expected to behave in a transactional manner: operations commit or fail in their entirety (atomic), transactions never observe or result in inconsistent data (consistent), uncommitted transactions are isolated from each other (isolated), and once a transaction is committed it is permanent (durable). It is clearly important, for example, that billing information and commercial transaction records be handled with this type of strong consistency.
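As an illustration only (the paper names no particular database), here is a minimal sketch of the commit-or-fail-in-its-entirety behavior described above, using Python's built-in sqlite3 module; the table, accounts, and amounts are invented for the example.

```python
# A hypothetical billing table: a transaction either commits in its
# entirety or leaves no trace (atomicity), and an application-level
# invariant (no negative balances) is enforced before commit (consistency).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE billing (account TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO billing VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # sqlite3: commits on success, rolls back on exception
        conn.execute("UPDATE billing SET balance = balance - 150 "
                     "WHERE account = 'alice'")
        (bal,) = conn.execute("SELECT balance FROM billing "
                              "WHERE account = 'alice'").fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE billing SET balance = balance + 150 "
                     "WHERE account = 'bob'")
except ValueError:
    pass  # the failed transfer was rolled back atomically

print(conn.execute("SELECT * FROM billing ORDER BY account").fetchall())
# [('alice', 100), ('bob', 0)] -- no partial effects survived the rollback
```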

Web services are similarly expected to be highly available. Every request should succeed and receive a response. When a service goes down, it may well create significant real-world problems; the classic example of this is the potential legal difficulties should the E-Trade website go down. This problem is exacerbated by the fact that a website is most likely to be unavailable when it is most needed. The goal of most web services today is to be as available as the network on which they run: if any service on the network is available, then the web service should be accessible.

Finally, on a highly distributed network, it is desirable to provide some amount of fault-tolerance. When some nodes crash or some communication links fail, it is important that the service still perform as expected. One desirable fault-tolerance property is the ability to survive a network partitioning into multiple components. In this note, we will not consider stopping failures, though in some cases a stopping failure can be modeled as a node existing in its own unique component of a partition.

2. Formal Model

In this section, we will formally define what is meant by the terms consistent, available, and partition tolerant.

2.1 Atomic Data Objects

The most natural way of formalizing the idea of a consistent service is as an atomic data object. Atomic, or linearizable, consistency is the condition expected by most web services today. Under this consistency guarantee, there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time. This is the consistency guarantee that generally provides the easiest model for users to understand, and is most convenient for those attempting to design a client application that uses the distributed service. See Chapter 13 of [5] for a more complete definition of atomic consistency.
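To make the "total order" requirement concrete, the following sketch (ours, not the paper's) brute-force checks whether a small history of read/write operations on a single register admits a total order that respects real-time precedence and sequential register semantics; the operation names and timestamps are hypothetical.

```python
from itertools import permutations

# Each operation: (name, kind, value, invoke_time, respond_time)
def is_atomic(history, initial=0):
    def legal(order):
        pos = {op[0]: i for i, op in enumerate(order)}
        # Real-time constraint: if a responds before b is invoked,
        # a must precede b in the total order.
        for a in history:
            for b in history:
                if a[4] < b[3] and pos[a[0]] > pos[b[0]]:
                    return False
        # Sequential semantics: each read returns the latest prior write.
        value = initial
        for name, kind, v, _, _ in order:
            if kind == "write":
                value = v
            elif v != value:
                return False
        return True
    return any(legal(p) for p in permutations(history))

# A write of 1 completes strictly before a read, yet the read returns 0:
# exactly the non-atomic behavior constructed in the proof of Theorem 1.
h = [("w1", "write", 1, 0.0, 1.0), ("r1", "read", 0, 2.0, 3.0)]
print(is_atomic(h))  # False
```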

2.2 Available Data Objects

For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. That is, any algorithm used by the service must eventually terminate. In some ways this is a weak definition of availability: it puts no bound on how long the algorithm may run before terminating, and therefore allows unbounded computation. On the other hand, when qualified by the need for partition tolerance, this can be seen as a strong definition of availability: even when severe network failures occur, every request must terminate.

2.3 Partition Tolerance

The above definitions of availability and atomicity are qualified by the need to tolerate partitions. In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another. When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost. (And any pattern of message loss can be modeled as a temporary partition separating the communicating nodes at the exact instant the message is lost.)
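The following sketch models this network (our illustration; the class and node names are invented): messages within a component are delivered, and messages that cross the partition are silently dropped.

```python
class PartitionedNetwork:
    def __init__(self, components):
        # components: disjoint sets of node ids; cross-component traffic is lost
        self.components = components
        self.inboxes = {n: [] for c in components for n in c}

    def same_component(self, a, b):
        return any(a in c and b in c for c in self.components)

    def send(self, src, dst, msg):
        if self.same_component(src, dst):
            self.inboxes[dst].append((src, msg))
        # else: the message is silently lost, modeling the partition

net = PartitionedNetwork([{"g1a", "g1b"}, {"g2a"}])
net.send("g1a", "g1b", "delivered")  # same component: arrives
net.send("g1a", "g2a", "lost")       # crosses the partition: dropped
print(net.inboxes)
# {'g1a': [], 'g1b': [('g1a', 'delivered')], 'g2a': []}
```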

The atomicity requirement (§2.1) therefore implies that every response will be atomic, even though arbitrary messages sent as part of the algorithm might not be delivered. The availability requirement (§2.2) implies that every node receiving a request from a client must respond, even though arbitrary messages that are sent may be lost. Note that this is similar to wait-free termination in a pure shared-memory system: even if every other node in the network fails (i.e., the node is in its own unique component of the partition), a valid (atomic) response must be generated. No set of failures less than total network failure is allowed to cause the system to respond incorrectly.

3. Asynchronous Networks

3.1 Impossibility Result

In proving this conjecture, we will use the asynchronous network model, as formalized by Lynch in Chapter 8 of [5]. In the asynchronous model, there is no clock, and nodes must make decisions based only on the messages received and local computation.

Theorem 1 It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:

  • Availability
  • Atomic consistency

in all fair executions (including those in which messages are lost).

Proof: We prove this by contradiction. Assume an algorithm A exists that meets the three criteria: atomicity, availability, and partition tolerance. We construct an execution of A in which there exists a request that returns an inconsistent response. The methodology is similar to proofs in Attiya et al. [1] and Lynch [5] (Theorem 17.6). Assume that the network consists of at least two nodes. Thus it can be divided into two disjoint, non-empty sets: $G_1, G_2$. The basic idea of the proof is to assume that all messages between $G_1$ and $G_2$ are lost. If a write occurs in $G_1$, and later a read occurs in $G_2$, then the read operation cannot return the results of the earlier write operation.

More formally, let $v_0$ be the initial value of the atomic object. Let $\alpha_1$ be the prefix of an execution of A in which a single write of a value not equal to $v_0$ occurs in $G_1$, ending with the termination of the write operation. Assume that no other client requests occur in either $G_1$ or $G_2$. Further, assume that no messages from $G_1$ are received in $G_2$, and no messages from $G_2$ are received in $G_1$. We know that this write completes, by the availability requirement. Similarly, let $\alpha_2$ be the prefix of an execution in which a single read occurs in $G_2$, and no other client requests occur, ending with the termination of the read operation. During $\alpha_2$ no messages from $G_2$ are received in $G_1$, and no messages from $G_1$ are received in $G_2$. Again we know that the read returns a value by the availability requirement. The value returned by this execution must be $v_0$, as no write operation has occurred in $\alpha_2$.

Let $\alpha$ be an execution beginning with $\alpha_1$ and continuing with $\alpha_2$. To the nodes in $G_2$, $\alpha$ is indistinguishable from $\alpha_2$, as all the messages from $G_1$ to $G_2$ are lost (in both $\alpha_1$ and $\alpha_2$, which together make up $\alpha$), and $\alpha_1$ does not include any client requests to nodes in $G_2$. Therefore in the $\alpha$ execution, the read request (from $\alpha_2$) must still return $v_0$. However, the read request does not begin until after the write request (from $\alpha_1$) has completed. This therefore contradicts the atomicity property, proving that no such algorithm exists.
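The construction can be played out in a few lines of Python. This sketch is ours, not the paper's; it assumes only that each side of the partition must answer from local state in order to remain available, with names following the proof ($G_1$, $G_2$, $v_0$).

```python
v0 = 0

class Group:
    """One side of the partition; holds its local copy of the register."""
    def __init__(self, value):
        self.value = value
    def write(self, v):   # availability: must terminate using local state only
        self.value = v
        return "ok"
    def read(self):       # availability: must respond using local state only
        return self.value

G1, G2 = Group(v0), Group(v0)  # all G1 <-> G2 messages are lost

assert G1.write(1) == "ok"     # alpha_1: a write of 1 (!= v0) completes in G1
result = G2.read()             # alpha_2: a later read in G2
print(result)  # prints 0 (= v0): the completed write is invisible in G2,
               # so the response is not atomic
```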

Corollary 1.1 It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:

  • Availability, in all fair executions,
  • Atomic consistency, in fair executions in which no messages are lost.

Proof: The main idea is that in the asynchronous model an algorithm has no way of determining whether a message has been lost, or has been arbitrarily delayed in the transmission channel. Therefore if there existed an algorithm that guaranteed atomic consistency in executions in which no messages were lost, then there would exist an algorithm that guaranteed atomic consistency in all executions. This would violate Theorem 1.

More formally, assume for the sake of contradiction that there exists an algorithm A that always terminates, and guarantees atomic consistency in fair executions in which all messages are delivered. Theorem 1 then implies that A does not guarantee atomic consistency in all fair executions, so there exists some fair execution $\alpha$ of A in which some response is not atomic.

At some finite point in execution $\alpha$, the algorithm A returns a response that is not atomic. Let $\alpha'$ be the prefix of $\alpha$ ending with the invalid response. Next, extend $\alpha'$ to a fair execution in which every message lost in $\alpha'$ is instead delayed until after the invalid response, and all subsequent messages are delivered. The result is a fair execution in which all messages are delivered, yet a response is not atomic. This contradicts the assumed guarantee of A, so no such algorithm exists.

3.2 Solutions in the Asynchronous Model

While it is impossible to provide all three properties (atomicity, availability, and partition tolerance) at once, any two of the three can be achieved.

3.2.1 Atomic, Partition Tolerant

If availability is not required, then it is easy to achieve atomic data and partition tolerance. The trivial system that ignores all requests meets these requirements. However, we can provide a stronger liveness criterion: if all the messages in an execution are delivered, the system is available and all operations terminate. A simple centralized algorithm meets these requirements: a single designated node maintains the value of an object. A node receiving a request forwards it to the designated node and waits; when the designated node's acknowledgment arrives, the forwarding node sends a response to the client.
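A minimal sketch of this centralized scheme (our code; the class names are invented). A lost link is modeled as an unreachable owner, in which case a request simply never completes: liveness is sacrificed, never safety.

```python
class DesignatedNode:
    """The single node that owns the register's value."""
    def __init__(self, value):
        self.value = value
    def handle(self, op, arg=None):
        if op == "write":
            self.value = arg
            return "ack"
        return self.value  # read

class ForwardingNode:
    """Any other node: forwards every request to the owner and waits."""
    def __init__(self, owner):
        self.owner = owner  # None models a partition cutting us off
    def request(self, op, arg=None):
        if self.owner is None:
            # the forwarded message is lost: the operation never terminates,
            # but no incorrect (non-atomic) response is ever produced
            raise TimeoutError("no response: waiting for the designated node")
        return self.owner.handle(op, arg)

owner = DesignatedNode(0)
node = ForwardingNode(owner)
node.request("write", 42)
print(node.request("read"))  # 42: every operation is serialized at the owner
```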

Many distributed databases provide this type of guarantee, especially algorithms based on distributed locking or quorums: if certain failure patterns occur, then the liveness condition is weakened and the service no longer returns responses. If there are no failures, then liveness is guaranteed.
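As a rough illustration of the quorum flavor (ours, not the paper's; real quorum protocols also track versions so that reads see the latest committed write), a write commits only when a majority of replicas acknowledge it, so a minority side of a partition blocks rather than diverges.

```python
def majority(n):
    return n // 2 + 1

def quorum_write(replicas, reachable, value):
    """Apply the write only if a majority of replica indices is reachable."""
    if len(reachable) < majority(len(replicas)):
        return False  # liveness weakened: no response on the minority side
    for i in reachable:
        replicas[i]["value"] = value
    return True

replicas = [{"value": 0} for _ in range(3)]
print(quorum_write(replicas, {0, 1}, 7))  # True: majority is reachable
print(quorum_write(replicas, {2}, 9))     # False: minority cannot commit
```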

3.2.2 Atomic, Available

If there are no partitions, it is clearly possible to provide atomic, available data. In fact, the centralized algorithm described in Section 3.2.1 meets these requirements. Systems that run on intranets and LANs are an example of these types of algorithms.

3.2.3 Available, Partition Tolerant

It is possible to provide high availability and partition tolerance, if atomic consistency is not required. If there are no consistency requirements, the service can trivially return $v_0$, the initial value, in response to every request. However it is possible to provide weakened consistency in an available, partition tolerant setting. Web caches are one example of a weakly consistent network.
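A sketch of one weakened-consistency design (ours; the paper does not prescribe one): each node answers immediately from its local copy and reconciles lazily with last-writer-wins timestamps once the partition heals, so every request gets a response but reads may be stale.

```python
import itertools

clock = itertools.count(1)  # a simple global timestamp source for the sketch

class WeakNode:
    def __init__(self, value):
        self.value, self.stamp = value, 0
    def write(self, v):
        self.value, self.stamp = v, next(clock)
        return "ok"            # responds without any coordination
    def read(self):
        return self.value      # possibly stale, but always answers
    def sync(self, other):
        # anti-entropy once the partition heals: newer timestamp wins
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp

a, b = WeakNode(0), WeakNode(0)
a.write(5)          # partition: b never hears about this write
print(b.read())     # 0: available but not atomic
b.sync(a)           # link restored
print(b.read())     # 5: eventual convergence
```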
