What Are the Advantages of 2pcs?

08 Apr.,2024

 

SAGA vs 2PC: An Exhaustive Exploration of Distributed Transaction Protocols

Remis Haroon


Aug 16, 2023

In the intricate world of distributed systems, ensuring data consistency across various services is a challenge that developers and architects grapple with. Two strategies, SAGA and 2PC, have risen to prominence in addressing this challenge. This article offers an exhaustive exploration of both, helping you make an informed decision for your system.

The Evolution of Distributed Systems

The rise of the internet and cloud computing has led to an exponential increase in distributed systems. These systems, where components located on networked computers communicate and coordinate to achieve a common goal, have become the backbone of modern digital infrastructure. They offer scalability, fault tolerance, and high availability. However, they also introduce complexities, especially concerning data consistency.

Historical Context: The Birth of 2PC and SAGA

The need for protocols like 2PC and SAGA arose from the challenges faced by early distributed systems. As businesses started to rely more on databases spread across multiple locations, ensuring that transactions were processed atomically became crucial.

2PC (Two-Phase Commit): The Early Solution

2PC was one of the first protocols developed to address the atomicity problem in distributed systems.

How does 2PC work?

  1. Prepare Phase: The coordinator asks all participants if they can commit. If any participant votes “no,” the transaction is aborted.
  2. Commit Phase: If all participants vote “yes,” the coordinator instructs them to commit.
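The two phases above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names introduced here, and a real participant would write its vote to durable storage before replying.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical in-memory participant (assumption, not a real library API)."""
    name: str
    can_commit: bool = True
    state: str = "idle"  # idle -> prepared -> committed, or aborted

    def prepare(self) -> bool:
        # Vote "yes" only if the local work could be committed.
        if self.can_commit:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: every participant receives the same decision.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

A single "no" vote is enough to abort everyone, which is what gives the protocol its all-or-nothing behaviour.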

Advantages of 2PC:

  • Atomicity: It guarantees that all participants either commit or abort, ensuring data consistency.
  • Simplicity: The protocol is straightforward to understand and implement.

Drawbacks of 2PC:

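  • Blocking Nature: If the coordinator fails after participants have voted “yes,” those participants cannot decide on their own and must wait, still holding their locks, until the coordinator recovers.
  • Performance: Requires all participants to lock resources for the duration of the transaction, potentially leading to bottlenecks.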
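The blocking drawback can be illustrated with a small simulation. The sketch below assumes a hypothetical `LockingParticipant` class and a `run_coordinator` function with a `crash_before_decision` switch; none of these come from a real library, and the crash is simulated with an exception.

```python
class LockingParticipant:
    """Hypothetical participant that locks its resource once it votes yes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.locked = False
        self.state = "idle"

    def prepare(self) -> bool:
        self.locked = True       # the lock is held from the vote onward
        self.state = "prepared"
        return True

    def finish(self, decision: str) -> None:
        self.state = decision    # "committed" or "aborted"
        self.locked = False      # released only once a decision arrives


def run_coordinator(participants, crash_before_decision: bool) -> None:
    votes = [p.prepare() for p in participants]
    if crash_before_decision:
        # Simulated coordinator failure between the two phases.
        raise RuntimeError("coordinator crashed after collecting votes")
    decision = "committed" if all(votes) else "aborted"
    for p in participants:
        p.finish(decision)


participants = [LockingParticipant("orders"), LockingParticipant("payments")]
try:
    run_coordinator(participants, crash_before_decision=True)
except RuntimeError:
    pass

# Both participants are still in doubt and still holding their locks.
stuck = [(p.name, p.state, p.locked) for p in participants]
```

After the simulated crash, every participant remains in the `prepared` state with its lock held, and nothing in the participant code can release it without a decision from the coordinator.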

SAGA: The Modern Approach

As systems grew in complexity and the need for scalability became paramount, the limitations of 2PC became evident. This led to the development of SAGA.

How does SAGA work?

  1. Local Transactions: Each service in a SAGA performs and commits its own local transaction.
  2. Compensation: If a service fails to complete its transaction, compensating transactions are executed for the steps that already committed, in reverse order, to restore data consistency.
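The two steps above can be sketched as a small orchestrator. This is an illustrative sketch under assumptions: `run_saga`, the `Step` alias, and the step names in the usage example are all hypothetical, and the compensations are assumed never to fail, which a real system cannot take for granted.

```python
from typing import Callable

# Each step pairs a local action with the compensation that undoes it.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    completed: list[Callable[[], None]] = []
    for action, compensation in steps:
        try:
            action()  # the local transaction commits right away
        except Exception:
            # Undo the steps that already committed, newest first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log: list[str] = []


def charge_card() -> None:
    raise RuntimeError("payment declined")


ok = run_saga([
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("create order"), lambda: log.append("cancel order")),
    (charge_card, lambda: log.append("refund card")),
])
# ok is False; log is
# ["reserve stock", "create order", "cancel order", "release stock"]
```

Note that the failed step itself is not compensated, because it never committed anything; only the steps before it are undone.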

Advantages of SAGA:

  • Scalability: Because each local transaction commits on its own and no locks are held across services, SAGA generally scales better than 2PC.
  • Resilience: Failures are isolated to individual transactions, preventing system-wide crashes.

Drawbacks of SAGA:

  • Complexity: Designing and implementing compensating transactions can be challenging.
  • Consistency: Ensuring data consistency requires careful design, especially in scenarios with multiple concurrent SAGAs.

SAGA vs 2PC: A Visual Comparison

I'm adding my answer in order to address the main difference between sagas and 2PC, which is the consistency model.

Sagas, on the other hand, are a series of local transactions, where each local transaction mutates and persists the entities along with some flag indicating the phase of the global transaction and commits the change.

Interesting description. What exactly is this flag? Is each node supposed to commit changes after the global transaction completes (and this is tracked by this flag)? And does each node keep local changes invisible to the outside until this happens? If that's the case, then how is that different from 2PC? If that's not the case, then what is this flag even for?

Generally, as far as I understand, a saga is a sequence of local transactions. If any of the nodes in the sequence fails, then the flow is reversed and each node that has already committed runs a compensating transaction, in reverse order.

With this idea, however, we encounter several issues. The first one is what you've already noticed yourself: what if compensating transactions fail? What if any communication at any step fails? But there's more: with that approach dirty reads are possible. Say Node1 succeeds and Node2 fails. We then issue a compensating transaction on Node1. But what if some other process reads data after Node1 was updated but before the compensating transaction reverts that update? Potential inconsistency (depending on your requirements).
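To make the dirty read concrete, here is a minimal single-process sketch of the Node1/Node2 scenario. Everything in it is an assumption for illustration: the `balances` dict stands in for two separate nodes, and the `reader` callback plays the role of the other process that happens to read at the worst moment.

```python
balances = {"node1": 100, "node2": 50}
observed: list[int] = []


def debit_node1(amount: int) -> None:
    balances["node1"] -= amount  # local transaction, committed immediately


def credit_node2(amount: int) -> None:
    raise RuntimeError("node2 unavailable")  # simulated failure


def compensate_node1(amount: int) -> None:
    balances["node1"] += amount  # reverts the earlier debit


def transfer(amount: int, reader) -> bool:
    debit_node1(amount)
    reader()  # another process reads between the step and its compensation
    try:
        credit_node2(amount)
    except RuntimeError:
        compensate_node1(amount)
        return False
    return True


succeeded = transfer(30, lambda: observed.append(balances["node1"]))
# The reader saw 70, a value that belongs to a transfer that never happened;
# after compensation node1 is back to 100.
```

The reader observed a balance that, from the point of view of the global transaction, never existed. With 2PC the update would have stayed locked until the decision was made.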

Generally, sagas are eventually consistent and efficient (no global resource locking) by design. If you have full control over all nodes, then a saga can be made strongly consistent, but that requires a lot of manual (and not obvious, e.g. around communication issues) effort, and will likely require some resource locking (and thus we lose performance). In that case, why not use 2PC to begin with?

On the other hand 2PC is strongly consistent by design, which makes it potentially less efficient due to resource locking.

So which one to use? That depends on your requirements. If you need strong consistency then 2PC. If not then saga is a valid choice, potentially more efficient.

Example 1. Say you create an accounting system where users may transfer money between accounts. Say that those accounts live on separate systems. Furthermore, you have a strict requirement that the balance should always be nonnegative (you don't want to deal with implicit debts) and maybe a strict requirement that a maximum amount can be set and cannot be exceeded (think about dedicated accounts for repaying debts: you cannot put in more money than the entire debt). Then sagas may not be what you want, because due to dirty reads (and other consistency phenomena) we may end up with a balance outside of the allowed range. 2PC will be an easier choice here.
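A rough sketch of how Example 1 could look with a 2PC-style prepare step is below. The `Account` class, its `pending` field, and the `transfer` function are hypothetical names made up for this illustration; the point is only that each account checks its own range and locks itself during prepare, so no out-of-range balance is ever committed or visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    balance: int
    max_balance: Optional[int] = None  # None means no upper limit
    pending: Optional[int] = None      # delta held while the account is locked

    def prepare(self, delta: int) -> bool:
        if self.pending is not None:
            return False  # locked by another in-flight transaction
        new_balance = self.balance + delta
        if new_balance < 0:
            return False  # would violate the nonnegative rule
        if self.max_balance is not None and new_balance > self.max_balance:
            return False  # would exceed the configured maximum
        self.pending = delta
        return True

    def commit(self) -> None:
        if self.pending is not None:
            self.balance += self.pending
            self.pending = None

    def abort(self) -> None:
        self.pending = None


def transfer(source: Account, target: Account, amount: int) -> bool:
    prepared: list[Account] = []
    for account, delta in ((source, -amount), (target, amount)):
        if account.prepare(delta):
            prepared.append(account)
        else:
            # Release only the accounts this transfer managed to lock.
            for p in prepared:
                p.abort()
            return False
    for account in prepared:
        account.commit()
    return True
```

For instance, with `checking = Account(100)` and `debt = Account(0, max_balance=50)`, `transfer(checking, debt, 80)` is rejected during prepare and leaves both balances untouched, while `transfer(checking, debt, 50)` goes through.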

Example 2. Similarly, you have an accounting system. But this time a balance outside of the range is allowed (whoever owns the system will deal with that manually). In that scenario perhaps sagas are better, because manually dealing with a very small number of troublesome states may be less expensive than maintaining strong consistency all the time.
