1. The Problem: Distributed Transactions in Microservices
In a monolithic application, you can wrap a complex operation in a single database transaction: either all steps commit or they all roll back. In a microservices architecture, each service has its own database. There is no single transaction that spans Order Service, Inventory Service, and Payment Service.
The classic solution is Two-Phase Commit (2PC), but it requires a distributed coordinator, and participants hold their locks from the prepare phase until the coordinator delivers its final decision. If the coordinator goes down mid-commit, prepared participants stay blocked until it recovers. In microservices, 2PC is widely considered an anti-pattern because it couples services tightly and severely limits availability.
2. What is a Saga?
A Saga is a sequence of local transactions. Each step performs a local database transaction within a single service and publishes an event or message to trigger the next step. If any step fails, the Saga executes compensating transactions in reverse order to undo the effects of the previously completed steps.
Sagas achieve eventual consistency rather than the strong (ACID) consistency of a database transaction. During the Saga execution, other services may observe intermediate states — for example, stock is reserved but payment has not yet been charged. Good Saga design accounts for these transient states.
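The sequence-plus-compensation idea above can be sketched in a few lines. This is a minimal in-process illustration, not a production implementation: `Step`, `run_saga`, and the three sample steps are hypothetical names, and the "local transactions" are simulated by appending to a list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # the local transaction
    compensate: Callable[[], None]  # business-level undo for that transaction


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step committed nothing, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def fail_payment() -> None:
    raise RuntimeError("card declined")  # simulated failure


steps = [
    Step("create_order", lambda: log.append("order created"),
         lambda: log.append("order cancelled")),
    Step("reserve_stock", lambda: log.append("stock reserved"),
         lambda: log.append("stock released")),
    Step("charge_payment", fail_payment,
         lambda: log.append("payment refunded")),
]

ok = run_saga(steps)
print(ok, log)
# False ['order created', 'stock reserved', 'stock released', 'order cancelled']
```

In a real system each action and compensation would be a call or message to a separate service, and the list of completed steps would need to be persisted so the Saga can resume after a crash.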
3. E-Commerce Order Saga Example
ORDER SAGA - HAPPY PATH:
┌─────────────────────┐
│ 1. Create Order     │  Order Service: INSERT order (status=PENDING)
│   ↓ OrderCreated    │
├─────────────────────┤
│ 2. Reserve Stock    │  Inventory Service: reserve items (stock -= N)
│   ↓ StockReserved   │
├─────────────────────┤
│ 3. Charge Payment   │  Payment Service: charge credit card
│   ↓ PaymentCharged  │
├─────────────────────┤
│ 4. Ship Order       │  Shipping Service: create shipment
│   ↓ OrderShipped    │
└─────────────────────┘
Order Service: UPDATE order status=COMPLETED ✓
ORDER SAGA - FAILURE AT STEP 3 (Payment fails):
Steps 1 and 2 have committed locally.
Compensating transactions (reverse order):
← Compensate Step 2: Release Stock (stock += N)
← Compensate Step 1: Cancel Order (status=CANCELLED)
4. Choreography-Based Saga
In a choreography-based Saga, there is no central coordinator. Each service listens for events and decides what to do next. Services communicate via a message bus (e.g. Kafka, RabbitMQ).
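As a rough sketch of this style, the snippet below replaces the message bus with an in-memory queue and wires three services together purely through events. Everything here is an assumption for illustration: the `EventBus` class, the handler names, the `PaymentFailed` and `StockReleased` events, and the rule that amounts over 100 are declined. Shipping is left out to keep the example short.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a broker such as Kafka or RabbitMQ."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.queue: Deque[Tuple[str, dict]] = deque()
        self.history: List[str] = []

    def subscribe(self, name: str, handler: Handler) -> None:
        self.handlers[name].append(handler)

    def publish(self, name: str, payload: dict) -> None:
        self.queue.append((name, payload))

    def drain(self) -> None:
        # Deliver queued events until no service publishes anything further.
        while self.queue:
            name, payload = self.queue.popleft()
            self.history.append(name)
            for handler in self.handlers[name]:
                handler(payload)


bus = EventBus()
stock: Dict[str, int] = {"widget": 5}
orders: Dict[str, str] = {}


def inventory_on_order_created(e: dict) -> None:
    stock[e["sku"]] -= e["qty"]
    bus.publish("StockReserved", e)


def payment_on_stock_reserved(e: dict) -> None:
    # Hypothetical decline rule standing in for a real payment gateway.
    bus.publish("PaymentFailed" if e["amount"] > 100 else "PaymentCharged", e)


def inventory_on_payment_failed(e: dict) -> None:
    stock[e["sku"]] += e["qty"]  # compensate: release the reservation
    bus.publish("StockReleased", e)


def order_on_stock_released(e: dict) -> None:
    orders[e["order_id"]] = "CANCELLED"  # compensate: cancel the order


def order_on_payment_charged(e: dict) -> None:
    orders[e["order_id"]] = "COMPLETED"


bus.subscribe("OrderCreated", inventory_on_order_created)
bus.subscribe("StockReserved", payment_on_stock_reserved)
bus.subscribe("PaymentFailed", inventory_on_payment_failed)
bus.subscribe("StockReleased", order_on_stock_released)
bus.subscribe("PaymentCharged", order_on_payment_charged)

orders["o1"] = "PENDING"
bus.publish("OrderCreated",
            {"order_id": "o1", "sku": "widget", "qty": 2, "amount": 250})
bus.drain()
print(orders["o1"], stock["widget"], bus.history)
# CANCELLED 5 ['OrderCreated', 'StockReserved', 'PaymentFailed', 'StockReleased']
```

Note that no function in the sketch knows the whole workflow. The order of steps only emerges from which service subscribes to which event, and that is also why the flow is harder to read off the code.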
Choreography Advantage: No Single Point of Failure
Choreography is fully decentralised: no orchestrator process can fail and halt the entire workflow, although the message broker itself is still a shared dependency that must be highly available. Services are loosely coupled and can be deployed independently. The downside is visibility: to understand a Saga's state, you must correlate events across multiple Kafka topics. Distributed tracing (Jaeger, Zipkin) becomes essential.
5. Orchestration-Based Saga
In an orchestration-based Saga, a central Saga Orchestrator directs each service step. The orchestrator knows the entire workflow and handles failure scenarios centrally.
ORCHESTRATION FLOW:
Saga Orchestrator
│
├──▶ Order Service: "CreateOrder" ──▶ ack
│
├──▶ Inventory Service: "ReserveStock" ──▶ ack (or FAIL)
│ FAIL ──▶ Orchestrator triggers compensation:
│ Order Service: "CancelOrder"
│
├──▶ Payment Service: "ChargePayment" ──▶ ack (or FAIL)
│ FAIL ──▶ Orchestrator triggers compensation:
│ Inventory Service: "ReleaseStock"
│ Order Service: "CancelOrder"
│
└──▶ Shipping Service: "ShipOrder" ──▶ ack
DONE: Orchestrator marks Saga complete ✓
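The flow above could be expressed as a small orchestrator that walks a workflow table and, on a FAIL, sends the compensating commands for every acknowledged step in reverse. This is a sketch under assumptions: `Orchestrator`, `WORKFLOW`, and `fake_send` are hypothetical names, the transport is a plain function returning `True` for ack and `False` for FAIL, and the `RefundPayment` and `CancelShipment` command names are assumed rather than taken from the diagram.

```python
from typing import Callable, List, Tuple

Command = Tuple[str, str]  # (service name, command name)

# (forward command, compensating command) in execution order.
WORKFLOW: List[Tuple[Command, Command]] = [
    (("order", "CreateOrder"), ("order", "CancelOrder")),
    (("inventory", "ReserveStock"), ("inventory", "ReleaseStock")),
    (("payment", "ChargePayment"), ("payment", "RefundPayment")),
    (("shipping", "ShipOrder"), ("shipping", "CancelShipment")),
]


class Orchestrator:
    def __init__(self, send: Callable[[str, str], bool]) -> None:
        self.send = send  # transport: True means ack, False means FAIL
        self.state = "STARTED"
        self.sent: List[Command] = []  # every command issued, for inspection

    def _dispatch(self, cmd: Command) -> bool:
        self.sent.append(cmd)
        return self.send(*cmd)

    def run(self) -> str:
        owed: List[Command] = []  # compensations for steps that were acked
        for forward, compensation in WORKFLOW:
            if self._dispatch(forward):
                owed.append(compensation)
                continue
            # The failed step did not commit, so only earlier steps are undone.
            self.state = "COMPENSATING"
            for comp in reversed(owed):
                self._dispatch(comp)
            self.state = "COMPENSATED"
            return self.state
        self.state = "COMPLETED"
        return self.state


def fake_send(service: str, command: str) -> bool:
    return command != "ChargePayment"  # simulate a declined payment


saga = Orchestrator(fake_send)
result = saga.run()
print(result, [command for _, command in saga.sent])
# COMPENSATED ['CreateOrder', 'ReserveStock', 'ChargePayment', 'ReleaseStock', 'CancelOrder']
```

Because the whole workflow lives in one table and one `state` field, the Saga's progress can be inspected in a single place. A production orchestrator would also persist that state and retry compensations that fail.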
6. Choreography vs Orchestration Comparison
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coordination | Decentralised (events) | Centralised (orchestrator) |
| Coupling | Loose (services know events) | Tighter (orchestrator knows all services) |
| Visibility | Hard (trace across topics) | Easy (orchestrator has full state) |
| Single point of failure | No | Yes (orchestrator must be HA) |
| Complexity with many steps | High (event spaghetti) | Manageable (centralised logic) |
| Best tools | Kafka, RabbitMQ, EventBridge | Temporal, Step Functions, Axon |
| Debugging | Hard (distributed logs) | Easy (orchestrator state machine) |
7. Compensating Transactions
A compensating transaction is not a technical rollback — it is a business operation that reverses the effects of a previously committed local transaction. Every Saga step should have a defined compensating transaction designed upfront:
- Reserve Stock → compensate: Release Stock
- Charge Payment → compensate: Refund Payment
- Create Shipment → compensate: Cancel Shipment
- Send Confirmation Email → compensate: Send Cancellation Email (not a true undo, but the closest business equivalent)
Note that some operations cannot be compensated meaningfully (e.g. you cannot "unsend" an email). In these cases, design the Saga so that irreversible operations occur last, after all reversible steps have succeeded.
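One way to enforce the "irreversible steps last" rule is to check a Saga definition before it ever runs. The sketch below is illustrative only: `SagaPlan` and `misplaced_irreversible_steps` are hypothetical names, and a step is treated as irreversible when its compensation is `None`.

```python
from typing import List, Optional, Tuple

# (step name, compensation name, or None when the step cannot be undone)
SagaPlan = List[Tuple[str, Optional[str]]]


def misplaced_irreversible_steps(plan: SagaPlan) -> List[str]:
    """Return irreversible steps that run before some reversible step."""
    misplaced: List[str] = []
    for i, (name, compensation) in enumerate(plan):
        later_reversible = any(c is not None for _, c in plan[i + 1:])
        if compensation is None and later_reversible:
            misplaced.append(name)
    return misplaced


bad_plan: SagaPlan = [
    ("reserve_stock", "release_stock"),
    ("send_confirmation_email", None),  # cannot be unsent
    ("charge_payment", "refund_payment"),
]
good_plan: SagaPlan = [
    ("reserve_stock", "release_stock"),
    ("charge_payment", "refund_payment"),
    ("send_confirmation_email", None),
]

print(misplaced_irreversible_steps(bad_plan))   # ['send_confirmation_email']
print(misplaced_irreversible_steps(good_plan))  # []
```

In the bad plan, a payment failure would leave a confirmation email already delivered for an order that is then cancelled. The good plan only sends the email once nothing reversible remains.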
Sagas Do Not Provide Isolation
Unlike database ACID transactions, Sagas do not provide isolation. Other transactions can read intermediate states. For example, a concurrent read might see an order with reserved stock but no payment. Apply a countermeasure such as a semantic lock: design your data model with PENDING/RESERVED/CONFIRMED states that make partial progress visible but safe, so that other transactions can tell a record is still in flight. Never expose PENDING records to end users as if they were complete.
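A small sketch of such a status model follows. It is only one possible shape: the `OrderStatus` enum, the `ALLOWED` transition table, and the helper names are assumptions, and the CANCELLED state is added here to cover the compensation path.

```python
from enum import Enum
from typing import Dict, List, Set


class OrderStatus(Enum):
    PENDING = "PENDING"
    RESERVED = "RESERVED"
    CONFIRMED = "CONFIRMED"
    CANCELLED = "CANCELLED"


# Legal moves between states; the terminal states allow none.
ALLOWED: Dict[OrderStatus, Set[OrderStatus]] = {
    OrderStatus.PENDING: {OrderStatus.RESERVED, OrderStatus.CANCELLED},
    OrderStatus.RESERVED: {OrderStatus.CONFIRMED, OrderStatus.CANCELLED},
    OrderStatus.CONFIRMED: set(),
    OrderStatus.CANCELLED: set(),
}


def transition(current: OrderStatus, new: OrderStatus) -> OrderStatus:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def confirmed_order_ids(orders: Dict[str, OrderStatus]) -> List[str]:
    """Orders safe to present as complete; in-flight ones are filtered out."""
    return [oid for oid, status in orders.items()
            if status is OrderStatus.CONFIRMED]


orders = {
    "o1": OrderStatus.PENDING,
    "o2": OrderStatus.RESERVED,
    "o3": OrderStatus.CONFIRMED,
}
orders["o2"] = transition(orders["o2"], OrderStatus.CONFIRMED)
print(confirmed_order_ids(orders))  # ['o2', 'o3']
```

The explicit transition table keeps a late or duplicated message from moving a CANCELLED order back into an active state, and the read-side filter keeps PENDING orders out of any view that presents orders as complete.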
8. Saga vs 2PC
| Property | Saga | 2PC (Two-Phase Commit) |
|---|---|---|
| Consistency model | Eventual consistency | Strong consistency (ACID) |
| Isolation | None (intermediate states visible) | Full isolation during transaction |
| Availability | High (local transactions) | Low (coordinator is SPOF) |
| Performance | High (no distributed locks) | Low (locking blocks all participants) |
| Rollback | Compensating transactions (business logic) | True atomic rollback |
| Use in microservices | Recommended | Anti-pattern |
Frequently Asked Questions — Saga Pattern
What is the Saga pattern?
The Saga pattern is a way to manage distributed transactions across multiple microservices. Instead of a single atomic transaction, a Saga is a sequence of local transactions: each service performs its local transaction and publishes an event or message to trigger the next step. If a step fails, the Saga executes compensating transactions to undo the previous successful steps. Sagas achieve eventual consistency without requiring a distributed lock or two-phase commit.
What is the difference between choreography and orchestration?
In choreography, services react to events: each service listens for events from other services and decides what to do. There is no central coordinator. In orchestration, a central Saga orchestrator (a dedicated service or workflow engine) directs each service to perform its local transaction and handles failures. Choreography is more decentralised and scalable but harder to trace and debug. Orchestration is easier to understand and monitor but creates a central coordinator that becomes a potential bottleneck.
What are compensating transactions?
Compensating transactions are the "undo" operations for each step in a Saga. If a Saga step fails, you cannot simply roll back (as you would in a database transaction) because the previous steps have already committed in different services. Instead, you execute a compensating transaction for each successfully completed step, in reverse order. For example, if "Charge Payment" succeeds but "Ship Order" fails, the Saga runs "Refund Payment", then "Release Stock", then "Cancel Order".
Why is Two-Phase Commit a poor fit for microservices?
Two-Phase Commit (2PC) requires a distributed transaction coordinator, and every participating service holds its locks from the prepare phase until the coordinator announces the outcome. In microservices, this creates tight coupling, blocking (all services are locked while the coordinator waits for votes), and availability issues: if the coordinator fails during the commit phase, prepared participants stay locked until it recovers. 2PC does not work well with the independent deployment and failure model of microservices. Sagas avoid these problems by using local transactions with eventual consistency.
When should you use orchestration, and when choreography?
Use orchestration when you have complex business logic with many conditional steps, need clear visibility into transaction state, or are building a new system where traceability matters more than decoupling. Tools like AWS Step Functions, Temporal.io, and Axon Framework support orchestration. Use choreography when services are truly independent and you want loose coupling, the workflow is simple and linear, or you already have a robust event bus. As a rule of thumb, choreography becomes very hard to reason about once more than 4-5 services are involved.
Which tools and frameworks support Sagas?
For orchestration: AWS Step Functions (managed state machines), Temporal.io (open-source workflow engine), Axon Framework (Java CQRS/Saga), and Apache Camel. For choreography: any message broker like Apache Kafka, RabbitMQ, or AWS EventBridge can power a choreography-based Saga. NestJS offers sagas through its CQRS package (@nestjs/cqrs). In the Spring ecosystem, Spring Statemachine can be used to model Saga state.