Friday, January 23, 2015

Two Phase Commit

Bottlenecks in database layer

Database has been seen as a most common place of bottleneck for performance across different tiers of the application. Few possible reasons restricting RDBMS performance

  • RDBMS not able to scale horizontally
  • Locking at row level / data page level / table level during database transactions
  • NoSQL on the rescue

    I have mentioned about NoSQL data stores in my previous blog which achieves horizontally scalability distribution . In this blog I would like cover how transactional behaviour is achieved with high performance in NoSQL

    Transactions in RDBMS

    Let us try to understand how transactions operates in RDBMS,transactions with ACID (Atomicity, Consistency, Isolation and Durability) compliance executes all the actions involved in the transaction in a single step, if all the actions succeeds it commits the changes otherwise all the changes are revoked. To achieve this locking happens across the tables and hence performance becomes bottleneck

    Let us take a simple transaction in order placement and analyse , An simple order management transaction involves two tables involving order and billing.

  • Confirm the product for the order by decrementing a count in the product catalogue
  • Confirm billing for payment
  • If payment succeeds, transaction as a whole has to be committed . If payment fails for some reason , change in product catalogue has to be revoked to original state so that it is available for others to consume. RDBMS achieves this whole process as a single step by locking these tables until transaction is completed and hence gets the ability to commit or revoke at the end of transaction, but this gives a overhead in performance as these tables gets locked and any read/ write on those are kept on hold unless stale read is enabled

    Restrictions with NoSQL

    Let us understand restrictions in NoSQL towards achieving this type of transactions

  • NoSQL provides locking at row level and not across rows, tables etc
  • With adoption of polyghot persistence and distributed transactions we may need to perform a transaction across different datastores as well
  • Two Phase Commit

    Two Phase Commit is an approach followed in NoSQL to achieve transaction like behaviour. As the name mentions transactions happens in two phases with the ability to commit or revoke the changes made in phase 1 during phase 2. The approach introduces a additional component transaction manager which helps to commit or roll back the changes made in each phase of the transaction

    Advantages with Two Phase Commit approach

  • Provides high performance with transactions
  • Ability to retry for failure portions of the transactions ( interesting )
  • Provides distributed transaction like capabilities across data stores
  • My personal experience with Two Phase Commit

    Recently I personally came to see a Two Phase Commit scenario handled in amazon for my order placement which became inspiration for this post.

    I placed an order ( a laptop desk ) in Amazon where my order placement was received and I went for sleep.Looks the payment got failed for some reason. Next day morning I got a notification to retry my payment, in this case Amazon instead of revoking the order placed ,Amazon holded the order for additional time say for a day or two and provided option to retry payment failure.

    My order confirmation

    Payment retry for my order

    I believe Amazon has implemented some form of two phase commit to achieve this, I was personally happy with the way amazon handled my payment failure as the order was not revoked and i was given retry option later to complete the order with my laptop desk was still reserved for me.

    This also opens the door for other mode of payments like cash on delivery etc.

    Few links on Two Phase Commit

  • Star Bucks Approach for Performance
  • MongoDB - how to perform 2PC
  • No comments:

    Post a Comment