Monday, December 7, 2015

NOSQL with RDBMS fallback:

NOSQL adoption becoming prominent across different critical applications to reap the benefits of performance, fault tolerance, high availability for bigger volume database needs. While migrating to NOSQL one of the risk that architects feel is what if the application gets into some unseen issues and take more time to fix , as NOSQL adoption is not battle tested across different domain and sectors and how to design some fallback strategy.

Few factors that people may think while migrating to NOSQL

  • What will happen if we get into unexpected errors in production and if takes more time to fix?
  • What if the product vendors itself haven’t faced such scenarios?
  • What if I have reporting or other dependent system that are well integrated with RDBMS and difficult to migrate from RDBMS in the current phase?
  • Architects would like to design some fallback option as RDBMS where application can switch to RDBMS on unrecoverable NOSQL issues. This raises few questions in mind on how to design the same.

  • How do I sync up data both in NOSQL and RDBMS for a high data volume without losing the order of update?
  • How do I sync up without adding much overhead to application? Synchronous update to both NOSQL and RDBMS will be too much of overhead...
  • How to reliably update the data between the two systems without any loss?
  • What if RDBMS goes down and how can I design to sync up reliably even on failures?
  • I can think of design depicted below to address the same.

    Few components involved in the design are Apache Kafka receiving the updates and Apache Storm process the data to update the same to RDBMS. Both of these system are designed to work for big data needs in a reliable and distributed form.

    Apache Kafka is a high performance message queuing system. Application pose the messages (Insert / Update / Delete) to Kafka message queue. To improve the performance with parallel processing the queue can be partitioned by table / region / logical data design as per the NOSQL model.

    Apache Storm is a real time processing engine that can consume message through Spout component, do some processing through bolts and update the data to RDBMS. Storm has topologies to process guaranteed data processing, transactional mode of commitments that makes it suitable to handle partial failures and during commitments.


  • Information can be updated in an asynchronous way to RDBMS reliably.
  • Apache Kafka providing high available, reliable, partitioned queuing system fits best to handle huge data volume.
  • Storm doing real time processing for Kafka messages on the partitioned queue and provides a reliable way to update RDBMS
  • On RDBMS failures Kafka will persist the messages and Storm can continue to sync up the messages when it is comes back.
  • Wednesday, December 2, 2015

    Next Generation Enterprise Application Architecture

    New generation applications are architectured not only with the goal of desiging the application functionally and performing stable but also focuses on different aspects that are becoming critical

    Scalability – Elastic scalability for all the layers of the application including data tier

    Fault Tolerance - Ability to handle failure smartly and avoid cascading failures from and to dependent systems

    High Availability – Ability to have application highly available on all layers including database even on data center failures

    Efficient utilization of Infrastructure - Ability to scale up and down on demand

    Faster access to underlying data on high load and data volumes

    Ability to handle different data formats efficiently

    Few reasons that are tied to this evolution are the need and benefits towards cloud adoption ( could be either private or public cloud ) and the need to handle huge data volume with faster response on the data tiers

    Benefits Solutions
    Physical -> IaaS -> PaaS

    Elastic Scalability

    High Availability

    Efficient Infrastructure utilization

    Zero downtime deployment

    VMWare , Open Stack – Private Cloud IaaS

    AWS, Azure – Public Cloud IaaS , PaaS

    Cloud Foundry – PaaS on private and public cloud

    Circuit Breaker

    Fault Tolerance

    Better failure handling

    Avoid avalanche failures

    Netflix Hystrix


    Service Registry

    Registry for dynamic instance scaling

    Netflix Eureka

    Apache Zookeeper

    Intelligent Load balancing

    Intelligent Load Balancing utilizing the elastic scaling and self-discovery

    Netflix Ribbon




    Quick search needs from huge data sets, full text search, pattern matching

    Elastic Search


    Data Grid

    Faster read write data, Reduce read / write overhead to database, high availability to data





    Reliable data transfer across different data layers





    Big data – database needs

    Heavy Read / Write on high data volumes

    Faster response needs on the data

    High Availability on data

    Fault Tolerance on data

    Distributed database

    Scalable database





    Graph DB ( Titan, OrientDB )


    Distributed file processing and storage ecosystem

    High speed batch (MapReduce) / real time ( Storm, Spark ) processing

    Different Hadoop distributions like Hortonworks, CloudEra,MapR

    Sunday, August 30, 2015

    Kafka messaging system

    Apache Kafka :

    Kafka is an open source message queuing solution under Apache project, Kafka is new when compared to existing queue solutions like RabbitMQ, ActiveMQ, AWS SQS on product maturity but is quickly gaining momentum due to its features. In this post we will analyze some features of Kafka to see why it is gaining attention in the market.

    The demand for processing huge data sets is growing everyday across enterprise systems and data is being processed in batch or real time and the queuing systems play an important role in connecting the data from source system / producer to destination / consumers. With huge dataset in transit enterprise are looking for message solution that can provide high throughput per second , scale horizontally, provides high availability and integrate well with other solutions.


    This is one of feature where Kafka gets edge over other solutions, the ability to scale horizontally, Kafka achieves it by means of partitioning. We can set the number of partition while defining a topic (queue) and these partitions will get distributed across the broker nodes in the cluster and hence when we want to scale the system we can add more broker nodes and hence the partitions get realigned across the added broker nodes.

    Fault Tolerance and High Availability:

    Kafka achieves high availability by means of replication the partitions get replicated across different broker nodes and Kafka uses Zookeeper for its co-ordination. When a broker node goes down zookeeper co-ordinates so that the data is continued to be served from the replicated broker node partition and hence high availability for data is achieved.

    Unit of Order:

    Kafka guarantees unit of order delivery at each partition level and messages posted across different partitions are not guaranteed to be in order.

    Reliability & Guaranteed delivery:

    Kafka provides reliability to the message delivery and has options of synchronous and asynchronous acknowledgements for the message delivery.

    Integration with Big Data solutions:

    Kafka comes as part of Hadoop distributions and integrates with Hadoop map reduce for bulk consumption in parallel, for real time stream processing needs Kafka has good integration with systems like Apache Storm and Spark.

    Reference : Kafka

    Saturday, August 15, 2015

    Build your own monitoring solution for couch base

    Recently i was trying to build a monitoring solution for couch base , i followed a simple approach that worked out well, thought to share the same in this post.

      Requirements for the solution
  • Simple solution that can collect metrics from the http stats endpoint of couch base
  • Script based solution that can customized by operation team
  • Visualization dashboards
  • No additional software installation on couch base servers
    1. Solution Needs
  • An light weight app server that can collects metrics on regular interval
  • A persistence layer that stores the data
  • A visualization tool that can bind well with the persistent data
  • Solution Architecture

    Solution Highlights

  • Solution collects json stats data which has thousands of metrics as a whole and stores it to elastic search
  • NodeJS is a light weight server based on java script
  • Couch base stat endpoint exposes JSON based metrics and elastic search works well with storing JSON data
  • Kibana provides nice visualization for elastic search through different charts
  • NodeJS provides built libraries to elastic search
  • Hosting NodeJS, elastic search, Kibana are very simple, you can setup easily all of these components in few minutes through dockers
  • Elastic search is highly scalable
  • The approach can be applied for any monitoring where metrics are exposed through json format
  • Please find the reference of the solution under github

    Friday, January 23, 2015

    Two Phase Commit

    Bottlenecks in database layer

    Database has been seen as a most common place of bottleneck for performance across different tiers of the application. Few possible reasons restricting RDBMS performance

  • RDBMS not able to scale horizontally
  • Locking at row level / data page level / table level during database transactions
  • NoSQL on the rescue

    I have mentioned about NoSQL data stores in my previous blog which achieves horizontally scalability distribution . In this blog I would like cover how transactional behaviour is achieved with high performance in NoSQL

    Transactions in RDBMS

    Let us try to understand how transactions operates in RDBMS,transactions with ACID (Atomicity, Consistency, Isolation and Durability) compliance executes all the actions involved in the transaction in a single step, if all the actions succeeds it commits the changes otherwise all the changes are revoked. To achieve this locking happens across the tables and hence performance becomes bottleneck

    Let us take a simple transaction in order placement and analyse , An simple order management transaction involves two tables involving order and billing.

  • Confirm the product for the order by decrementing a count in the product catalogue
  • Confirm billing for payment
  • If payment succeeds, transaction as a whole has to be committed . If payment fails for some reason , change in product catalogue has to be revoked to original state so that it is available for others to consume. RDBMS achieves this whole process as a single step by locking these tables until transaction is completed and hence gets the ability to commit or revoke at the end of transaction, but this gives a overhead in performance as these tables gets locked and any read/ write on those are kept on hold unless stale read is enabled

    Restrictions with NoSQL

    Let us understand restrictions in NoSQL towards achieving this type of transactions

  • NoSQL provides locking at row level and not across rows, tables etc
  • With adoption of polyghot persistence and distributed transactions we may need to perform a transaction across different datastores as well
  • Two Phase Commit

    Two Phase Commit is an approach followed in NoSQL to achieve transaction like behaviour. As the name mentions transactions happens in two phases with the ability to commit or revoke the changes made in phase 1 during phase 2. The approach introduces a additional component transaction manager which helps to commit or roll back the changes made in each phase of the transaction

    Advantages with Two Phase Commit approach

  • Provides high performance with transactions
  • Ability to retry for failure portions of the transactions ( interesting )
  • Provides distributed transaction like capabilities across data stores
  • My personal experience with Two Phase Commit

    Recently I personally came to see a Two Phase Commit scenario handled in amazon for my order placement which became inspiration for this post.

    I placed an order ( a laptop desk ) in Amazon where my order placement was received and I went for sleep.Looks the payment got failed for some reason. Next day morning I got a notification to retry my payment, in this case Amazon instead of revoking the order placed ,Amazon holded the order for additional time say for a day or two and provided option to retry payment failure.

    My order confirmation

    Payment retry for my order

    I believe Amazon has implemented some form of two phase commit to achieve this, I was personally happy with the way amazon handled my payment failure as the order was not revoked and i was given retry option later to complete the order with my laptop desk was still reserved for me.

    This also opens the door for other mode of payments like cash on delivery etc.

    Few links on Two Phase Commit

  • Star Bucks Approach for Performance
  • MongoDB - how to perform 2PC
  • Saturday, January 3, 2015

    NoSQL - An Introduction


    Not Only SQL often mentioned as NoSQL provides a mechanism to store and retrieve data not through tabular format as in relational databases.

    There are different NoSQL solutions that are matured and being adopted widely Ex : Redis,Riak,HBase,Cassandra,Couchbase,MongoDB.

    It is critical to understand the concepts of NoSQL why and how NoSQL has been used for a specific application architecture because every NoSQL solution is unique in its own way and different from general RDBMS solutions.

    Need for NoSQL:

    With the explosion of web and social interactions the volume and complexity of data has grown tremendously huge, it is the need of the hour for each applications to scale seamlessly without any compromise in performance.

    If we look at RDBMS performance starts degrading at some point of data volume and complexity and applications has to think adopting various NoSQL solutions to match the growth of huge volume and complexity.

    Polyglot persistence:

    NoSQL Solutions has become more matured and enterprise data architects has started implementing NoSQL in their solutions giving a strong message that RDBMS is not the only solution to data needs.

    Problem in data persistence are unique and each problem needs specific solution to handle the scenario better. The concept of Polyglot persistence evolved to insist that application needs to use specific persistence solution to handle specific scenarios.

    The table below helps to describe some scenarios in a retail web application and how different persistence solution can help to satisfy those needs.

    Scenario Persistence solution
    User Sessions Re-dis
    Financial data RDBMS
    Shopping Cart Riak
    Recommendations Neo4j
    Product Catalog MongoDB
    Analytics Cassandra
    User Activity Logs Cassandra


    Coming out of relational mindset:

    One of the biggest problem with the adoption of NoSQL solution is to keep the people out of relational mindset. The minds of data modeling is deeply rooted with RDBMS and relational concepts.

    It will be difficult initially to conceptualize data out of relational world, but if we understand these concepts and look back at our data solutions made, many of them may not need the normalized modeling.

  • Data is not normalized.
  • Data will be duplicated.
  • Tables will be schema less and doesn’t follow a predefined pattern
  • Data can be stored in different formats like JSON, XML, audio, video etc.
  • Database may have some compromise on some attributes on ACID properties
  • Data may have some compromise on attributes like consistency.
  • CAP theorem:

    CAP theorem defines set of basic attributes for any distributed system. Understanding the dimensions of CAP theorem helps to understand any NoSQL solution better. The below diagram describes the attributes satisfied by different distributed database system on multiple server deployment environment.

    The important point to note here is that none of the distributed system can completely satisfy all the three dimensions of CAP theorem Consistency, Availability and Partition Tolerance.

    Any distributed system can a maximum 2 dimensions of CAP completely, depending on the application requirement people have to choose for the specific distributed system that suits their needs.

    It is critically important to understand the application requirements and understand where the specific NoSQL solution falls.

    ACID Compliance:

    ACID stands for Atomicity, Consistency, Isolation, Durability, these are set of properties that guarantee transactional behavior in RDBMS operations.

    RDBMS concepts that focuses more on integrity, concurrency, consistency and data validity, but many of the data needs in software applications may not be interested in these aggregation, integrity and validity or can handled in upper layers.

    Compromising any of these in database architecture may bring high performance and scalability that RDBMS is currently lagging.

    NoSQL database for example is not strictly ACID compliance where it can compromise on one of the attributes of ACID to achieve extreme scalability and performance.

    It is critically important to understand the application requirements and understand the specific NoSQL used and how the compromise is made.

    BASE versus ACID:

    NoSQL instead of adhering ACID compliance it tends to be BASE compliance in order to achieve scalability and high performance. The following are defined to be BASE attributes that NoSQL solution are trying to adopt

    • Basic Availability
    • Soft-state
    • Eventual consistency

    NoSQL Categorization based on data modeling

    • Key Value Stores Ex : Redis, Riak, Amazon Simple DB
    • Column Family Stores ( Big Tables ) Ex : Cassandra , HBase
    • Document databases Ex : CouchDB , Couchbase , MongoDB
    • Graph databases Ex : Neo4j, Titan

    Each of this NoSQL provide unique advantage on specific functionalities, selection of a specific NoSQL category is critical for the design of the application needs.

    At high level the specific NoSQL solution can be chosen based on the complexity and querying associated with the data model.

    The below diagram provides a good comparison on the different NoSQL databases.


    NoSQL based on system architecture:

    Based on the system architecture, NoSQL can be categorized into the following.

    • P2P ( Ring Topology )
    • Master Slave

    Each architecture has some pros and cons and a decision has to be made based on the needs.

    P2P ( Ring Topology ) Master Slave
    Role All Nodes carries equal role Master – Slave architecture with specific responsibilities on specific nodes
    Consistency Eventual Strong
    Write/Read Read and Write happens through all the nodes Mostly write is driven through restricted nodes
    Availability High Availability Availability is little compensated when master / Write node fails
    Data Data is partitioned across all nodes with replication Data is partitioned into multiple slave nodes with replication
    Examples Cassandra, Couch base HBase, MongoDB

    Data read / writes:

    The need of NoSQL type of solutions arrives when you tend to operate with huge volume of data and high requirements for performance towards read and writes.

    Below are the typical use cases where NoSQL databases will be used

    • Scalable databases
    • High availability and fault tolerance
    • Ever growing set of data
    • Bulk read / write operations

    Some NoSQL will be good for write intensive workloads and some are good for read intensive workloads and some are good for mixed workloads, specific analysis has to be done to decide on the NoSQL solution based on the needs.

    Other important concepts that I would like to highlight specific to any NoSQL solutions:


    Shrading is one of the important concept in NoSQL solution by which the data is partitioned horizontally across different nodes in the cluster. This means the data is split based on some logic say some a hash code and spread across different nodes.


    The data is not only partitioned by different nodes but also replicated across different cluster nodes. The replication factor will be a configuration in the solution. Replication ability gives high availability and automatic fail over when a specific node goes down.