Monday, December 7, 2015

NOSQL with RDBMS fallback:

NOSQL adoption becoming prominent across different critical applications to reap the benefits of performance, fault tolerance, high availability for bigger volume database needs. While migrating to NOSQL one of the risk that architects feel is what if the application gets into some unseen issues and take more time to fix , as NOSQL adoption is not battle tested across different domain and sectors and how to design some fallback strategy.

Few factors that people may think while migrating to NOSQL

  • What will happen if we get into unexpected errors in production and if takes more time to fix?
  • What if the product vendors itself haven’t faced such scenarios?
  • What if I have reporting or other dependent system that are well integrated with RDBMS and difficult to migrate from RDBMS in the current phase?
  • Architects would like to design some fallback option as RDBMS where application can switch to RDBMS on unrecoverable NOSQL issues. This raises few questions in mind on how to design the same.

  • How do I sync up data both in NOSQL and RDBMS for a high data volume without losing the order of update?
  • How do I sync up without adding much overhead to application? Synchronous update to both NOSQL and RDBMS will be too much of overhead...
  • How to reliably update the data between the two systems without any loss?
  • What if RDBMS goes down and how can I design to sync up reliably even on failures?
  • I can think of design depicted below to address the same.

    Few components involved in the design are Apache Kafka receiving the updates and Apache Storm process the data to update the same to RDBMS. Both of these system are designed to work for big data needs in a reliable and distributed form.

    Apache Kafka is a high performance message queuing system. Application pose the messages (Insert / Update / Delete) to Kafka message queue. To improve the performance with parallel processing the queue can be partitioned by table / region / logical data design as per the NOSQL model.

    Apache Storm is a real time processing engine that can consume message through Spout component, do some processing through bolts and update the data to RDBMS. Storm has topologies to process guaranteed data processing, transactional mode of commitments that makes it suitable to handle partial failures and during commitments.

    Benefits:

  • Information can be updated in an asynchronous way to RDBMS reliably.
  • Apache Kafka providing high available, reliable, partitioned queuing system fits best to handle huge data volume.
  • Storm doing real time processing for Kafka messages on the partitioned queue and provides a reliable way to update RDBMS
  • On RDBMS failures Kafka will persist the messages and Storm can continue to sync up the messages when it is comes back.
  • Wednesday, December 2, 2015

    Next Generation Enterprise Application Architecture

    New generation applications are architectured not only with the goal of desiging the application functionally and performing stable but also focuses on different aspects that are becoming critical

    Scalability – Elastic scalability for all the layers of the application including data tier

    Fault Tolerance - Ability to handle failure smartly and avoid cascading failures from and to dependent systems

    High Availability – Ability to have application highly available on all layers including database even on data center failures

    Efficient utilization of Infrastructure - Ability to scale up and down on demand

    Faster access to underlying data on high load and data volumes

    Ability to handle different data formats efficiently

    Few reasons that are tied to this evolution are the need and benefits towards cloud adoption ( could be either private or public cloud ) and the need to handle huge data volume with faster response on the data tiers

    Benefits Solutions
    Physical -> IaaS -> PaaS

    Elastic Scalability

    High Availability

    Efficient Infrastructure utilization

    Zero downtime deployment

    VMWare , Open Stack – Private Cloud IaaS

    AWS, Azure – Public Cloud IaaS , PaaS

    Cloud Foundry – PaaS on private and public cloud

    Circuit Breaker

    Fault Tolerance

    Better failure handling

    Avoid avalanche failures

    Netflix Hystrix

    Polly

    Service Registry

    Registry for dynamic instance scaling

    Netflix Eureka

    Apache Zookeeper

    Intelligent Load balancing

    Intelligent Load Balancing utilizing the elastic scaling and self-discovery

    Netflix Ribbon

    F5

    Nginix

    Search

    Quick search needs from huge data sets, full text search, pattern matching

    Elastic Search

    Solr

    Data Grid

    Faster read write data, Reduce read / write overhead to database, high availability to data

    Coherance

    Gemfire

    Membase

    Queue

    Reliable data transfer across different data layers

    Kafka

    RabbitMQ

    JMS

    NoSQL

    Big data – database needs

    Heavy Read / Write on high data volumes

    Faster response needs on the data

    High Availability on data

    Fault Tolerance on data

    Distributed database

    Scalable database

    Couchbase

    MongoDB

    HBase

    Cassandra

    Graph DB ( Titan, OrientDB )

    Hadoop

    Distributed file processing and storage ecosystem

    High speed batch (MapReduce) / real time ( Storm, Spark ) processing

    Different Hadoop distributions like Hortonworks, CloudEra,MapR