Sunday, August 30, 2015

Kafka messaging system

Apache Kafka:

Kafka is an open source messaging solution under the Apache project. It is newer than existing queue solutions such as RabbitMQ, ActiveMQ, and AWS SQS in terms of product maturity, but it is quickly gaining momentum because of its features. In this post we will look at some of those features to see why Kafka is gaining attention in the market.

The demand for processing huge data sets grows every day across enterprise systems. Data is processed in batch or in real time, and queuing systems play an important role in moving it from the source system (producer) to the destination (consumers). With huge data sets in transit, enterprises are looking for a messaging solution that provides high throughput, scales horizontally, offers high availability, and integrates well with other solutions.

Scalability:

This is one of the features where Kafka has an edge over other solutions: the ability to scale horizontally, which Kafka achieves by means of partitioning. We set the number of partitions when defining a topic (queue), and these partitions are distributed across the broker nodes in the cluster. When we want to scale the system, we add more broker nodes, and the partitions can then be reassigned so that they are spread across the added brokers as well.
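To make the idea concrete, here is a minimal Python sketch of how a fixed set of partitions spreads over more brokers as the cluster grows. The round-robin placement, the function name, and the broker names are assumptions made for illustration only; Kafka's real assignment logic also takes replicas into account.

```python
def assign_partitions(num_partitions, brokers):
    """Spread partition ids over brokers round-robin (illustrative only)."""
    assignment = {broker: [] for broker in brokers}
    for partition in range(num_partitions):
        # Partition p lands on broker p modulo the number of brokers.
        broker = brokers[partition % len(brokers)]
        assignment[broker].append(partition)
    return assignment


# Hypothetical topic with 6 partitions, first on 2 brokers, then on 3.
before = assign_partitions(6, ["broker-1", "broker-2"])
after = assign_partitions(6, ["broker-1", "broker-2", "broker-3"])

print(before)
print(after)
```

With two brokers each one holds three partitions; with three brokers each holds two, so the same topic puts less load on every node.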

Fault Tolerance and High Availability:

Kafka achieves high availability by means of replication: partitions are replicated across different broker nodes, and Kafka uses ZooKeeper for coordination. When a broker node goes down, a replica of each affected partition on another broker takes over, coordinated through ZooKeeper, so the data continues to be served and high availability is achieved.
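The failover idea can be sketched in a few lines of Python. This is a simplified model, not Kafka's actual election procedure: the placement rule, the function names, and the "first surviving replica becomes leader" policy are all assumptions for illustration.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Return {partition: [brokers]}; the first broker in each list is the leader."""
    placement = {}
    for partition in range(num_partitions):
        placement[partition] = [
            brokers[(partition + offset) % len(brokers)]
            for offset in range(replication_factor)
        ]
    return placement


def leaders_after_failure(placement, failed_broker):
    """Pick the first surviving replica of every partition as its leader."""
    leaders = {}
    for partition, replicas in placement.items():
        alive = [broker for broker in replicas if broker != failed_broker]
        # None means every replica of this partition was on the failed broker.
        leaders[partition] = alive[0] if alive else None
    return leaders


# Hypothetical cluster: 3 partitions, 3 brokers, 2 copies of each partition.
placement = place_replicas(3, ["broker-1", "broker-2", "broker-3"], 2)
leaders = leaders_after_failure(placement, "broker-2")

print(placement)
print(leaders)
```

Because every partition has a copy on a second broker, losing broker-2 still leaves a live replica for each partition.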

Unit of Order:

Kafka guarantees ordered delivery within each partition; messages posted to different partitions are not guaranteed to be in order relative to one another.
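A practical consequence is that messages which must stay in order should go to the same partition, which is commonly done by giving them the same key. The Python sketch below models that: the CRC32 hash is only a stand-in for Kafka's own partitioner, and the keys, events, and function names are made up for the example.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition; CRC32 is a stand-in hash for illustration."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(messages, num_partitions):
    """Append each (key, value) pair to the log of the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def events_for(log, key, num_partitions):
    """Read back the values for one key from the single partition that holds it."""
    partition = log[partition_for(key, num_partitions)]
    return [value for k, value in partition if k == key]


# Hypothetical interleaved events for two orders.
messages = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "paid"),
    ("order-1", "shipped"),
]
log = publish(messages, 3)
order_1_events = events_for(log, "order-1", 3)
order_2_events = events_for(log, "order-2", 3)

print(order_1_events)
print(order_2_events)
```

Each order's events come back in the order they were posted, because they all live in one partition; nothing is promised about how the two orders interleave with each other.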

Reliability & Guaranteed delivery:

Kafka provides reliable message delivery and offers both synchronous and asynchronous acknowledgement options for message delivery.
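The difference between the two acknowledgement styles can be shown without a real cluster. The sketch below uses only the Python standard library and a made-up in-memory broker; the class and function names are hypothetical and this is not the Kafka client API.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryBroker:
    """Hypothetical stand-in for a broker: stores messages and returns offsets."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the stored message


broker = InMemoryBroker()
executor = ThreadPoolExecutor(max_workers=1)  # single worker keeps sends in order


def send(message):
    """Hand the message to a background thread and return a future for the ack."""
    return executor.submit(broker.append, message)


# Synchronous style: block until the acknowledgement (the offset) arrives.
sync_offset = send("first").result(timeout=5)

# Asynchronous style: keep going and let a callback record the acknowledgement.
acked_offsets = []
send("second").add_done_callback(lambda f: acked_offsets.append(f.result()))

executor.shutdown(wait=True)  # wait for outstanding sends before exiting
print(sync_offset, acked_offsets)
```

The synchronous send trades throughput for certainty at each step, while the asynchronous send lets the producer continue and learn about success or failure later.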

Integration with Big Data solutions:

Kafka ships as part of Hadoop distributions and integrates with Hadoop MapReduce for bulk consumption in parallel. For real-time stream processing, Kafka integrates well with systems such as Apache Storm and Spark.

Reference: Kafka

Saturday, August 15, 2015

Build your own monitoring solution for Couchbase

Recently I was trying to build a monitoring solution for Couchbase. I followed a simple approach that worked out well, so I thought I would share it in this post.

    Requirements for the solution
  • Simple solution that can collect metrics from the HTTP stats endpoint of Couchbase
  • Script-based solution that can be customized by the operations team
  • Visualization dashboards
  • No additional software installation on the Couchbase servers

    Solution Needs
  • A lightweight app server that collects metrics at a regular interval
  • A persistence layer that stores the data
  • A visualization tool that binds well with the persisted data

    Solution Architecture

    Solution Highlights

  • The solution collects the JSON stats data, which contains thousands of metrics in total, and stores it in Elasticsearch
  • Node.js is a lightweight server-side runtime based on JavaScript
  • The Couchbase stats endpoint exposes JSON-based metrics, and Elasticsearch works well for storing JSON data
  • Kibana provides nice visualization for Elasticsearch data through different charts
  • Node.js has client libraries for Elasticsearch
  • Hosting Node.js, Elasticsearch, and Kibana is very simple; you can set up all of these components in a few minutes using Docker
  • Elasticsearch is highly scalable
  • The approach can be applied to any monitoring where metrics are exposed in JSON format
  • Please find the reference for the solution on GitHub
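
To give a feel for the collector step, here is a minimal sketch of the poll-and-store loop. It is written in Python rather than Node.js and is not the implementation referenced above. The host names, bucket name, index name, and the shape of the stats response (an "op" object holding "samples" lists per metric) are assumptions; authentication and error handling are left out.

```python
import json
import time
import urllib.request

# Assumed endpoints; adjust host, bucket, and index names for your setup.
COUCHBASE_STATS_URL = "http://localhost:8091/pools/default/buckets/default/stats"
ELASTICSEARCH_INDEX_URL = "http://localhost:9200/couchbase-stats/_doc"


def latest_samples(stats):
    """Reduce {"op": {"samples": {metric: [v1, v2, ...]}}} to {metric: last value}."""
    samples = stats.get("op", {}).get("samples", {})
    return {name: values[-1] for name, values in samples.items() if values}


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def index_document(url, document):
    """POST one JSON document to Elasticsearch and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def collect_once():
    """Fetch the stats, keep the newest value per metric, and store the result."""
    document = latest_samples(fetch_json(COUCHBASE_STATS_URL))
    document["collected_at"] = int(time.time() * 1000)  # epoch milliseconds
    return index_document(ELASTICSEARCH_INDEX_URL, document)


def run(interval_seconds=60):
    """Collect at a fixed interval until the process is stopped."""
    while True:
        collect_once()
        time.sleep(interval_seconds)
```

Calling run() starts the loop. Keeping only the latest value per metric keeps each stored document small, and Kibana can then chart any metric against the collected_at field.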