Kafka is an open source message queuing solution under Apache project, Kafka is new when compared to existing queue solutions like RabbitMQ, ActiveMQ, AWS SQS on product maturity but is quickly gaining momentum due to its features. In this post we will analyze some features of Kafka to see why it is gaining attention in the market.
The demand for processing huge data sets is growing everyday across enterprise systems and data is being processed in batch or real time and the queuing systems play an important role in connecting the data from source system / producer to destination / consumers. With huge dataset in transit enterprise are looking for message solution that can provide high throughput per second , scale horizontally, provides high availability and integrate well with other solutions.
Scalability:This is one of feature where Kafka gets edge over other solutions, the ability to scale horizontally, Kafka achieves it by means of partitioning. We can set the number of partition while defining a topic (queue) and these partitions will get distributed across the broker nodes in the cluster and hence when we want to scale the system we can add more broker nodes and hence the partitions get realigned across the added broker nodes.
Fault Tolerance and High Availability:Kafka achieves high availability by means of replication the partitions get replicated across different broker nodes and Kafka uses Zookeeper for its co-ordination. When a broker node goes down zookeeper co-ordinates so that the data is continued to be served from the replicated broker node partition and hence high availability for data is achieved.
Unit of Order:Kafka guarantees unit of order delivery at each partition level and messages posted across different partitions are not guaranteed to be in order.
Reliability & Guaranteed delivery:
Kafka provides reliability to the message delivery and has options of synchronous and asynchronous acknowledgements for the message delivery.
Integration with Big Data solutions:Kafka comes as part of Hadoop distributions and integrates with Hadoop map reduce for bulk consumption in parallel, for real time stream processing needs Kafka has good integration with systems like Apache Storm and Spark.
Reference : Kafka