Apache Kafka is an open-source message broker written in Scala that aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds [1].
Kafka integrates with Apache Zookeeper, a distributed configuration and synchronization service for large distributed systems [2].
Kafka is similar in some ways to RabbitMQ and other messaging systems, in the sense that:
- It brokers messages that are organized into topics
- Producers push messages
- Consumers pull messages
- Kafka runs in a cluster where all nodes are called brokers
In this tutorial I'll install and configure Kafka and Zookeeper on 3 servers. Zookeeper needs a majority of its servers (a quorum) to be up, so use an odd number of servers, at least 3: a 3-server ensemble keeps working if one server is lost. I'll be using 3 OpenVZ containers, but any three hosts will do. The process is pretty straightforward.
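To make the quorum arithmetic concrete, here is a small sketch in shell. The function names `quorum_size` and `tolerated_failures` are my own, purely for illustration: an ensemble needs a strict majority of its servers running to stay available.

```shell
# Smallest strict majority of an ensemble with $1 servers.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# How many servers can fail while a majority is still running.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Four servers tolerate no more failures than three, which is why odd ensemble sizes are the usual choice.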
Download Zookeeper and Kafka on all three servers:
Install Zookeeper:
Here's an example config file to get you started; just replace the IPs with those of your servers:
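As a rough sketch of what such a file might contain, the snippet below writes a minimal `zoo.cfg`. The IPs (10.0.0.1 to 10.0.0.3), the `/var/lib/zookeeper` data directory and the output path are placeholders I've assumed, not values from a real setup; the keys themselves are standard Zookeeper settings.

```shell
# Assumed placeholder: where to write the file. Zookeeper normally
# reads it from conf/zoo.cfg inside its install directory.
ZOO_CFG="${ZOO_CFG:-./zoo.cfg}"

cat > "$ZOO_CFG" <<'EOF'
# Basic time unit in milliseconds, used for heartbeats and timeouts
tickTime=2000
# Ticks a follower may take to connect and sync to the leader at startup
initLimit=10
# Ticks a follower may lag behind the leader before it is dropped
syncLimit=5
# Where snapshots and the myid file live (assumed path)
dataDir=/var/lib/zookeeper
# Port that clients, including Kafka, connect to
clientPort=2181
# server.<id>=<host>:<peer port>:<leader election port>
server.1=10.0.0.1:2888:3888
server.2=10.0.0.2:2888:3888
server.3=10.0.0.3:2888:3888
EOF
```

The same file goes on every server; only the `myid` file differs per node.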
Install Kafka:
Example config file, with the required changes noted:
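A minimal sketch of the per-broker settings, assuming the same placeholder IPs as above and a hypothetical `/var/lib/kafka` log directory. `write_broker_config` is an illustrative helper of my own, not something shipped with Kafka; of these settings, only `broker.id` has to differ between servers.

```shell
# Hypothetical helper: print the settings that usually need changing
# in config/server.properties for the broker with the given id.
write_broker_config() {
  broker_id="$1"
  cat <<EOF
# Must be a unique integer on every broker in the cluster
broker.id=${broker_id}
# Directory where Kafka stores its log segments (assumed path)
log.dirs=/var/lib/kafka
# All Zookeeper servers, so the broker can fail over if one is down
zookeeper.connect=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
EOF
}

# Example: settings for the second server
write_broker_config 2
```

I'd merge these values into the `server.properties` that ships with Kafka rather than replacing the whole file, so the remaining defaults stay in place.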
Create a unique Zookeeper identifier (the myid file) on each node:
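A sketch of this step, assuming the Zookeeper `dataDir` is `/var/lib/zookeeper` (a placeholder) and that each id matches the `server.N` entry for that host in `zoo.cfg`. `write_myid` is a hypothetical helper name.

```shell
# Hypothetical helper: write the Zookeeper id for this node.
# $1 = id (must match this host's server.<id> line in zoo.cfg)
# $2 = Zookeeper dataDir (defaults to an assumed path)
write_myid() {
  data_dir="${2:-/var/lib/zookeeper}"
  mkdir -p "$data_dir" && echo "$1" > "$data_dir/myid"
}

# Run with a different id on each server, for example on the first one:
#   write_myid 1
```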
Start Zookeeper first:
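Something along these lines should work, assuming Zookeeper is unpacked under `/opt/zookeeper` (my placeholder; set `ZK_HOME` to match your layout). The wrapper function names are illustrative; `zkServer.sh` is the control script that ships with Zookeeper.

```shell
# Assumed install location; override ZK_HOME to match your layout.
ZK_HOME="${ZK_HOME:-/opt/zookeeper}"

# Start the Zookeeper server in the background.
start_zookeeper() {
  "$ZK_HOME/bin/zkServer.sh" start
}

# Report whether this node is currently a leader or a follower.
zookeeper_status() {
  "$ZK_HOME/bin/zkServer.sh" status
}
```

Run `start_zookeeper` on every server, then `zookeeper_status`. Once a majority is up, one node should report leader mode and the others follower mode.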
Then start Kafka:
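A minimal sketch, assuming Kafka is unpacked under `/opt/kafka` (a placeholder; set `KAFKA_HOME` to your own path). `start_kafka` is just an illustrative wrapper around the start script that ships with Kafka.

```shell
# Assumed install location; override KAFKA_HOME to match your layout.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"

# Start a broker in the foreground using the edited server.properties.
start_kafka() {
  "$KAFKA_HOME/bin/kafka-server-start.sh" "$KAFKA_HOME/config/server.properties"
}
```

The broker stays in the foreground, so on a real server you'd probably run it under a process supervisor or send it to the background.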
Your cluster is now up and running and ready to accept messages.
Create a new topic with a replication factor of three:
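Roughly, this looks like the sketch below. I'm assuming a Zookeeper-based Kafka release (0.8.1 or later) where `kafka-topics.sh` takes `--zookeeper`; newer releases use `--bootstrap-server` instead. The install path, the Zookeeper address and the topic name `replicated-test` are placeholders.

```shell
# Assumed placeholders; override to match your setup.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
ZK_CONNECT="${ZK_CONNECT:-10.0.0.1:2181}"

# Create topic $1 with one partition replicated to all three brokers.
create_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --create \
    --zookeeper "$ZK_CONNECT" \
    --replication-factor 3 \
    --partitions 1 \
    --topic "$1"
}

# Usage:
#   create_topic replicated-test
```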
Describe the replicated topic:
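A sketch under the same assumptions (Zookeeper-based release, placeholder install path and Zookeeper address, hypothetical topic name):

```shell
# Assumed placeholders; override to match your setup.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
ZK_CONNECT="${ZK_CONNECT:-10.0.0.1:2181}"

# Show partition, leader and replica details for topic $1.
describe_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --describe \
    --zookeeper "$ZK_CONNECT" \
    --topic "$1"
}

# Usage:
#   describe_topic replicated-test
```

For each partition the output lists the leader broker, the replicas and the in-sync replicas (Isr). With a replication factor of three, all three broker ids should show up under Replicas.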
Publish a few messages to the new replicated topic:
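One way to do this is with the console producer that ships with Kafka. The sketch assumes a release where the producer takes `--broker-list`, brokers listening on the default port 9092, and the same placeholder IPs and topic name as before.

```shell
# Assumed placeholders; override to match your setup.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
BROKER_LIST="${BROKER_LIST:-10.0.0.1:9092,10.0.0.2:9092,10.0.0.3:9092}"

# Publish each line read from stdin as one message to topic $1.
publish_messages() {
  "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --broker-list "$BROKER_LIST" \
    --topic "$1"
}

# Usage:
#   printf 'first message\nsecond message\n' | publish_messages replicated-test
```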
Consume the messages:
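A sketch using the console consumer, again assuming a Zookeeper-based release where the consumer takes `--zookeeper` (newer releases use `--bootstrap-server`), with placeholder paths, address and topic name.

```shell
# Assumed placeholders; override to match your setup.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
ZK_CONNECT="${ZK_CONNECT:-10.0.0.1:2181}"

# Print every message in topic $1, starting from the oldest one.
consume_messages() {
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
    --zookeeper "$ZK_CONNECT" \
    --topic "$1" \
    --from-beginning
}

# Usage (runs until interrupted with Ctrl-C):
#   consume_messages replicated-test
```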
To test cluster failover, kill Zookeeper and Kafka on one of the servers; you should still be able to consume the messages.
There are a few important things to note about Kafka at the time of this post:
- Kafka is not suited to multi-tenant environments, as it has no security features: no encryption, authorization or authentication. Achieving tenant isolation requires a lower-level mechanism such as iptables.
- Kafka is not an end-user solution; customers need to write custom code for it.
- Kafka does not have many ready-made producers and consumers.