Why is everybody talking about Kafka?

Getting started with Apache Kafka

Posted by Javier Tomas Zon on October 25, 2018

What is Apache Kafka?

From Wikipedia:

Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a “massively scalable pub/sub message queue designed as a distributed transaction log” making it highly valuable for enterprise infrastructures to process streaming data. Additionally, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library.

Read more about Apache Kafka

Let’s try to explain it in plain English

Apache Kafka allows you to decouple your data streams from your systems. Your source systems publish their data into Apache Kafka, while your target systems read their data straight from Apache Kafka.

So what kind of data can we have in Kafka? Any data stream you can think of: website events, pricing data, financial transactions, user interactions, and many more.

Additionally, once the data is in Kafka, you can feed it into any system you like, such as a database, your analytics systems, your email system, or your audit systems.

Quick Start

This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. The Kafka console scripts differ between Unix-based and Windows platforms: on Windows, use bin\windows\ instead of bin/ and change the script extension to .bat.
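
For example, the ZooKeeper start command used in Step 2 below would look like this on Windows (a sketch, assuming you run it from the Kafka install directory):

bin\windows\zookeeper-server-start.bat config\zookeeper.properties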

Step 1: Download the code

Download the 2.0.0 release and un-tar it.
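
If you prefer to stay in the terminal, you can fetch the archive directly; the URL below points at the Apache archive and is an assumption, so substitute whichever mirror works for you:

curl -O https://archive.apache.org/dist/kafka/2.0.0/kafka_2.11-2.0.0.tgz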

tar -xzf kafka_2.11-2.0.0.tgz
cd kafka_2.11-2.0.0

Step 2: Start the server

Kafka uses ZooKeeper, so you first need to start a ZooKeeper server if you don’t already have one. You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance.

bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)

Now start the Kafka server:

bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
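
If you would rather not keep two terminals tied up, both start scripts also accept a -daemon flag to run in the background (a minimal sketch, assuming the bundled default config files):

bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/kafka-server-start.sh -daemon config/server.properties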

Step 3: Create a topic

Let’s create a topic named “test” with a single partition and only one replica:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

We can now see that topic if we run the list topic command:

bin/kafka-topics.sh --list --zookeeper localhost:2181
test
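
You can also inspect the new topic’s partition count, leader, and replica assignment with the describe command:

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test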

Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.
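
For example, the following settings in config/server.properties turn on auto-creation and control the defaults that new topics get (a sketch; these are standard broker property names, but double-check the defaults for your Kafka version):

auto.create.topics.enable=true
num.partitions=1
default.replication.factor=1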

Step 4: Send some messages

Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.

Run the producer and then type a few messages into the console to send to the server.

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
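
The console producer can also attach a key to each message if you enable key parsing (a sketch using its parse.key and key.separator properties; here the key and value are separated by a colon):

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test --property parse.key=true --property key.separator=:
user1:This is a keyed message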

Step 5: Start a consumer

Kafka also has a command line consumer that will dump out messages to standard output.

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message
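
If you produced keyed messages as sketched above, you can ask the consumer to print the keys alongside the values:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --property print.key=true --property key.separator=: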

If you have each of the above commands running in a different terminal then you should now be able to type messages into the producer terminal and see them appear in the consumer terminal. All of the command line tools have additional options; running the command with no arguments will display usage information documenting them in more detail.
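
For example, you can run the consumer as part of a named consumer group and then list the groups the broker knows about (my-group is just an illustrative name):

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --group my-group
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list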

In the next chapters we will introduce Kafka cluster concepts, high availability, and much more, so stay tuned! In the meantime, keep in mind where you could apply Kafka inside your own architecture.



