Getting Started with AWS Managed Streaming for Kafka Service (MSK)

Matt Houghton
Nov 29, 2018

A very welcome announcement at AWS re:Invent today: Amazon MSK (https://aws.amazon.com/msk/).

Here are my notes on getting it up and running.

The docs are pretty good at taking you through this, but here are a couple of tips. First, you will need to update the AWS CLI on your EC2 machine via pip, because at the time of writing the latest Amazon Linux AMI did not include a version with the "aws kafka" command. Second, my attempt at creating the Kafka cluster from the AWS console did not work out: it created a cluster with no security group associated with it, so my Kafka client on EC2 could not connect. Check the AWS MSK forums (https://forums.aws.amazon.com/forum.jspa?forumID=315) for posts from others on this point. The CLI method in the docs does not have this problem, because you specify the security group in the JSON payload.
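As a rough sketch of the first tip, the check and upgrade could look like this. The function names are my own for illustration, and the check assumes that "aws kafka help" exits with a non-zero status when the installed CLI does not know the subcommand.

```shell
# Hypothetical helper: succeeds only if the installed AWS CLI
# recognises the "kafka" subcommand.
has_kafka_command() {
  aws kafka help >/dev/null 2>&1
}

# Upgrade the AWS CLI for the current user if the subcommand is missing.
# Assumes pip is installed; ~/.local/bin may need to be on your PATH afterwards.
ensure_kafka_cli() {
  if has_kafka_command; then
    echo "aws kafka is available"
  else
    pip install --upgrade --user awscli
  fi
}

# Example:
# ensure_kafka_cli
```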

Here is my JSON file.

{"InstanceType": "kafka.m5.large","ClientSubnets": ["subnet-axxxxx","subnet-2xxxx","subnet-4xxxx"],"SecurityGroups": ["sg-05xxxxx"]}
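If you would rather generate this file than type it by hand, something like the following would do it. This is only a sketch: write_broker_info is a hypothetical helper, the instance type is hard-coded to the one I used, and the IDs in the example are the masked placeholders from this post.

```shell
# Hypothetical helper: write the broker node group JSON.
# Arguments: output file, security group ID, then one or more subnet IDs.
write_broker_info() {
  local out="$1" sg="$2"
  shift 2
  local subnets="" s
  # Build a comma-separated list of quoted subnet IDs.
  for s in "$@"; do
    subnets="${subnets:+$subnets,}\"$s\""
  done
  cat > "$out" <<EOF
{
  "InstanceType": "kafka.m5.large",
  "ClientSubnets": [$subnets],
  "SecurityGroups": ["$sg"]
}
EOF
}

# Example with the placeholder IDs from this post:
# write_broker_info brokernodegroupinfo.json sg-05xxxxx subnet-axxxxx subnet-2xxxx subnet-4xxxx
```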

For my security group I allowed ingress for All Traffic from within my VPC. Once the cluster is running, you could lock this down further by getting the IPs of your broker and ZooKeeper nodes using the "aws kafka describe-cluster" and "aws kafka get-bootstrap-brokers" commands.
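Both commands return their nodes as a comma-separated list of host:port pairs, so a small sketch like this could turn either string into a list of hosts or IPs. The helper names are hypothetical, and the lookup assumes getent is available on the machine.

```shell
# Hypothetical helper: print each host from a comma-separated
# host:port list, one per line, with the port stripped.
hosts_from_connect_string() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/:[0-9]*$//'
}

# Hypothetical helper: resolve each host to its IP address(es).
# Assumes getent is available on the machine.
ips_from_connect_string() {
  hosts_from_connect_string "$1" | while read -r host; do
    getent hosts "$host" | awk '{ print $1 }'
  done
}

# Example (pass the string returned by either aws kafka command):
# ips_from_connect_string "BootstrapBrokerString"
```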

Create the cluster by running this command:

aws kafka create-cluster --cluster-name "matt-msk-demo" --broker-node-group-info file://brokernodegroupinfo.json --kafka-version "1.1.1" --number-of-broker-nodes 3 --enhanced-monitoring PER_TOPIC_PER_BROKER

You will get back the ARN for the cluster that is being created. It will take a while. The docs say 15–30 minutes. I did not time it.

You can check on progress by running this command:

aws kafka describe-cluster --cluster-arn "yourARN"
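Rather than re-running that by hand, a polling loop along these lines could wait for you. It is a sketch, not something from the docs: wait_for_cluster is a hypothetical helper, and it assumes the State field in the describe-cluster output reads ACTIVE once the cluster is ready and FAILED if creation fails.

```shell
# Hypothetical helper: poll describe-cluster until the cluster is ready.
# Arguments: cluster ARN, optional polling interval in seconds (default 60).
wait_for_cluster() {
  local arn="$1" interval="${2:-60}" state
  while true; do
    # --query picks out just the State field; --output text drops the JSON quoting.
    state=$(aws kafka describe-cluster --cluster-arn "$arn" \
      --query 'ClusterInfo.State' --output text) || return 1
    echo "cluster state: $state"
    case "$state" in
      ACTIVE) return 0 ;;   # assumed "ready" state
      FAILED) return 1 ;;   # assumed failure state
    esac
    sleep "$interval"
  done
}

# Example:
# wait_for_cluster "yourARN" 60
```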

Once your cluster has finished creating (the state reported by describe-cluster changes from CREATING to ACTIVE), log in to an EC2 instance. I assigned it the same security group that was used for the Kafka cluster.

You need to have Java 1.8 and Kafka installed on the EC2 instance.

sudo yum install java-1.8.0
wget ftp://apache.mirrors.tds.net/pub/apache.org/kafka/1.1.1/kafka_2.11-1.1.1.tgz
tar -xzf kafka_2.11-1.1.1.tgz

For the next steps (topic creation and message produce/consume), get the broker and ZooKeeper node addresses.

aws kafka get-bootstrap-brokers --cluster-arn "yourARN"
This will return the "BootstrapBrokerString" you need.

aws kafka describe-cluster --cluster-arn "yourARN"
This will return the "ZookeeperConnectString" you need.
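To avoid copying the strings out of the JSON by hand, the CLI's --query and --output options can pull them into shell variables. A minimal sketch follows; the helper and variable names are my own, and it assumes ZookeeperConnectString sits under ClusterInfo in the describe-cluster response.

```shell
# Hypothetical helpers: fetch the two connection strings as plain text.
get_bootstrap_brokers() {
  aws kafka get-bootstrap-brokers --cluster-arn "$1" \
    --query 'BootstrapBrokerString' --output text
}

get_zookeeper_connect() {
  aws kafka describe-cluster --cluster-arn "$1" \
    --query 'ClusterInfo.ZookeeperConnectString' --output text
}

# Example (replace yourARN with the ARN returned by create-cluster):
# BROKERS=$(get_bootstrap_brokers "yourARN")
# ZK=$(get_zookeeper_connect "yourARN")
```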

Create a topic.

cd kafka_2.11-1.1.1/
[ec2-user]$ bin/kafka-topics.sh --create --zookeeper "ZookeeperConnectString" --replication-factor 3 --partitions 1 --topic mskiscool
Created topic "mskiscool".

Produce some messages.

[ec2-user]$ bin/kafka-console-producer.sh --broker-list "BootstrapBrokerString" --topic mskiscool
>hello world
>msk demo
>cdl enjoy
(Control C)

Consume the messages.

[ec2-user]$ bin/kafka-console-consumer.sh --bootstrap-server "BootstrapBrokerString" --topic mskiscool --from-beginning
hello world
msk demo
cdl enjoy
(Control C)
Processed a total of 3 messages
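
The same produce/consume round trip can be scripted without the interactive prompts and Control C. This is a sketch under a few assumptions: the helper names are hypothetical, KAFKA_HOME is an optional variable I made up that points at the extracted kafka_2.11-1.1.1 directory (it defaults to the current directory), and the consumer is told to exit after a fixed number of messages with --max-messages.

```shell
# Hypothetical helper: send each argument after the first two as one message.
# Arguments: broker list, topic, then the messages.
produce_messages() {
  local brokers="$1" topic="$2"
  shift 2
  # The console producer reads one message per line from stdin.
  printf '%s\n' "$@" | "${KAFKA_HOME:-.}/bin/kafka-console-producer.sh" \
    --broker-list "$brokers" --topic "$topic"
}

# Hypothetical helper: read a fixed number of messages from the
# start of the topic, then exit.
# Arguments: broker list, topic, number of messages to read.
consume_messages() {
  local brokers="$1" topic="$2" count="$3"
  "${KAFKA_HOME:-.}/bin/kafka-console-consumer.sh" \
    --bootstrap-server "$brokers" --topic "$topic" \
    --from-beginning --max-messages "$count"
}

# Example, run from the kafka_2.11-1.1.1 directory:
# produce_messages "BootstrapBrokerString" mskiscool "hello world" "msk demo" "cdl enjoy"
# consume_messages "BootstrapBrokerString" mskiscool 3
```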

Matt Houghton

Data Architect @CDL_Software , AWS Community Builder, 13 x AWS Certified. Qlik Global Luminary 50.