Installing 3-node Zookeeper + 3-node Kafka cluster on Ubuntu instances in AWS
Build a Zookeeper cluster
Log into your AWS account
Create an m3.medium 64-bit Ubuntu instance with a 256GB EBS volume
Create a security group that allows the TCP ports you need: ssh (22) and the Zookeeper client port (2181)
Wait for the instance to boot up and log into it
Set up your disk space. The EBS volume shows up as a block device, but it isn't formatted or mounted yet. See: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
xvdb 202:16 0 256G 0 disk
As you can see, /dev/xvdb has no mount point yet. Since it’s new, we’ll also have to put an ext4 filesystem on it, make a data directory, and then mount the EBS volume
$ sudo mkfs -t ext4 /dev/xvdb
$ sudo mkdir /data
$ sudo mount /dev/xvdb /data
If all went well, running df -h should show ~250 GB available on /data
Edit /etc/fstab to add the following line for /dev/xvdb:
$ sudo vi /etc/fstab
/dev/xvdb /data ext4 defaults,nofail,nobootwait 0 2
Double check (if you don’t see any output from this command, your fstab is ok)
$ sudo mount -a
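If you end up scripting this for all six hosts, it helps to make the fstab edit safe to run more than once. Here's a minimal sketch of that idea; the add_fstab_entry function and the FSTAB variable are names I made up for illustration, and FSTAB defaults to a scratch file so nothing real is touched until you point it at /etc/fstab yourself.

```shell
#!/bin/sh
# Sketch: append an fstab entry only if that exact line is not already present.
# FSTAB defaults to a scratch file so the script can be tried safely;
# set FSTAB=/etc/fstab (and run as root) once you trust it.
FSTAB="${FSTAB:-$(mktemp)}"
ENTRY="/dev/xvdb /data ext4 defaults,nofail,nobootwait 0 2"

add_fstab_entry() {
    # $1 = fstab file, $2 = full entry line
    # -x matches whole lines only, -F treats the entry as a fixed string
    if ! grep -qxF "$2" "$1"; then
        printf '%s\n' "$2" >> "$1"
    fi
}

add_fstab_entry "$FSTAB" "$ENTRY"
add_fstab_entry "$FSTAB" "$ENTRY"   # second call is a no-op
grep -cxF "$ENTRY" "$FSTAB"         # number of matching lines in the file
```

Running sudo mount -a afterwards is still the real check that the entry is valid.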
Fix your ~/.bash_profile so that login shells also load ~/.bashrc. Add this as the first line:
if [ -f ~/.bashrc ]; then . ~/.bashrc; fi
Now install Zookeeper. We’ll try the easy way first and see what we get:
$ sudo apt-get install zookeeper zookeeper-bin
The Ubuntu zookeeper package puts a configuration file at /etc/zookeeper/conf/zoo.cfg. We need to edit it to set the data dir before starting up Zookeeper.
$ sudo vi /etc/zookeeper/conf/zoo.cfg
Change the data dir line to dataDir=/data/zookeeper and save it (ignore the replicated part; we’ll come back to that once we’re sure this instance works)
$ cd /usr/share/zookeeper && sudo bin/zkServer.sh start
In preparation for setting up the cluster, go ahead and create the ‘myid’ file that Zookeeper reads from the data directory you specified in the conf. NOTE: this must be a single integer that is unique to each host (i.e. 1 on the first, 2 on the second, 3 on the third host in the cluster)
$ sudo sh -c 'echo "1" > /data/zookeeper/myid'
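Since a bad myid is an easy way to break the ensemble, here's a small sketch that validates the id before writing it. The write_myid function and DATA_DIR variable are hypothetical names of mine; DATA_DIR defaults to a scratch directory for a dry run, and on a real host it would be the dataDir from zoo.cfg (/data/zookeeper in this guide). The 1 to 255 range is the one the Zookeeper docs give for server ids.

```shell
#!/bin/sh
# Sketch: write a validated myid file into a ZooKeeper data directory.
# DATA_DIR defaults to a scratch directory; override it for a real host.
DATA_DIR="${DATA_DIR:-$(mktemp -d)}"

write_myid() {
    # $1 = data directory, $2 = server id
    case "$2" in
        ''|*[!0-9]*)
            echo "myid must be a plain integer: '$2'" >&2
            return 1
            ;;
    esac
    if [ "$2" -lt 1 ] || [ "$2" -gt 255 ]; then
        echo "myid must be between 1 and 255: $2" >&2
        return 1
    fi
    mkdir -p "$1" && printf '%s\n' "$2" > "$1/myid"
}

write_myid "$DATA_DIR" 1
cat "$DATA_DIR/myid"
```

On the second and third hosts you would pass 2 and 3 instead.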
Now, repeat for two more servers (so that we have 3 running Zookeeper).
Important: you will probably want/need to edit your AWS security group to allow the instances to communicate amongst themselves. I just opened up TCP ports 0 - 3888 to the security group, but you might want to lock that down further (Zookeeper only needs 2181 for clients, plus 2888 and 3888 between the servers). If you have a security group sg-12345, you can put that in the source field instead of specifying an actual IP address or range.
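If you'd rather lock the group down from the command line, something like the following sketch could work. It only prints the AWS CLI commands instead of running them, so you can review them first. I'm assuming the AWS CLI is installed and configured, that sg-12345 is the placeholder group ID from above, and that the --source-group form of authorize-security-group-ingress suits your VPC setup (some setups need --ip-permissions instead). The sg_rule_commands function name is mine.

```shell
#!/bin/sh
# Sketch: print (not run) AWS CLI commands that open only the ZooKeeper
# ports to members of the same security group.
SG_ID="${SG_ID:-sg-12345}"   # placeholder group ID, replace with your own

sg_rule_commands() {
    # $1 = security group ID; remaining args = ports to open
    sg="$1"; shift
    for port in "$@"; do
        printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --source-group %s\n' \
            "$sg" "$port" "$sg"
    done
}

# 2181 = client connections, 2888 = follower-to-leader, 3888 = leader election
sg_rule_commands "$SG_ID" 2181 2888 3888
```

Pipe the output to sh once the commands look right.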
Once you have all three instances running, stop the Zookeeper on each. It’s time to make the cluster.
Edit your zookeeper config again, ON EACH INSTANCE, this time uncommenting the section:
server.1=your-ec2-host-1.compute.amazonaws.com:2888:3888
server.2=your-ec2-host-2.compute.amazonaws.com:2888:3888
server.3=your-ec2-host-3.compute.amazonaws.com:2888:3888
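Typing those lines by hand on three hosts invites typos, so here's a sketch that generates them from an ordered host list. The zk_server_lines function is a hypothetical helper of mine; the only thing it assumes is that the order of the hosts matches the myid you gave each one.

```shell
#!/bin/sh
# Sketch: print server.N lines for zoo.cfg from an ordered host list.
# The position in the list becomes N, so it must match each host's myid.
zk_server_lines() {
    n=1
    for host in "$@"; do
        printf 'server.%d=%s:2888:3888\n' "$n" "$host"
        n=$((n + 1))
    done
}

zk_server_lines \
    your-ec2-host-1.compute.amazonaws.com \
    your-ec2-host-2.compute.amazonaws.com \
    your-ec2-host-3.compute.amazonaws.com
```

The output is identical on every host, so you can generate it once and paste it into each zoo.cfg.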
Start up each of the instances again.
Once they’re all up, test them out. Fire up the CLI bundled with Zookeeper.
$ bin/zkCli.sh -server 127.0.0.1:2181
Run these commands on the first instance (and you should see similar output)
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 1] create /foo mydata
Created /foo
[zk: 127.0.0.1:2181(CONNECTED) 2] ls /
[foo, zookeeper]
Now start up the CLI on the other instances and run ls / on each. You should see foo in the results. Replication works, woohooo!
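Another quick way to confirm the ensemble formed is to ask each server for its role using Zookeeper's "stat" four-letter command. In a healthy 3-node cluster one server should report leader and the other two follower. The sketch below only does the parsing, with a canned line standing in for a live server. I'm assuming nc is installed and that four-letter commands are enabled (newer Zookeeper releases require whitelisting them); zk_mode is a name I made up.

```shell
#!/bin/sh
# Sketch: extract the role (leader / follower / standalone) from the text
# that ZooKeeper's "stat" four-letter command returns.
zk_mode() {
    # reads "stat" output on stdin, prints the value of the "Mode:" line
    sed -n 's/^Mode: *//p'
}

# On a real host the input would come from the server, for example:
#   echo stat | nc 127.0.0.1 2181 | zk_mode
# Here a canned "Mode:" line stands in for a live server.
printf 'Mode: follower\n' | zk_mode
```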
Build the Kafka cluster
These are similar to the Zookeeper instances but a little beefier: I created three m3.large instances, each with 256GB of SSD EBS. The process for mounting the EBS volume is the same.
Install Java
$ sudo apt-get install -y openjdk-7-jdk
Download Kafka from a mirror and install it
$ curl -O http://supergsego.com/apache/kafka/0.8.1.1/kafka_2.8.0-0.8.1.1.tgz
$ sudo mkdir -p /usr/local/kafka && sudo mv kafka_2.8.0-0.8.1.1.tgz /usr/local/kafka/ && cd /usr/local/kafka/
$ sudo tar -xvf kafka_2.8.0-0.8.1.1.tgz
$ cd kafka_2.8.0-0.8.1.1
Edit config/server.properties
* change broker.id to an integer that is unique per broker (just like with myid in zookeeper) - e.g. broker.id=1 for the first instance, broker.id=2 for the second, etc
* change the zookeeper.connect property to a comma-separated list of the zookeeper instances and ports (e.g. zookeeper.connect=your-zks-host-1.compute.amazonaws.com:2181,your-zks-host-2.compute.amazonaws.com:2181,your-zks-host-3.compute.amazonaws.com:2181)
* change the advertised.host.name to the PUBLIC dns entry for your AWS host - this is super important or (depending on your language/library) you’ll get cryptic failure to resolve DNS/connect errors
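If you don't feel like opening vi on every broker, the three property changes above can be scripted. This is only a sketch: set_prop is a hypothetical helper of mine that replaces an uncommented key=... line or appends one if the key isn't set, and the demo runs against a scratch file seeded with two stand-in lines rather than your real server.properties.

```shell
#!/bin/sh
# Sketch: set key=value in a Java-style properties file. Uncommented lines
# starting with "key=" are replaced; if none exist the pair is appended.
set_prop() {
    # $1 = file, $2 = key, $3 = value
    tmp="$(mktemp)" || return 1
    awk -v k="$2" -v v="$3" '
        index($0, k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$1" > "$tmp" && cat "$tmp" > "$1"   # cat keeps the file's permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demo on a scratch file with stand-in contents (not the real config).
demo="$(mktemp)"
printf 'broker.id=0\nzookeeper.connect=localhost:2181\n' > "$demo"
set_prop "$demo" broker.id 1
set_prop "$demo" advertised.host.name your-kafka-host-1.compute.amazonaws.com
cat "$demo"
```

Against the real file you would call it with sudo privileges and $KAFKA_HOME/config/server.properties (or the full path) as the first argument.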
NOTE: You’ll probably want to store that zookeeper string somewhere for convenience
Make your life easier: add some properties to your ~/.bash_profile
export KAFKA_HOME=/usr/local/kafka/kafka_2.8.0-0.8.1.1
export PATH=$KAFKA_HOME/bin:$PATH
export ZKS=your-zks-host-1.compute.amazonaws.com:2181,your-zks-host-2.compute.amazonaws.com:2181,your-zks-host-3.compute.amazonaws.com:2181
export KAFKA_NODES=your-kafka-host-1.compute.amazonaws.com:9092,your-kafka-host-2.compute.amazonaws.com:9092,your-kafka-host-3.compute.amazonaws.com:9092
Make sure to source the file so that your changes take effect in this session
$ source ~/.bash_profile
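The ZKS and KAFKA_NODES strings are long and easy to mistype, so you might prefer to build them from a host list. Here's a sketch of that; join_hosts is a made-up helper, and the hostnames are the same placeholders used above.

```shell
#!/bin/sh
# Sketch: build a "host:port,host:port,..." string from a port and host list.
join_hosts() {
    # $1 = port; remaining args = hostnames
    port="$1"; shift
    out=""
    for host in "$@"; do
        out="${out:+$out,}$host:$port"   # add a comma only between entries
    done
    printf '%s\n' "$out"
}

ZKS="$(join_hosts 2181 \
    your-zks-host-1.compute.amazonaws.com \
    your-zks-host-2.compute.amazonaws.com \
    your-zks-host-3.compute.amazonaws.com)"
KAFKA_NODES="$(join_hosts 9092 \
    your-kafka-host-1.compute.amazonaws.com \
    your-kafka-host-2.compute.amazonaws.com \
    your-kafka-host-3.compute.amazonaws.com)"

echo "$ZKS"
echo "$KAFKA_NODES"
```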
Start up all the instances individually (watch for errors!)
$ sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
Once they’re all up, test it out
$ $KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper $ZKS
$ $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper $ZKS --replication-factor 2 --partitions 2 --topic test-topic-1
# check it again
$ $KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper $ZKS
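If you want that second check to be scriptable rather than eyeballed, a sketch like this could work. It assumes the list output has one topic name per line, which is what I'd expect from this Kafka version but haven't verified for others; topic_listed is a hypothetical helper, and a canned line stands in for the real command output here.

```shell
#!/bin/sh
# Sketch: succeed only if a topic name appears as a whole line on stdin.
topic_listed() {
    # $1 = topic name; -x matches whole lines, -F treats the name literally
    grep -qxF "$1"
}

# On a broker the input would come from Kafka itself, for example:
#   "$KAFKA_HOME/bin/kafka-topics.sh" --list --zookeeper "$ZKS" | topic_listed test-topic-1
# Here a canned line stands in for that output.
if printf 'test-topic-1\n' | topic_listed test-topic-1; then
    echo "topic found"
else
    echo "topic missing"
fi
```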
Kafka provides some handy scripts in the bin/ directory; make use of them =)
# run this on host 2 (just to keep things interesting)
$ $KAFKA_HOME/bin/kafka-console-producer.sh --topic test-topic-1 --broker-list $KAFKA_NODES
# run this on hosts 1 and 3
$ $KAFKA_HOME/bin/kafka-console-consumer.sh --topic=test-topic-1 --zookeeper=$ZKS
Anything you type in the console on host 2 should be echoed on hosts 1 and 3