Monday, December 15, 2014

Integration tests in golang

In an ideal world, I want to run `go test` to run unit tests (which should be fast, not rely on external resources/network/etc) and add something to the command to specify that I want to run integration tests. You can definitely do it with ENV variables and/or some flags inside the test package, but what I've settled on for now is to use build tags and separate out my integration tests.

Create a file 'integration_test.go' and add the following build tag to the file before the package declaration (note: you must put a blank line after the build tag, otherwise the Go tool treats it as part of the package's doc comment and ignores it):

// +build integration
Then just write your tests as usual.

When it comes time to run them, you still use `go test` for your unit tests, and `go test -tags=integration` when you want the integration tests too. (One caveat: the tag adds the tagged files to the build rather than replacing the untagged ones, so this runs your unit tests as well; if you want the integration tests completely on their own, put a `// +build !integration` constraint on your unit test files.)
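For example, a minimal integration test file might look like this (the package and test names here are placeholders for your own):

// +build integration

package mypkg

import "testing"

// TestAgainstRealService hits a real external dependency,
// so it's only compiled when the integration tag is set.
func TestAgainstRealService(t *testing.T) {
	// ... connect to the real resource and exercise it ...
}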

Wednesday, December 3, 2014

3node-kafka

Installing 3-node Zookeeper + 3-node Kafka cluster on Ubuntu instances in AWS

Build a Zookeeper cluster

Log into your AWS account
Create an m3.medium Ubuntu 64-bit instance with a 256GB EBS volume
Create a security group allowing the TCP ports you need - ssh (22) and the zookeeper client port (2181)
Wait for the instance to boot up and log into it
Set up your disk space - the 256GB EBS volume is attached but not yet formatted or mounted. See: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html

$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
xvdb 202:16 0 256G 0 disk

As you can see, we still have to set up /dev/xvdb. Since it’s new, we’ll have to put an ext4 filesystem on it, make a data directory, and then mount the EBS volume:

$ sudo mkfs -t ext4 /dev/xvdb
$ sudo mkdir /data
$ sudo mount /dev/xvdb /data

If all went well, running df -h should show ~250 GB available on /data
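It will look something like this (your exact numbers will differ):

$ df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdb       252G   60M  239G   1% /data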
Edit /etc/fstab to add the following line for /dev/xvdb so the volume is remounted automatically on reboot:

$ sudo vi /etc/fstab
/dev/xvdb /data ext4 defaults,nofail,nobootwait 0 2

Double check (if you don’t see any output from this command, your fstab is ok)

$ sudo mount -a

Fix your ~/.bash_profile - add this as the first line

if [ -f ~/.bashrc ]; then . ~/.bashrc; fi

Now install Zookeeper. We’ll try the easy way first and see what we get:

$ sudo apt-get install zookeeper zookeeper-bin

The Ubuntu zookeeper package puts a configuration file at /etc/zookeeper/conf/zoo.cfg; we need to edit it to set the data dir before starting up zookeeper.

$ sudo vi /etc/zookeeper/conf/zoo.cfg

Change the data dir line to dataDir=/data/zookeeper and save it (ignore the replicated section for now; we’ll come back to it once we’re sure this instance works)

$ sudo mkdir -p /data/zookeeper
$ cd /usr/share/zookeeper && sudo bin/zkServer.sh start
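To check that it actually came up, ask for its status, or send ZooKeeper’s built-in ruok command to the client port (it answers imok when healthy):

$ sudo bin/zkServer.sh status
$ echo ruok | nc 127.0.0.1 2181
imok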

In preparation for setting up the cluster, go ahead and create the ‘myid’ file that Zookeeper uses in the data directory you specified in the conf. NOTE: this must be a single integer and increment on each host (i.e. 1 is the first, 2 is the second, 3 is the third host in the cluster)

$ sudo sh -c 'echo "1" > /data/zookeeper/myid'
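On the second and third hosts, the only difference is the integer:

$ sudo sh -c 'echo "2" > /data/zookeeper/myid'   # second host
$ sudo sh -c 'echo "3" > /data/zookeeper/myid'   # third host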

Now, repeat for two more servers (so that we have 3 running Zookeeper).

Important: you will probably want/need to edit your AWS security group to allow the instances to communicate amongst themselves. I just opened up TCP ports 0-3888 to the security group itself, but you might want to lock that down further. (If you have a security group sg-12345, you can put that group ID in the source field instead of specifying an actual IP address or range.)
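If you’d rather script that than click through the console, something like this should do it with the AWS CLI - shown here for the peer and election ports, with sg-12345 as a placeholder for your group’s ID:

$ aws ec2 authorize-security-group-ingress --group-id sg-12345 \
    --protocol tcp --port 2888-3888 --source-group sg-12345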

Once you have all three instances running, stop the Zookeeper on each. It’s time to make the cluster.

Edit your zookeeper config again, ON EACH INSTANCE, this time uncommenting the section:

server.1=your-ec2-host-1.compute.amazonaws.com:2888:3888
server.2=your-ec2-host-2.compute.amazonaws.com:2888:3888
server.3=your-ec2-host-3.compute.amazonaws.com:2888:3888

Start up each of the instances again.
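Each node should now report its role in the ensemble - one leader, two followers. The output will look roughly like:

$ /usr/share/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Mode: follower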

Once they’re all up, test them out. Fire up the CLI bundled with Zookeeper.

$ bin/zkCli.sh -server 127.0.0.1:2181

Run these commands on the first instance (and you should see similar output)

[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 1] create /foo mydata
Created /foo
[zk: 127.0.0.1:2181(CONNECTED) 2] ls /
[foo, zookeeper]

Now start up the CLI on the other instances and run ls / - you should see foo in the results. Replication works, woohooo!
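You can also read the data itself back from a replica (get prints the value, followed by a block of znode stat metadata):

[zk: 127.0.0.1:2181(CONNECTED) 0] get /foo
mydata
...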

Build the Kafka cluster

Similar to the Zookeeper instances, but a little beefier: I created three m3.large instances, each with 256GB of SSD EBS. Same process for formatting and mounting the EBS volume, etc.

Install Java

$ sudo apt-get install -y openjdk-7-jdk

Download Kafka from a mirror and install it

$ curl -O http://supergsego.com/apache/kafka/0.8.1.1/kafka_2.8.0-0.8.1.1.tgz
$ sudo mkdir -p /usr/local/kafka && sudo mv kafka_2.8.0-0.8.1.1.tgz /usr/local/kafka/ && cd /usr/local/kafka/
$ sudo tar -xvf kafka_2.8.0-0.8.1.1.tgz
$ cd kafka_2.8.0-0.8.1.1

Edit the broker config in config/server.properties:
* change broker.id to an integer (just like with myid in zookeeper) - e.g. broker.id=1 for the first instance, broker.id=2 for the second, etc
* change the zookeeper.connect property to a comma-separated list of the zookeeper instances and ports (e.g. zookeeper.connect=your-zks-host-1.compute.amazonaws.com:2181,your-zks-host-2.compute.amazonaws.com:2181,your-zks-host-3.compute.amazonaws.com:2181)
* change the advertised.host.name to the PUBLIC DNS entry for your AWS host - this is super important, or (depending on your language/library) you’ll get cryptic DNS resolution/connection errors; all three edits are pulled together in the example below
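Roughly, on the first broker (hostnames are placeholders for your own):

broker.id=1
advertised.host.name=your-kafka-host-1.compute.amazonaws.com
zookeeper.connect=your-zks-host-1.compute.amazonaws.com:2181,your-zks-host-2.compute.amazonaws.com:2181,your-zks-host-3.compute.amazonaws.com:2181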

NOTE: You’ll probably want to store that zookeeper string somewhere for convenience

Make your life easier and add some environment variables to your ~/.bash_profile:

export ZK_HOME=/usr/share/zookeeper
export KAFKA_HOME=/usr/local/kafka/kafka_2.8.0-0.8.1.1
export PATH=$ZK_HOME/bin:$KAFKA_HOME/bin:$PATH
export ZKS=your-zks-host-1.compute.amazonaws.com:2181,your-zks-host-2.compute.amazonaws.com:2181,your-zks-host-3.compute.amazonaws.com:2181
export KAFKA_NODES=your-kafka-host-1.compute.amazonaws.com:9092,your-kafka-host-2.compute.amazonaws.com:9092,your-kafka-host-3.compute.amazonaws.com:9092

Make sure to source the file so that your changes take effect in this session

$ source ~/.bash_profile

Start up all the instances individually (watch for errors!)

$ sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &
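A quick sanity check that the broker process is up (jps ships with the JDK; your PIDs will differ):

$ jps
1234 Kafka
5678 Jps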

Once they’re all up, test it out

$ $KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper $ZKS
$ $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper $ZKS --replication-factor 2 --partitions 2 --topic test-topic-1
# check it again
$ $KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper $ZKS
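You can also ask where the partitions and replicas landed; the output will look something like this:

$ $KAFKA_HOME/bin/kafka-topics.sh --describe --zookeeper $ZKS --topic test-topic-1
Topic:test-topic-1	PartitionCount:2	ReplicationFactor:2	Configs:
	Topic: test-topic-1	Partition: 0	Leader: 1	Replicas: 1,2	Isr: 1,2
	Topic: test-topic-1	Partition: 1	Leader: 2	Replicas: 2,3	Isr: 2,3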

Kafka provides some handy scripts in the bin/ directory, make use of them =)

# run this on host 2 (just to keep things interesting)
$ $KAFKA_HOME/bin/kafka-console-producer.sh --topic test-topic-1 --broker-list $KAFKA_NODES

# run this on hosts 1 and 3
$ $KAFKA_HOME/bin/kafka-console-consumer.sh --topic test-topic-1 --zookeeper $ZKS

Anything you type in the console on host 2 should be echoed on hosts 1 and 3
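Note that a consumer that connects late only sees new messages by default; pass --from-beginning if you want it to replay the whole topic:

$ $KAFKA_HOME/bin/kafka-console-consumer.sh --topic test-topic-1 --zookeeper $ZKS --from-beginning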

Success - you have a working 3-node Kafka cluster.