Monday, December 15, 2014

Integration tests in golang

In an ideal world, I want to run `go test` to run unit tests (which should be fast and not rely on external resources, the network, etc.) and add something to the command to specify that I want to run integration tests. You can definitely do it with ENV variables and/or some flags inside the test package, but what I've settled on for now is to use build tags and separate out my integration tests.

Create a file 'integration_test.go' and add the following build tag to it before the package declaration (note: you must put a blank line after the build tag so Go doesn't treat it as a package comment):

// +build integration

Then just write your tests as usual.

When it comes time to run them, plain `go test` still runs just your unit tests, and `go test -tags=integration` pulls the integration tests in as well. (Untagged files always compile, so your unit tests run too; tag those with `// +build !integration` if you really want the integration tests alone.)
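For illustration, a complete tagged test file might look like this (the package and test names are hypothetical):

// +build integration

package mypkg

import "testing"

// Only compiled (and therefore only run) when -tags=integration is passed.
func TestRealDatabaseRoundTrip(t *testing.T) {
	// talk to a real database, network service, etc.
}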

Wednesday, December 3, 2014

3node-kafka

Installing 3-node Zookeeper + 3-node Kafka cluster on Ubuntu instances in AWS

Build a Zookeeper cluster

Log into your AWS account
Create an m3.medium Ubuntu 64-bit instance with a 256GB EBS volume
Create a security group allowing the TCP ports you need - SSH (22) and the ZooKeeper client port (2181)
Wait for the instance to boot up and log into it
Set up your disk space - the EBS volume is attached but not yet formatted or mounted. See: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html

$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
xvdb 202:16 0 256G 0 disk

As you can see, /dev/xvdb is attached but unused. Since it's new, we'll have to put an ext4 filesystem on it, make a data directory and then mount the EBS volume:

$ sudo mkfs -t ext4 /dev/xvdb
$ sudo mkdir /data
$ sudo mount /dev/xvdb /data

If all went well, running df -h should show ~250 GB available on /data.
Edit /etc/fstab to add the following line for /dev/xvdb, so the mount survives a reboot:

$ sudo vi /etc/fstab
/dev/xvdb /data ext4 defaults,nofail,nobootwait 0 2

Double-check your work (if this command produces no output, your fstab is OK):

$ sudo mount -a

Fix your ~/.bash_profile - add this as the first line:

if [ -f ~/.bashrc ]; then . ~/.bashrc; fi

Now install Zookeeper. We’ll try the easy way first and see what we get:

$ sudo apt-get install zookeeper zookeeper-bin

The Ubuntu zookeeper package puts a configuration file at /etc/zookeeper/conf/zoo.cfg; we need to edit it to set the data dir before starting up ZooKeeper.

$ sudo vi /etc/zookeeper/conf/zoo.cfg

Change the data dir line to dataDir=/data/zookeeper and save it (ignore the replicated section for now; we'll come back to it once we're sure this instance works):

$ cd /usr/share/zookeeper && sudo bin/zkServer.sh start

In preparation for setting up the cluster, go ahead and create the 'myid' file that ZooKeeper uses to identify each server. NOTE: it must contain a single integer, unique per host (1 for the first, 2 for the second, 3 for the third host in the cluster). The command below writes it where the Ubuntu package keeps it; if ZooKeeper complains that it can't find myid, put a copy in /data/zookeeper as well:

$ sudo sh -c 'echo "1" > /etc/zookeeper/conf/myid'

Now, repeat for two more servers (so that we have 3 running Zookeeper).

Important: you will probably want/need to edit your AWS security group to allow the instances to communicate amongst themselves. I just opened up TCP ports 0-3888 to the security group, but you might want to lock that down further. (If you have a security group sg-12345, you can put that in the IP address field instead of specifying an actual IP address or range.)

Once you have all three instances running, stop ZooKeeper on each. It's time to make the cluster.

Edit your ZooKeeper config again, ON EACH INSTANCE, this time uncommenting (and filling in) the server section:

server.1=your-ec2-host-1.compute.amazonaws.com:2888:3888
server.2=your-ec2-host-2.compute.amazonaws.com:2888:3888
server.3=your-ec2-host-3.compute.amazonaws.com:2888:3888

Start up each of the instances again.

Once they’re all up, test them out. Fire up the CLI bundled with Zookeeper.

$ bin/zkCli.sh -server 127.0.0.1:2181

Run these commands on the first instance (and you should see similar output)

[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 1] create /foo mydata
Created /foo
[zk: 127.0.0.1:2181(CONNECTED) 2] ls /
[foo, zookeeper]

Now start up the CLI on the other instances and run ls / - you should see foo in the results. Replication works, woohooo!
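If you'd rather verify from code, here's a quick Go sketch using the samuel/go-zookeeper client (my choice of library, not part of the install; the hostnames are placeholders):

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/samuel/go-zookeeper/zk" // assumed client library
)

func main() {
	servers := []string{
		"your-ec2-host-1.compute.amazonaws.com:2181",
		"your-ec2-host-2.compute.amazonaws.com:2181",
		"your-ec2-host-3.compute.amazonaws.com:2181",
	}
	conn, _, err := zk.Connect(servers, 5*time.Second)
	if err != nil {
		log.Fatalln(err)
	}
	defer conn.Close()

	// After the zkCli test above, every node should list /foo.
	children, _, err := conn.Children("/")
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(children) // e.g. [foo zookeeper]
}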

Build the Kafka cluster

Similar to the ZooKeeper instances, but a little beefier: I created three m3.large instances, each with 256GB of SSD EBS. Same process for mounting the EBS volume, etc.

Install Java

$ sudo apt-get install -y openjdk-7-jdk

Download Kafka from a mirror and install it

$ curl -O http://supergsego.com/apache/kafka/0.8.1.1/kafka_2.8.0-0.8.1.1.tgz
$ sudo mkdir -p /usr/local/kafka && sudo mv kafka_2.8.0-0.8.1.1.tgz /usr/local/kafka/ && cd /usr/local/kafka/
$ sudo tar -xvf kafka_2.8.0-0.8.1.1.tgz
$ cd kafka_2.8.0-0.8.1.1

Edit config/server.properties:
* change broker.id to an integer (just like with myid in zookeeper) - e.g. broker.id=1 for the first instance, broker.id=2 for the second, etc
* change the zookeeper.connect property to a comma-separated list of the zookeeper instances and ports (e.g. zookeeper.connect=your-zks-host-1.compute.amazonaws.com:2181,your-zks-host-2.compute.amazonaws.com:2181,your-zks-host-3.compute.amazonaws.com:2181)
* change advertised.host.name to the PUBLIC DNS entry for your AWS host - this is super important, or (depending on your language/library) you'll get cryptic DNS-resolution/connection errors

NOTE: You’ll probably want to store that zookeeper string somewhere for convenience

Make your life easier and add some variables to your ~/.bash_profile:

export ZK_HOME=/usr/share/zookeeper # where apt installed ZooKeeper above
export KAFKA_HOME=/usr/local/kafka/kafka_2.8.0-0.8.1.1
export PATH=$ZK_HOME/bin:$KAFKA_HOME/bin:$PATH
export ZKS=your-zks-host-1.compute.amazonaws.com:2181,your-zks-host-2.compute.amazonaws.com:2181,your-zks-host-3.compute.amazonaws.com:2181
export KAFKA_NODES=your-kafka-host-1.compute.amazonaws.com:9092,your-kafka-host-2.compute.amazonaws.com:9092,your-kafka-host-3.compute.amazonaws.com:9092

Make sure to source the file so that your changes take effect in this session:

$ source ~/.bash_profile

Start up all the instances individually (watch for errors!)

$ sudo $KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties &

Once they’re all up, test it out

$ $KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper $ZKS
$ $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper $ZKS --replication-factor 2 --partitions 2 --topic test-topic-1
# check it again
$ $KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper $ZKS

Kafka provides some handy scripts in the bin/ directory - make use of them =)

# run this on host 2 (just to keep things interesting)
$ $KAFKA_HOME/bin/kafka-console-producer.sh --topic test-topic-1 --broker-list $KAFKA_NODES

# run this on hosts 1 and 3
$ $KAFKA_HOME/bin/kafka-console-consumer.sh --topic test-topic-1 --zookeeper $ZKS

Anything you type in the console on host 2 should be echoed on hosts 1 and 3
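If you want to hit the cluster from Go instead of the console scripts, here's a rough sketch using the Shopify/sarama client (an assumption on my part - any Kafka client works; it reuses the $KAFKA_NODES list from above):

package main

import (
	"log"
	"os"
	"strings"

	"github.com/Shopify/sarama" // assumed Kafka client library
)

func main() {
	// Reuse the comma-separated broker list exported above.
	brokers := strings.Split(os.Getenv("KAFKA_NODES"), ",")

	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true // required by SyncProducer

	producer, err := sarama.NewSyncProducer(brokers, cfg)
	if err != nil {
		log.Fatalln(err)
	}
	defer producer.Close()

	partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "test-topic-1",
		Value: sarama.StringEncoder("hello from Go"),
	})
	if err != nil {
		log.Fatalln(err)
	}
	log.Printf("delivered to partition %d at offset %d", partition, offset)
}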

Success - you have a working 3-node Kafka cluster.

Saturday, November 29, 2014

Decrypting something encrypted with OpenSSL passphrase in Golang

For various reasons, it's easiest to deploy configuration settings at our company as Java properties files. Java properties files usually contain lines in the form {key}={value}, with a few quirks around encoding, but nothing too difficult to handle.

My app is written in Go, so it's a little strange to use Java-style properties, but they're easy enough to parse and load into a map[string]string.

Things get a little more interesting when those keys and/or values are sometimes encrypted with OpenSSL using AES-256-CBC and a passphrase.

Golang already has quite a few packages available in crypto including AES, so I thought this should be straightforward, especially since the password to decrypt would be provided by the user.  

The first question concerns aes.NewCipher - it takes a []byte key, whose length determines whether you get AES-128, AES-192 or AES-256. I know the properties were encrypted using AES-256-CBC, so I need a 32-byte key, but all I have is a password. You can easily get the key, salt and initialization vector (iv) with the OpenSSL command-line tool, but it took quite a bit of digging to figure out how to do the same thing programmatically.

# Using the openssl CLI
$ echo "this is a test" | /usr/bin/openssl enc -e -aes-256-cbc -pass pass:password -base64
U2FsdGVkX1+ywYxveBnekSnx6ZP25nyPsWHS3oqcuTo=

$ echo "U2FsdGVkX1+ywYxveBnekSnx6ZP25nyPsWHS3oqcuTo=" | /usr/bin/openssl enc -d -aes-256-cbc -pass pass:password -base64 -p
salt=B2C18C6F7819DE91
key=98350CE8088E5A4F4E8B31830D20E20DAFD7881970E1487837FDA01F86B9166C
iv =2436176AB793A101FC7427DB662B84A2

$ echo "U2FsdGVkX1+ywYxveBnekSnx6ZP25nyPsWHS3oqcuTo=" | /usr/bin/openssl enc -d -aes-256-cbc -pass pass:password -base64
this is a test

A couple of posts/Stack Overflow questions point to EVP_BytesToKey, which (with OpenSSL's defaults) boils down to: MD5 the passphrase and salt repeatedly until you have enough bytes to make up the key and iv.

The next question, "how do I get the salt?", turned out to be super easy. In the OpenSSL format, you just base64-decode the encrypted string and grab the first 16 bytes: they are always the literal string "Salted__" (8 bytes) followed by the actual 8 bytes of salt.

Now that we have the passphrase and salt, we just repeatedly take the MD5 of the previous digest (an empty []byte on the first iteration) concatenated with the passphrase and then the salt. Each iteration yields 16 bytes, so to fill the 32-byte AES-256 key plus the 16-byte iv it needs to run three times.

After you have the key and iv, it's easy to decrypt using Go - the tricky part was figuring out how to go from a passphrase to the key and iv.
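Here's a condensed sketch of that derivation plus the CBC decryption - my own reconstruction from the description above; the playground link below has the final version:

package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/md5"
	"encoding/base64"
	"fmt"
)

// evpBytesToKey mimics OpenSSL's EVP_BytesToKey with MD5: keep hashing
// (previous digest + passphrase + salt) until keyLen+ivLen bytes exist.
func evpBytesToKey(passphrase, salt []byte, keyLen, ivLen int) (key, iv []byte) {
	var d, prev []byte
	for len(d) < keyLen+ivLen {
		sum := md5.Sum(append(append(prev, passphrase...), salt...))
		prev = sum[:]
		d = append(d, prev...)
	}
	return d[:keyLen], d[keyLen : keyLen+ivLen]
}

func decrypt(passphrase, b64 string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, err
	}
	// Layout: "Salted__" (8 bytes) + salt (8 bytes) + ciphertext.
	if len(raw) < 16 || !bytes.HasPrefix(raw, []byte("Salted__")) {
		return nil, fmt.Errorf("missing OpenSSL salt header")
	}
	salt, ct := raw[8:16], raw[16:]
	if len(ct) == 0 || len(ct)%aes.BlockSize != 0 {
		return nil, fmt.Errorf("ciphertext is not a multiple of the block size")
	}
	key, iv := evpBytesToKey([]byte(passphrase), salt, 32, aes.BlockSize) // AES-256-CBC
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	cipher.NewCBCDecrypter(block, iv).CryptBlocks(ct, ct)
	// Strip PKCS#7 padding.
	pad := int(ct[len(ct)-1])
	if pad < 1 || pad > aes.BlockSize {
		return nil, fmt.Errorf("bad padding")
	}
	return ct[:len(ct)-pad], nil
}

func main() {
	pt, err := decrypt("password", "U2FsdGVkX1+ywYxveBnekSnx6ZP25nyPsWHS3oqcuTo=")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s", pt) // "this is a test\n" - echo added the newline
}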

Anyway, here's the final code, hope it's helpful to someone: http://play.golang.org/p/r3VObSIB4o



Wednesday, November 5, 2014

Set complement between two files

Set Complement (or what elements from file1 are not in file2)

$ sort file2 file2 file1 | uniq -u

Listing file2 twice is deliberate: after sorting, every line that appears in file2 shows up at least twice, so uniq -u (which keeps only non-repeated lines) drops it - all that survives is what's unique to file1. (Caveat: lines duplicated within file1 get dropped too.)
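The same complement in Go, as a quick sketch (the file names are just placeholders):

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

// readLines returns the set of distinct lines in the named file.
func readLines(path string) map[string]bool {
	f, err := os.Open(path)
	if err != nil {
		log.Fatalln(err)
	}
	defer f.Close()
	set := make(map[string]bool)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		set[scanner.Text()] = true
	}
	return set
}

func main() {
	inFile2 := readLines("file2")
	// Print each distinct line of file1 that never appears in file2.
	// (Unlike sort | uniq, map iteration order is random.)
	for line := range readLines("file1") {
		if !inFile2[line] {
			fmt.Println(line)
		}
	}
}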

Monday, October 20, 2014

Serving up static files from Go / Gorilla mux

I've been playing with Go lately, just serving up a non-trivial API to kick the tires a bit.

One of the things I wanted to do was use the Swagger (https://helloreverb.com/developers/swagger) framework to document that API - I've done this before in Ruby with grape and it was a breeze. (BTW, it was considerably more tedious to write the specification by hand, even using the awesome https://github.com/wordnik/swagger-editor, but that's a different post...)

I needed to host the swagger-ui files (which they helpfully bundle into a 'dist' folder) and serve them up from my Go application.  The directory structure looked like this:

├── server.go
├── swagger-ui/
│   ├── css/
│   ├── images/
│   ├── index.html
│   ├── lib/

It ended up being fairly trivial to serve static files, but it took a little experimentation to get the Gorilla mux piece working correctly. Here's what I ended up with, inside 'server.go':

package main

import (
	"log"
	"net/http"

	"github.com/gorilla/mux"
)

// indexHandler and apiDocHandler are defined elsewhere in the app.
func main() {
	router := mux.NewRouter()
	router.HandleFunc("/", indexHandler)
	router.HandleFunc("/api-docs", apiDocHandler)
	// Serve everything under ./swagger-ui/ at the /swagger-ui/ URL prefix.
	router.PathPrefix("/swagger-ui/").Handler(http.StripPrefix("/swagger-ui/", http.FileServer(http.Dir("./swagger-ui/"))))
	// The router is passed straight to ListenAndServeTLS, so no http.Handle call is needed.
	err := http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", router)
	if err != nil {
		log.Fatalln(err)
	}
}

Important parts to realize:
1) you should use PathPrefix so that the mux picks up everything inside that folder
2) you need to strip out the path prefix so that your files are served from the correct directory; otherwise you'll end up making requests for /swagger-ui/swagger-ui/, which won't work.

Tuesday, June 3, 2014

Docker and Google Compute Engine

Participated in a hackathon (hackforchange.org) this past weekend and there were some Google credits available so I figured I'd see how easy it was to get Docker running on Compute Engine. Turns out, very easy thanks to this post from Google: Containers on Google Cloud Platform.
The only real gotcha I ran into was that I ran out of disk space almost immediately after installing the Docker registry. What?! Yup - since Compute Engine starts you out with an 8GB disk, you need to:
  1. Provision a bigger disk and attach it
  2. Move Docker onto the bigger disk
Attaching more disk is thoroughly documented by Google (see: Disks), but it took quite a bit of searching to figure out how to move Docker, so I figured I'd post what I did.

We're going to assume I added a bigger disk and mounted it to /docker-storage.
  1. Stop docker - 'service docker stop'
  2. Edit /etc/default/docker - you're looking for the line that sets DOCKER_OPTS. Add a -g flag pointing at where you want Docker to keep its data. In my case I prepended it, so the line looked something like:
    DOCKER_OPTS="-g=/docker-storage/docker"
  3. Move the files from the default Docker location (/var/lib/docker) to the new mount: 'cd /var/lib; mv docker /docker-storage/'
  4. Restart docker - 'service docker start'
That should be all it takes.

Friday, May 2, 2014

Quick tidbits

One of the reasons I like Project Euler is that it gives you a chance to expand your knowledge of whatever language you're using...

For instance, today I learned about Fixnum#fdiv (forces float division instead of integer division) in Ruby:
2/4       # 0
2.div(4)  # 0
2.fdiv(4) # 0.5
Sure there's a ton of other ways to do it, but this was succinct and to the point.

And the RSpec =~ matcher for array equality - it's basically the same thing as ==, except that order doesn't matter. (Note it's an RSpec matcher used with should, not a plain Ruby array operator.)

[1,2].should == [2,1] # fails
[1,2].should =~ [2,1] # passes
Two simple things that I'd not had a reason to use before. Will I use them all the time? Nope. But I'm glad to have found them.

Monday, January 27, 2014

Logging sql queries from korma

I wanted to see what SQL is actually getting executed from a webapp running Korma - it turns out to be fairly easy since I stumbled upon blackwater.
Inside your project.clj, add blackwater and clj-time:
(defproject foo "0.1.0-SNAPSHOT"
  :description "Zap bug tracker"
  :dependencies [
    ;; other deps omitted
    [blackwater "0.0.9"]
    [clj-time "0.6.0"]])
Inside your ns:
(ns foo
  (:use korma.db korma.core)
  (:require [black.water.korma :refer [decorate-korma!]]))

(decorate-korma!)
You'll get nice output like:
SELECT "project".* FROM "project"| took:31ms
SELECT "project".* FROM "project" WHERE ("project"."id" = ?)| took:4ms

Reloading lein repl

So, I'm working through the book Seven Web Frameworks in Seven Weeks from the awesome Pragmatic Programmers and I'm on the Clojure section (keep in mind that I don't know Clojure ^^), but quitting/restarting to get the repl to pick up changes just plain sucks.

Here's a solution I found on SO:

Add tools.namespace to your project.clj dependencies.

e.g.
:dependencies [
  ;; ...
  [org.clojure/tools.namespace "0.2.4"]]

After you start up the lein repl:
user=> (use '[clojure.tools.namespace.repl :only (refresh)])
nil

... make some changes to a clj file ...

user=> (refresh)
:reloading (hello.core hello.core-test zap.models)
:ok

It's not 100% automatic, but it beats quitting/restarting. Though it does appear you have to type that use statement again each time after you refresh...