Wednesday, December 9, 2020

Hybrid Kubernetes cluster (arm/x86)

This will be a long post (or maybe multiple posts). The end result will be a 7-node Kubernetes cluster capable of running both x86 and arm64 workloads.

# Hardware

* 2 Odroid H2+ nodes (one master and one worker)
* 5 Raspberry Pi 4 nodes (all workers)
* network switch
* misc (power adapters / cat6e cables / etc.)

# Software Stack

* Ubuntu 20.04
* Kubernetes 1.19
* CRI-O

First up, the master node. These Odroid H2+ SBCs are pretty awesome (TODO link to specs). They include two Realtek 2.5GbE ethernet ports, but one minor drawback is that you need to install the drivers before they work, which makes the install a little trickier. Odroid has a good wiki page dedicated to this issue (https://wiki.odroid.com/odroid-h2/application_note/install_ethernet_driver_on_h2plus), but the absolute easiest thing to do is to share your phone's internet connection via USB (it will be picked up automatically).

Since I am going to install (at least) two nodes, I figured I'd do something a little better. Enter Cubic (Custom Ubuntu ISO Creator - https://launchpad.net/cubic), a GUI wizard for creating a customized Ubuntu live ISO image. There are lots of tutorials on the internet explaining how to use Cubic, so I won't go into details, but basically we need to download the Ubuntu 20.04 ISO and open it up in Cubic. One of the steps in the Cubic wizard drops you into a shell, at which point you'll want to follow the instructions on the Odroid wiki page above to add the hardkernel PPA and install the drivers (realtek-r8125-dkms) into the image; a rough sketch of those commands follows the CRI-O block below. While you're in that shell, you might as well add CRI-O (following the instructions on their site):
export OS=xUbuntu_20.04
export VERSION=1.19
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /" > /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/$OS/ /" > /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.list

curl -L https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/$OS/Release.key | apt-key add -
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | apt-key add -

apt-get update
apt-get install cri-o cri-o-runc
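For completeness, the driver part of that same Cubic shell session looks roughly like this. The PPA name and package come from the Odroid wiki page linked above, so double-check them there; add-apt-repository may also require the software-properties-common package inside the minimal chroot.
apt-get update
apt-get install -y software-properties-common
add-apt-repository ppa:hardkernel/ppa
apt-get update
apt-get install -y realtek-r8125-dkms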
Finish up the wizard and create the new Ubuntu live image. Burn that image (balenaEtcher works well for this) onto a USB thumb drive and pop it into the first H2+ node. Install Ubuntu, picking whatever options you want... (I'm trying out ZFS, which is still experimental in 20.04). Since we already installed the drivers, you should have internet access, as long as the node is connected to something with a DHCP server. Update to the latest software, etc.

# Configuring and Running CRI-O

First, Kubernetes requires a few kernel settings for CRI-O to work. As root, run the following:
modprobe overlay
modprobe br_netfilter

cat > /etc/sysctl.d/99-kubernetes-cri.conf <<EOF
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sysctl --system

Important! Also add overlay and br_netfilter to /etc/modules-load.d/modules.conf so the modules load on every boot. I didn't do this originally, rebooted, and then wondered why I was getting "/proc/sys/net/bridge/bridge-nf-call-iptables not found" errors when trying to run kubeadm!
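A quick way to do that is to append the two module names (a sketch; adjust if you prefer a dedicated file under /etc/modules-load.d/):
sudo tee -a /etc/modules-load.d/modules.conf <<EOF
overlay
br_netfilter
EOF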

Next up, install the CRI CLI tool `crictl`:

export VERSION="v1.19.0"
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz
sudo tar zxvf crictl-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
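That tarball is the amd64 build, which is right for the H2+ nodes. The Raspberry Pi workers will need the arm64 build instead; a small sketch that picks the right tarball based on the local architecture:
ARCH=$(dpkg --print-architecture)   # amd64 on the H2+ nodes, arm64 on the Pis
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-$ARCH.tar.gz
sudo tar zxvf crictl-$VERSION-linux-$ARCH.tar.gz -C /usr/local/bin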
If you try it now, you'll see that CRI-O isn't running...
crictl info
FATA[0002] connect: connect endpoint 'unix:///var/run/crio/crio.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
So fix that:
sudo systemctl start crio
sudo systemctl enable crio
CRI-O will be running at this point, but it still needs a CNI (Container Network Interface) configuration:
sudo crictl info
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": false,
        "reason": "NetworkPluginNotReady",
        "message": "Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
      }
    ]
  }
}
We're going to get that when we install Kubernetes, so let's skip ahead... But first, let's install the OpenSSH server so the rest of these tasks can be done over SSH instead of at the console:
sudo apt install openssh-server
# Installing Kubernetes

Note: at the time of writing this post, Kubernetes 1.20 had just been released. I will upgrade to it at some point in the future, but for now I want 1.19 so that everything matches up (e.g. CRI-O). You can run `apt list -a kubeadm` to see what's available; 1.19.4-00 was the latest in the 1.19 branch at the time of writing.
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet=1.19.4-00 kubeadm=1.19.4-00 kubectl=1.19.4-00
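The upstream install docs also suggest holding these packages so a routine apt upgrade doesn't bump them to a newer minor version unexpectedly:
sudo apt-mark hold kubelet kubeadm kubectl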


Make sure swap is turned off

 sudo swapoff -a 
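swapoff only lasts until the next reboot. To keep swap off permanently, comment out any swap entries in /etc/fstab; a one-liner sketch (assuming swap is configured via /etc/fstab, as on a stock Ubuntu install):
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab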
Here is where we diverge based on whether we're creating the master node or a worker...

IF MASTER...

Take a look at the kubeadm defaults with `kubeadm config print init-defaults`:
W1209 08:04:47.908176   66064 kubelet.go:200] cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': executable file not found in $PATH
W1209 08:04:47.914979   66064 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kmaster
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.19.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
We need to do some configuration before running `kubeadm`; see https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file

Create the file /etc/default/kubelet and paste in:

KUBELET_EXTRA_ARGS=--feature-gates="AllAlpha=false,RunAsGroup=true" --container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint='unix:///var/run/crio/crio.sock' --runtime-request-timeout=5m

Figure out what CIDR your CNI is using with `cat /etc/cni/net.d/100-crio-bridge.conf` (it's probably 10.85.0.0/16).

Note: you cannot pass both --config and --pod-network-cidr as suggested by some other tutorials. You must set the pod subnet in a ClusterConfiguration inside the --config file (as below). Pass in the cgroup driver through an init file you will use with kubeadm (I used kubeadm-init.yaml):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: "10.85.0.0/16"

sudo kubeadm init --config=kubeadm-init.yaml

IF WORKER...

Don't forget to restart the kubelet:
systemctl daemon-reload && systemctl restart kubelet
We need to join the node to the existing cluster. Go to the master node and run:
kubeadm token create --print-join-command
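The output is a ready-to-run `kubeadm join` command, roughly of this shape (the address, token, and hash here are placeholders, not real values):
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>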
Then take the output of that command and run it on the worker node.

# DNS!!

Another thing that got me... I couldn't get external hostnames to resolve on the Ubuntu hosts. Turns out there were no DNS servers listed. Edit /etc/systemd/resolved.conf on all nodes, uncomment the FallbackDNS line, and set it to your favorite DNS resolver (e.g. 1.1.1.1 or 8.8.8.8). Then restart the service:
service systemd-resolved restart
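To double-check resolution on the host afterwards, a quick sanity check using the tools systemd-resolved already ships:
resolvectl status | grep -i 'dns server\|fallback'
resolvectl query kubernetes.io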
Great, now external names resolve on the nodes. But the pods themselves still cannot resolve external names! You may need to restart CoreDNS so it picks up the change as well:
kubectl rollout restart -n kube-system deployment coredns
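To verify DNS from inside the cluster, a quick throwaway pod does the trick (busybox 1.28 because its nslookup behaves better than newer builds):
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.io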