Showing posts with label kubernetes. Show all posts

Saturday, March 11, 2023

Broken signatures

What happens if you leave an Ubuntu machine running for a couple years and then try to update? Sometimes signatures expire...
Err:6 http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.19/xUbuntu_20.04  InRelease
  The following signatures were invalid: EXPKEYSIG 4D64390375060AA4 devel:kubic OBS Project <devel:kubic@build.opensuse.org>
To resolve: (from https://github.com/containers/podman.io/issues/296#issuecomment-1455207534)
wget -qO - https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_22.04/Release.key | sudo apt-key add -
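Note that `apt-key add` is deprecated on newer Ubuntu releases. An equivalent using a dedicated keyring file (the keyring path here is my own choice, not mandated by the repo) would be:

```shell
# Fetch the repo key and store it dearmored in its own keyring,
# instead of the deprecated global apt-key store.
wget -qO - https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_22.04/Release.key \
  | gpg --dearmor \
  | sudo tee /usr/share/keyrings/libcontainers-archive-keyring.gpg > /dev/null
```

You'd then reference that keyring with `signed-by=` in the sources list entry, as done for the Google repo below.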
Err:3 https://packages.cloud.google.com/apt kubernetes-xenial InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY B53DC80D13EDEF05
To resolve: (from https://github.com/kubernetes/release/issues/1982#issuecomment-1415573798)
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | gpg --dearmor | sudo dd status=none of=/usr/share/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update

Saturday, April 24, 2021

Rough guide to upgrading k8s cluster w/ kubeadm

This is not the best way, just a way that works for me given the cluster topology I have (which was installed using kubeadm on Ubuntu, and includes a non-HA etcd running in-cluster).

On the control plane / master node:

1) Back up etcd (manually). You might need the info from the etcd pod (`kubectl -n kube-system describe po etcd-kmaster`) to find the various certs/keys/etc., but really they're probably just at /etc/kubernetes/pki/etcd/
kubectl exec -n kube-system etcd-kmaster -- etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt snapshot save /var/lib/etcd/snapshot.db
2) Back up important files locally (but really, these should also be backed up on a different server)
mkdir $HOME/backup
sudo cp -r /etc/kubernetes/pki/etcd $HOME/backup/
sudo cp /var/lib/etcd/snapshot.db $HOME/backup/$(date +%Y-%m-%d--%H-%M)-snapshot.db
sudo cp $HOME/kubeadm-init.yaml $HOME/backup
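It's worth sanity-checking the snapshot before moving on. A sketch, assuming `etcdctl` is available on the host (otherwise exec it through the etcd pod as above):

```shell
# Build the same timestamped name used by the cp above, then inspect it.
SNAP="$HOME/backup/$(date +%Y-%m-%d--%H-%M)-snapshot.db"
# Prints hash, revision, total keys, and size if the snapshot is readable:
# ETCDCTL_API=3 etcdctl snapshot status "$SNAP" -w table
echo "$SNAP"
```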
3) Figure out what we're going to upgrade to. Do NOT attempt to skip minor versions (i.e. go from 1.19 -> 1.20 -> 1.21, not 1.19 -> 1.21)
sudo apt update
sudo apt-cache madison kubeadm
sudo kubeadm version

I'm going to go from 1.19.6-00 to 1.20.6-00 because that's what's currently available (and then from 1.20.6-00 to 1.21.0-00)

Remove the hold on kubeadm, update it, then freeze it again.

sudo apt-mark unhold kubeadm
sudo apt-get install -y kubeadm=1.20.6-00
sudo apt-mark hold kubeadm
Make sure it worked
sudo kubeadm version

kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:26:21Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
Cordon and drain the master node (I've got a pod using local storage, so that extra flag is necessary)
kubectl cordon kmaster
kubectl drain kmaster --ignore-daemonsets --delete-local-data
Check out the upgrade plan. I get two options, upgrade to latest in the v1.19 series (1.19.10) or upgrade to latest stable version (1.20.6)
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.20.6
Nothing else needed to be upgraded, so I saw
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.20.6". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
The node version will still show v1.19.6, which is expected; we haven't upgraded the kubelet yet
kubectl get no
NAME        STATUS                     ROLES                  AGE    VERSION
kmaster     Ready,SchedulingDisabled   control-plane,master   128d   v1.19.6
kworker01   Ready                      &lt;none&gt;                 125d   v1.19.6
Now to upgrade kubelet and kubectl to the SAME version as kubeadm
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y  kubelet=1.20.6-00 kubectl=1.20.6-00
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet.service
Now we should see the master node running the updated version
kubectl get no
NAME        STATUS                     ROLES                  AGE    VERSION
kmaster     Ready,SchedulingDisabled   control-plane,master   128d   v1.20.6
kworker01   Ready                      &lt;none&gt;                 125d   v1.19.6
Uncordon it (`kubectl uncordon kmaster`), and make sure it shows 'Ready'. Now drain the worker(s) and then repeat roughly the same process on the worker nodes (and yes, the --force is necessary because I'm running something that isn't set up correctly or playing nicely; I'm looking at you, operatorhub)
kubectl drain kworker01 --ignore-daemonsets --delete-local-data --force
On the worker node(s)
sudo apt-mark unhold kubeadm
sudo apt-get install -y kubeadm=1.20.6-00
sudo apt-mark hold kubeadm

sudo kubeadm upgrade node

sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y  kubelet=1.20.6-00 kubectl=1.20.6-00
sudo apt-mark hold kubelet kubectl

sudo systemctl daemon-reload
sudo systemctl restart kubelet.service
Back on the master node, we should be able to get the nodes and see that the worker is upgraded. Since it is, we can uncordon it, and it should switch to 'Ready'
kubectl get no
NAME        STATUS                     ROLES                  AGE    VERSION
kmaster     Ready                      control-plane,master   128d   v1.20.6
kworker01   Ready                      &lt;none&gt;                 125d   v1.20.6
That's it! Rinse and repeat for 1.21 once the entire cluster is on 1.20
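Since kubeadm can't skip minor versions, each "rinse and repeat" pass just bumps one minor release. That bump is the only thing that changes between passes; a tiny sketch (the helper name is made up):

```shell
# Compute the next minor release from a MAJOR.MINOR string,
# since kubeadm upgrades must go one minor version at a time.
next_minor() {
  major="${1%%.*}"
  minor="${1#*.}"
  echo "${major}.$((minor + 1))"
}

next_minor 1.20   # prints 1.21
```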

Wednesday, December 9, 2020

Hybrid Kubernetes cluster (arm/x86)

This will be a long post (or maybe multiple posts). The end result will be a 7-node Kubernetes cluster capable of running both x86 and arm64 workloads.

Hardware
* 2 Odroid H2+ nodes (one master and one worker)
* 5 Raspberry Pi 4 nodes (all workers)
* network switch
* misc (power adapters / cat6e cables / etc)

Software Stack
* Ubuntu 20.04
* Kubernetes 1.19
* CRI-O

First up, the master node. These Odroid H2+ SBCs are pretty awesome (TODO link to specs). They include two Realtek 2.5GbE ethernet ports, but one minor drawback is that you need to install the drivers before they work, making the install a little trickier. Odroid has a good wiki page dedicated to this issue (https://wiki.odroid.com/odroid-h2/application_note/install_ethernet_driver_on_h2plus), but the absolute easiest thing to do is to share your phone's internet connection via USB (it will be picked up automatically).

Since I am going to install two nodes (at least), I figured I'd do something a little better. Enter Cubic (Custom Ubuntu ISO Creator - https://launchpad.net/cubic), a GUI wizard to create a customized Ubuntu Live ISO image. There are lots of tutorials on the internet explaining how to use Cubic, so I won't go into details, but basically we need to download the Ubuntu 20.04 ISO and open it up in Cubic. One of the steps in the Cubic wizard will drop you into a shell, at which point you'll want to follow the instructions on the Odroid wiki page above to add the hardkernel PPA and install the drivers (realtek-r8125-dkms) into the image. At this point, you might as well add CRI-O (following the instructions on their site):
export OS=xUbuntu_20.04
export VERSION=1.19
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /" > /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/$OS/ /" > /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.list

curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/$OS/Release.key | apt-key add -
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | apt-key add -

apt-get update
apt-get install cri-o cri-o-runc
Finish up the wizard, and create the new Ubuntu live image. Burn that image (balenaEtcher works well for this) onto a USB thumbdrive, and pop it into the first H2+ node. Install Ubuntu, picking whatever options you want... (I'm trying out ZFS, which is still experimental in 20.04). Since we already installed the drivers, you should have internet access, as long as you have connected it to something with a DHCP server. Update to the latest software, etc.

# Configuring and Running CRI-O

First, Kubernetes requires a few kernel settings for CRI-O to work. As root, run the following:
modprobe overlay
modprobe br_netfilter

cat > /etc/sysctl.d/99-kubernetes-cri.conf <<EOF
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sysctl --system

Important! Also add overlay and br_netfilter to /etc/modules-load.d/modules.conf so that it's permanent.  I did not originally, rebooted, and then wondered why I was getting "/proc/sys/net/bridge/bridge-nf-call-iptables not found" errors when trying to run kubeadm!

Next up, install the CRI tool `crictl`

export VERSION="v1.19.0"
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz
sudo tar zxvf crictl-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
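Since the Pi workers are arm64, the amd64 tarball above won't run there. A small sketch that picks the right architecture suffix (the helper name is made up, and only the two architectures in this cluster are handled):

```shell
# Map the kernel's machine name to the suffix used in cri-tools release assets.
crictl_arch() {
  case "$(uname -m)" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    *)       echo "unsupported arch" >&2; return 1 ;;
  esac
}

VERSION="v1.19.0"
# The tarball to fetch for this node:
echo "crictl-$VERSION-linux-$(crictl_arch).tar.gz"
```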
If you try it now, you'll see that CRI-O isn't running...
crictl info
FATA[0002] connect: connect endpoint 'unix:///var/run/crio/crio.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
So fix that:
sudo systemctl start crio
sudo systemctl enable crio
CRI-O will be running at this point, but it needs a CNI (container networking interface)
sudo crictl info
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": false,
        "reason": "NetworkPluginNotReady",
        "message": "Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
      }
    ]
  }
}
We're going to get that when we install Kubernetes, so let's skip ahead... But first, let's install OpenSSH server so that we can complete the rest of these tasks over SSH
sudo apt install openssh-server
# Installing Kubernetes

Note: at the time of writing this post, Kubernetes 1.20 had just been released. I will upgrade to that at some point in the future, but for now, I want 1.19 so that everything matches up (e.g. CRI-O). You can run `apt list -a kubeadm` to see what's available; 1.19.4-00 was the latest in the 1.19 branch right now.
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet=1.19.4-00 kubeadm=1.19.4-00 kubectl=1.19.4-00
sudo apt-mark hold kubelet kubeadm kubectl


Make sure swap is turned off

sudo swapoff -a
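Note that `swapoff -a` only lasts until reboot. To keep swap off permanently, the swap entries in /etc/fstab can be commented out as well (a sketch; back up the file first):

```shell
sudo cp /etc/fstab /etc/fstab.bak
# Comment out any uncommented line that mounts swap
sudo sed -i -E 's|^([^#].*[[:space:]]swap[[:space:]].*)|#\1|' /etc/fstab
```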
Here is where we diverge based on whether we're creating the master node or a worker...

IF MASTER...

Take a look at the kubeadm defaults: `kubeadm config print init-defaults`
W1209 08:04:47.908176   66064 kubelet.go:200] cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': executable file not found in $PATH
W1209 08:04:47.914979   66064 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kmaster
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.19.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
We need to do some configuration before running `kubeadm`, see: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file

Create a file /etc/default/kubelet and paste in:

KUBELET_EXTRA_ARGS=--feature-gates="AllAlpha=false,RunAsGroup=true" --container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint='unix:///var/run/crio/crio.sock' --runtime-request-timeout=5m

Figure out what CIDR your CNI is using: cat /etc/cni/net.d/100-crio-bridge.conf (it's probably 10.85.0.0/16). Note, you cannot pass both --config and --pod-network-cidr as suggested by some other tutorials. You must use a ClusterConfiguration in the --config file (as below).

Pass in the cgroup driver through an init file you will use with kubeadm (I used kubeadm-init.yaml):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: "10.85.0.0/16"

sudo kubeadm init --config=kubeadm-init.yaml

IF WORKER...

Don't forget to restart the kubelet.
systemctl daemon-reload && systemctl restart kubelet
We need to join to the existing cluster. Go to the master node and run:
kubeadm token create --print-join-command
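If the original join command gets lost, the token can be recreated as above, and the --discovery-token-ca-cert-hash value can be recomputed from the cluster CA using the formula given in the kubeadm docs:

```shell
# Produces the sha256 hex digest of the CA's DER-encoded public key,
# i.e. the value after "sha256:" in the join command.
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | sed 's/^.* //'
```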
Then take the output of that command and run it on the worker node.

DNS!!

Another thing that got me... I couldn't get external hostnames to resolve on the Ubuntu hosts. It turns out there were no DNS servers listed. Edit /etc/systemd/resolved.conf on all nodes, uncomment the FallbackDNS line, and set it to your favorite DNS resolver (e.g. 1.1.1.1 or 8.8.8.8). Restart the service:
service systemd-resolved restart
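Since this edit has to happen on every node, it can also be scripted; a sketch, assuming the FallbackDNS line is still present (commented out or not) in resolved.conf:

```shell
# Uncomment (if needed) and set the FallbackDNS line, then restart resolved.
sudo sed -i -E 's/^#?FallbackDNS=.*/FallbackDNS=1.1.1.1/' /etc/systemd/resolved.conf
sudo systemctl restart systemd-resolved
```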
Great, now the external names resolve on the nodes. But the pods themselves may still not resolve external names! You may need to restart CoreDNS to pick up the changes as well.
kubectl rollout restart -n kube-system deployment coredns