Friday, March 24, 2023

A start job is running for Wait for Network to be Configured

I recently installed Ubuntu Server 22.04 on an ODroid H2 (which has two 2.5GbE ports), and every time the server reboots there's a two-minute pause with the message "A start job is running for Wait for Network to be Configured" while the timer counts up.

Luckily, it's an easy fix

Apparently the installer adds both interfaces to the netplan config, so on boot the system happily waits the full two minutes for DHCP to assign an address to each one (even if there's no ethernet cable attached).

To fix, edit /etc/netplan/00-installer-config.yaml and remove the interface you're not using.

For example, the file contents will look like the following:

# This is the network config written by 'subiquity'
network:
  ethernets:
    enp2s0:
      dhcp4: true
    enp3s0:
      dhcp4: true
  version: 2
Since only 'enp2s0' has a network cable connected, we can simply remove the 'enp3s0' lines. (Note: only changing the dhcp4 value to 'false' does NOT resolve the issue; removing the lines does.)
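For reference, here's roughly what the trimmed file should end up looking like (interface names will differ on other hardware), followed by applying the change without a reboot:

# This is the network config written by 'subiquity'
network:
  ethernets:
    enp2s0:
      dhcp4: true
  version: 2

sudo netplan apply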

Saturday, March 11, 2023

Broken signatures

What happens if you leave an Ubuntu machine running for a couple years and then try to update? Sometimes signatures expire...
Err:6 http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.19/xUbuntu_20.04  InRelease
  The following signatures were invalid: EXPKEYSIG 4D64390375060AA4 devel:kubic OBS Project <devel:kubic@build.opensuse.org>
To resolve: (from https://github.com/containers/podman.io/issues/296#issuecomment-1455207534)
wget -qO - https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_22.04/Release.key | sudo apt-key add -
Err:3 https://packages.cloud.google.com/apt kubernetes-xenial InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY B53DC80D13EDEF05
To resolve: (from https://github.com/kubernetes/release/issues/1982#issuecomment-1415573798)
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | gpg --dearmor | sudo dd status=none of=/usr/share/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
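If it's not obvious which repository key is the problem, listing the trusted keys will show their expiry dates (apt-key is deprecated, but it still works on releases of this vintage):

sudo apt-key list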

Monday, July 11, 2022

Java's Project Panama and Rust - Simple example

There are a couple good guides to getting started with Foreign Function Interfaces (FFI) in Java 19, but since it's a feature preview, everything is subject to change and the ones I found were already out of date. 

 Here are my notes for getting a trivial Java program to call a Rust library:
  • Install Java 19 Early Access (recommended: sdk install java 19.ea.29-open)
  • Install jextract (it's not part of OpenJDK; instead look here: https://github.com/openjdk/jextract)
  • Create the Rust library (cargo init --lib) - details below
Cargo.toml
[package]
name = "myrustlibrary"
version = "0.1.0"
edition = "2021"

[dependencies]

[lib]
crate_type = ["cdylib"]

[build-dependencies]
cbindgen = "0.20.0"

Add a build.rs file; we'll use cbindgen to generate the lib.h header that we'll later feed to jextract.
extern crate cbindgen;

use std::env;

fn main() {
    // Cargo sets this to the crate root; cbindgen needs it to find the source
    let crate_dir = env::var("CARGO_MANIFEST_DIR").unwrap();

    // Generate a plain C header (lib.h) for the exported extern "C" functions
    cbindgen::Builder::new()
        .with_crate(crate_dir)
        .with_language(cbindgen::Language::C)
        .generate()
        .expect("Unable to generate bindings")
        .write_to_file("lib.h");
}

Add the src/lib.rs contents; for simplicity we'll just return the process ID.

use std::process;

#[no_mangle]
pub extern "C" fn rust_get_pid() -> u32 {
    return process::id();
}

Now build it: cargo build

Important: keep track of where Cargo put things. 'lib.h' will be in the crate's base folder, and the library itself will be in the 'target/debug' folder (libmyrustlibrary.dylib if you're on a Mac, libmyrustlibrary.so on Linux).
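For reference, the generated lib.h should contain little more than the one exported function; the exact set of #include lines depends on your cbindgen version:

#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

uint32_t rust_get_pid(void);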

Run jextract on the lib.h file

  ./jextract  -t org.rust -l myrustlibrary --output classes ./lib.h
  

Now there will be a bunch of class files in the classes/org/rust dir

Write a Java program that uses the bindings jextract generated from the Rust header (lib.h)

import static org.rust.lib_h.*;	// notice this is the target package we specified when running jextract

public class Main {
  public static void main(String[] args){
    System.out.println("🦀 process id = " + rust_get_pid());
  }
}

And finally, tie it all together

  ./java --enable-preview --source 19 --enable-native-access=ALL-UNNAMED  -Djava.library.path=./target/debug -cp classes Main.java

🦀 process id = 5526
  

Saturday, April 24, 2021

Rough guide to upgrading k8s cluster w/ kubeadm

This is not the best way, just a way that works for me given the cluster topology I have (which was installed using kubeadm on Ubuntu, and includes a non-HA etcd running in-cluster).

On the control plane / master node:

1) Backup etcd (manually). You might need the info from the etcd pod (`kubectl -n kube-system describe po etcd-kmaster`) to find the various certs/keys/etc, but really they're probably just at /etc/kubernetes/pki/etcd/
kubectl exec -n kube-system etcd-kmaster -- etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt snapshot save /var/lib/etcd/snapshot.db
Backup important files locally (but really, these should also be backed up on a different server)
mkdir $HOME/backup
sudo cp -r /etc/kubernetes/pki/etcd $HOME/backup/
sudo cp /var/lib/etcd/snapshot.db $HOME/backup/$(date +%Y-%m-%d--%H-%M)-snapshot.db
sudo cp $HOME/kubeadm-init.yaml $HOME/backup
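Optionally, sanity-check the snapshot before moving on. A quick check is etcdctl's snapshot status subcommand, run inside the same etcd pod (it reads the file directly, so no certs are needed):

kubectl exec -n kube-system etcd-kmaster -- etcdctl snapshot status /var/lib/etcd/snapshot.db --write-out=table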
Figure out what we're going to upgrade to. Do NOT attempt to skip minor versions (i.e. go 1.19 -> 1.20 -> 1.21, not straight from 1.19 to 1.21)
sudo apt update
sudo apt-cache madison kubeadm
sudo kubeadm version

I'm going to go from 1.19.6-00 to 1.20.6-00 because that's what's currently available (and then from 1.20.6-00 to 1.21.0-00)

Remove the hold on kubeadm, update it, then freeze it again.

sudo apt-mark unhold kubeadm
sudo apt-get install -y kubeadm=1.20.6-00
sudo apt-mark hold kubeadm
Make sure it worked
sudo kubeadm version

kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:26:21Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
Cordon and drain the master node (I've got a pod using local storage, so that extra flag is necessary)
kubectl cordon kmaster
kubectl drain kmaster --ignore-daemonsets --delete-local-data
Check out the upgrade plan. I get two options: upgrade to the latest in the v1.19 series (1.19.10) or upgrade to the latest stable version (1.20.6)
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.20.6
Nothing else needed to be upgraded, so I saw
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.20.6". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
kubectl get nodes is still going to show v1.19.6, which is expected (the kubelets haven't been upgraded yet)
kubectl get no
NAME        STATUS                     ROLES                  AGE    VERSION
kmaster     Ready,SchedulingDisabled   control-plane,master   128d   v1.19.6
kworker01   Ready                      <none>                 125d   v1.19.6
Now to upgrade kubelet and kubectl to the SAME version as kubeadm
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y  kubelet=1.20.6-00 kubectl=1.20.6-00
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet.service
Now we should see the master node running the updated version
kubectl get no
NAME        STATUS                     ROLES                  AGE    VERSION
kmaster     Ready,SchedulingDisabled   control-plane,master   128d   v1.20.6
kworker01   Ready                      <none>                 125d   v1.19.6
Uncordon the master and make sure it switches back to 'Ready' (command below). Then drain the worker(s) and repeat roughly the same process on the worker nodes (and yes, the --force is necessary because I'm running something that isn't set up correctly or playing nicely - I'm looking at you, operatorhub)
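Uncordoning is a single command (node name as in the output above):

kubectl uncordon kmaster

And the drain of the worker: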
kubectl drain kworker01 --ignore-daemonsets --delete-local-data --force
On the worker node(s)
sudo apt-mark unhold kubeadm
sudo apt-get install -y kubeadm=1.20.6-00
sudo apt-mark hold kubeadm

sudo kubeadm upgrade node

sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y  kubelet=1.20.6-00 kubectl=1.20.6-00
sudo apt-mark hold kubelet kubectl

sudo systemctl daemon-reload
sudo systemctl restart kubelet.service
Back on the master node, we should be able to get the nodes and see that the worker is upgraded. Since it is, we can uncordon it, and it should switch to 'Ready'
kubectl get no
NAME        STATUS                     ROLES                  AGE    VERSION
kmaster     Ready                      control-plane,master   128d   v1.20.6
kworker01   Ready                      <none>                 125d   v1.20.6
That's it! Rinse and repeat for 1.21 once the entire cluster is on 1.20

Thursday, April 1, 2021

MySQL connection error

This was a mildly interesting one. I run some applications on my laptop that talk to a k8s cluster in my office, including a MySQL instance. The main application started failing with the common "The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server" error, even though the app had been running fine earlier today. When debugging, the first step is always the logs.
  kubectl logs mysql-57f577f4b9-gvtlz
Lo and behold, a bunch of suspicious errors:
2021-03-10T02:49:19.349769Z 149591 [ERROR] Disk is full writing './mysql-bin.000015' (Errcode: 15781392 - No space left on device). Waiting for someone to free space...
2021-03-10T02:49:19.349823Z 149591 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
2021-03-10T02:58:46.658696Z 151120 [ERROR] Disk is full writing './mysql-bin.~rec~' (Errcode: 15781392 - No space left on device). Waiting for someone to free space...
2021-03-10T02:58:46.658728Z 151120 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
2021-03-10T02:59:19.352777Z 149591 [ERROR] Disk is full writing './mysql-bin.000015' (Errcode: 15781392 - No space left on device). Waiting for someone to free space...
2021-03-10T02:59:19.354093Z 149591 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
2021-03-10T03:04:46.886946Z 151120 [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 61, event_type: 34
Looks like the bin logs have finally filled up the volume. Unfortunately, I created that pod with a rather small PVC, and since I'm using OpenEBS, it won't easily resize. What to do? Log into the instance and clean out the logs...
  kubectl exec -it mysql-57f577f4b9-gvtlz -- /bin/sh
  rm /var/lib/mysql/mysql-bin*
Problem solved! (well, temporarily, until they fill up again)
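A more durable fix - a sketch, assuming this is a MySQL 5.7-style deployment where losing older binlogs is acceptable - would be to cap binlog retention in the server config (on 8.0 the preferred variable is binlog_expire_logs_seconds):

[mysqld]
expire_logs_days = 3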

Sunday, March 28, 2021

Rust app using build container and distroless

Turns out that, just like with Golang, it's really quite simple to craft a small container image for a Rust app. Taking a trivial "hello world" app using Actix, we can use a multi-stage build, and then one of the Google distroless container images as a base, to build a tiny final image.

Dockerfile:
FROM rust:1.51 as builder
LABEL maintainer="yourname@whatever.com"

WORKDIR /app
COPY . /app
RUN cargo build --release

FROM gcr.io/distroless/cc-debian10
COPY --from=builder /app/target/release/hello-world /
EXPOSE 1111
CMD ["./hello-world"]
Don't forget to include a .dockerignore file at the same level as your Dockerfile (even if you're using podman/buildah - they will respect the .dockerignore). At a minimum, there's no need to include the git directories in the build context:

.dockerignore
.git
target/
Finally, build your image:
docker build -t hello-world .
Although the build container (rust:1.51) is rather large (1.27GB), and the intermediate images somehow balloon to 2.5GB, the final image is only ~30MB.
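To double-check the size, and that the binary actually runs inside the distroless image (assuming the Actix app listens on the port from the EXPOSE line):

docker images hello-world
docker run --rm -p 1111:1111 hello-world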

Wednesday, December 9, 2020

Hybrid Kubernetes cluster (arm/x86)

This will be a long post (or maybe multiple posts). The end result will be a 7 node Kubernetes cluster capable of running both x86 and arm64 workloads.

Hardware
  • 2 Odroid H2+ nodes (one master and one worker)
  • 5 Raspberry Pi 4 nodes (all workers)
  • network switch
  • misc (power adapters / cat6e cables / etc)

Software Stack
  • Ubuntu 20.04
  • Kubernetes 1.19
  • CRI-O

First up, the master node. These Odroid H2+ SBCs are pretty awesome (TODO link to specs). They include two Realtek 2.5gbe ethernet ports, but one minor drawback is that you need to install the drivers before they work, making the install a little trickier. Odroid has a good wiki page dedicated to this issue (https://wiki.odroid.com/odroid-h2/application_note/install_ethernet_driver_on_h2plus), but the absolute easiest thing to do is to share your phone's internet connection via USB (it will be picked up automatically).

Since I am going to install two nodes (at least), I figured I'd do something a little better. Enter Cubic (Custom Ubuntu ISO Creator - https://launchpad.net/cubic), a GUI wizard to create a customized Ubuntu Live ISO image. There are lots of tutorials on the internet explaining how to use Cubic, so I won't go into details, but basically we need to download the Ubuntu 20.04 ISO and open it up in Cubic. One of the steps in the Cubic wizard will drop you into a shell, at which point you'll want to follow the instructions on the Odroid wiki page above to add the hardkernel PPA and install the drivers (realtek-r8125-dkms) into the image. At this point, you might as well add CRI-O (following the instructions on their site):
export OS=xUbuntu_20.04
export VERSION=1.19
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /" > /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VERSION/$OS/ /" > /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$VERSION.list

curl -L https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$VERSION/$OS/Release.key | apt-key add -
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | apt-key add -

apt-get update
apt-get install cri-o cri-o-runc
Finish up the wizard, and create the new Ubuntu live image. Burn that image (balenaEtcher works well for this) onto a USB thumbdrive, and pop it into the first H2+ node. Install Ubuntu, picking whatever options you want... (I'm trying out ZFS, which is still experimental in 20.04). Since we already installed the drivers, you should have the internet available, as long as you have connected it to something with a DHCP server. Update to the latest software, etc.

Configuring and Running CRI-O

First, Kubernetes needs a few kernel modules and sysctl settings in place for CRI-O to work. As root, run the following:
modprobe overlay
modprobe br_netfilter

cat > /etc/sysctl.d/99-kubernetes-cri.conf <<EOF
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sysctl --system

Important! Also add overlay and br_netfilter to /etc/modules-load.d/modules.conf so that it's permanent.  I did not originally, rebooted, and then wondered why I was getting "/proc/sys/net/bridge/bridge-nf-call-iptables not found" errors when trying to run kubeadm!
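In other words, something like this (as root, appending rather than overwriting):

cat >> /etc/modules-load.d/modules.conf <<EOF
overlay
br_netfilter
EOF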

Next up, install the CRI tool `crictl`

export VERSION="v1.19.0"
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz
sudo tar zxvf crictl-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
If you try it now, you'll see that CRI-O isn't running...
crictl info
FATA[0002] connect: connect endpoint 'unix:///var/run/crio/crio.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
So fix that:
sudo systemctl start crio
sudo systemctl enable crio
CRI-O will be running at this point, but it needs a CNI (Container Network Interface) plugin
sudo crictl info
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": false,
        "reason": "NetworkPluginNotReady",
        "message": "Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
      }
    ]
  }
}
We're going to get a CNI when we install Kubernetes, so let's skip ahead... But first, let's install the OpenSSH server so the rest of these tasks can be done over SSH
sudo apt install openssh-server
Installing Kubernetes

Note: at the time of writing this post, Kubernetes 1.20 had just been released. I will upgrade to that at some point in the future, but for now I want 1.19 so that everything matches up (e.g. CRI-O). You can run `apt list -a kubeadm` to see what's available; 1.19.4-00 was the latest in the 1.19 branch right now.
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF

sudo apt-get update
sudo apt-get install -y kubelet=1.19.4-00 kubeadm=1.19.4-00 kubectl=1.19.4-00
sudo apt-mark hold kubelet kubeadm kubectl


Make sure swap is turned off

 sudo swapoff -a 
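To keep swap off across reboots, also comment out any swap entry in /etc/fstab (a rough sketch; adjust to whatever your fstab actually contains):

sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab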
Here is where we diverge based on whether we're creating the master node or a worker...

IF MASTER...

Take a look at the kubeadm defaults: `kubeadm config print init-defaults`
W1209 08:04:47.908176   66064 kubelet.go:200] cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': executable file not found in $PATH
W1209 08:04:47.914979   66064 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: kmaster
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.19.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
We need to do some configuration before running `kubeadm`, see: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file

Create a file: /etc/default/kubelet

Paste in:

KUBELET_EXTRA_ARGS=--feature-gates="AllAlpha=false,RunAsGroup=true" --container-runtime=remote --cgroup-driver=systemd --container-runtime-endpoint='unix:///var/run/crio/crio.sock' --runtime-request-timeout=5m

Figure out what CIDR your CNI is using: cat /etc/cni/net.d/100-crio-bridge.conf (it's probably 10.85.0.0/16).

Note: you cannot pass both --config and --pod-network-cidr as suggested by some other tutorials. You must use a ClusterConfiguration in the --config file (as below).

Pass in the cgroup driver through an init file you will use with kubeadm (I used kubeadm-init.yaml):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: "10.85.0.0/16"

sudo kubeadm init --config=kubeadm-init.yaml

IF WORKER...

Don't forget to restart the kubelet.
systemctl daemon-reload && systemctl restart kubelet
We need to join to the existing cluster. Go to the master node and run:
kubeadm token create --print-join-command
Then take the output of that command and run it on the worker node.

DNS!!

Another thing that got me... I couldn't get external hostnames to resolve on the Ubuntu hosts. It turns out there were no DNS servers listed. Edit /etc/systemd/resolved.conf on all nodes, uncomment the FallbackDNS line, and set it to your favorite DNS resolver (e.g. 1.1.1.1 or 8.8.8.8). Restart the service:
service systemd-resolved restart
Great, now external names resolve on the nodes. But the pods themselves still cannot resolve external names! You may also need to restart CoreDNS to pick up the change:
kubectl rollout restart -n kube-system deployment coredns
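To confirm the fix from inside the cluster, a quick throwaway pod works (any image with nslookup will do; busybox is the usual choice):

kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup google.com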