K8s Bare Metal

January 21, 2025 ∙ 11 minute read

I know there are so many posts on how to configure a bare-metal K8s cluster, but after studying a lot, here’s my take on it.

Here we will be configuring a cluster composed of one master node, four worker nodes (k8s-worker1 through k8s-worker4), and a dedicated NFS server (k8s-nfs) for persistent storage.

I’m provisioning my cluster on ESXi running behind pfSense, so traffic can be controlled and DHCP does not touch my main home network. I’ll also use an internal DNS to route everything, with *.k.vito.sh domains exposing services hosted in the cluster and *.kube.vito.sh domains used to access the VMs.

All machines will be provisioned with 8GB RAM and 4 cores. The network will look like this:

Provisioning Nodes

Warning: You will notice I’m running several commands without sudo. I usually just get a root shell using sudo su in order to execute everything without hassle.

All nodes except k8s-nfs follow the same steps:

  1. Install Ubuntu 22.04 LTS. I use a minimized server installation.
  2. Make sure everything is up-to-date:
    1. apt update && apt upgrade -y && reboot
  3. Drop snap
    1. apt purge snapd
  4. Install utilities:
    1. apt install iputils-ping vim apt-transport-https ca-certificates curl gnupg2 gpg software-properties-common -y

Install CRI-O

  1. export OS=xUbuntu_22.04
  2. export CRIO_VERSION=1.26
  3. echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /"| sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
  4. echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$CRIO_VERSION/$OS/ /"|sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable:cri-o:$CRIO_VERSION.list
  5. curl -L https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$CRIO_VERSION/$OS/Release.key | sudo apt-key add -
  6. curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | sudo apt-key add -
  7. apt update && apt install cri-o cri-o-runc -y
  8. systemctl enable --now crio
  9. Check CRI-O status: systemctl status crio

Enable conmon

  1. Edit /etc/crio/crio.conf, find and uncomment the line conmon = ""
  2. Insert /usr/bin/conmon between the quotes (the resulting line is shown after this list)
  3. Restart CRI-O: systemctl restart crio
  4. Check its status: systemctl status crio
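
For reference, the relevant line in /etc/crio/crio.conf should end up reading:

conmon = "/usr/bin/conmon"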

Install CRI-O tools

  1. apt install -y cri-tools
  2. Check everything is working: crictl --runtime-endpoint unix:///var/run/crio/crio.sock version
  3. Check that it is ready: crictl info. RuntimeReady is expected to be true

Configure Networking

This is optional, but since I use an internal DNS, I need to set nameservers through netplan:

  1. Edit /etc/netplan/00-installer-config.yaml and add the nameserver under your interface: nameservers.addresses: [10.0.1.3] (see the sketch after this list)
  2. Run netplan apply
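
For reference, here is a minimal sketch of what that netplan file might end up looking like; the interface name (ens160) and the use of DHCP are assumptions and will likely differ on your VM:

# /etc/netplan/00-installer-config.yaml
network:
  version: 2
  ethernets:
    ens160:
      dhcp4: true
      nameservers:
        addresses: [10.0.1.3]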

Prepare APT for K8s installation

  1. Make sure /etc/apt/keyrings exists. Create it if it does not:
    1. [ -d /etc/apt/keyrings ] || mkdir -p -m 755 /etc/apt/keyrings
  2. Download public signing key for the K8s packages: curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
  3. Add the appropriate apt repository: echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
  4. Install Kubernetes components:
    1. apt update
    2. apt install -y kubelet kubeadm kubectl
    3. apt-mark hold kubelet kubeadm kubectl
  5. Enable and start kubelet: sudo systemctl enable --now kubelet

Update System Configuration for Kubernetes

  1. Enable required kernel modules:
    1. modprobe overlay
    2. modprobe br_netfilter
  2. Ensure modules are loaded on restart:
echo overlay | tee -a /etc/modules
echo br_netfilter | tee -a /etc/modules
  3. Update network configuration:
cat > /etc/sysctl.d/99-kubernetes-cri.conf <<EOF
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
  4. Apply system configuration: sysctl --system
  5. Disable swap:
    1. systemctl mask swap.img.swap
    2. systemctl stop swap.img.swap
    3. swapoff -a
  6. Update kubeadm configuration to use systemd as its cgroup driver:
    1. Create /etc/default/kubelet with the following contents:
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd --runtime-request-timeout=5m
  7. Apply changes once again:
    1. systemctl daemon-reload
    2. systemctl restart kubelet

At this point, kubelet will be crash-looping while it waits for instructions from kubeadm. That is expected.

Bootstrapping the Cluster

The next step must be performed only on the master node. This will bootstrap the cluster. Other nodes will join in another step.

To bootstrap the cluster, use kubeadm:

kubeadm init --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint=master.kube.vito.sh

The command may take a while and will output instructions on how to allow non-root access to kubectl and how to join nodes to the cluster. Take note of those instructions!
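
If you lose the join command, it can be regenerated later on the master node with:

kubeadm token create --print-join-command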

Exit the root shell, and go back to your non-root user. Then:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Finally, run kubectl get ns. If you get a response, congratulations, you have a working master node!

Setup Flannel

Flannel is a CNI plugin that provides the pod network for your cluster. More information can be found in the Flannel documentation.

Installation is straightforward, requiring a single command:

kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

Then you can use watch kubectl get pods --all-namespaces and wait for the Flannel pods to come up.

Setup Workers

Now, SSH into each of the workers and, using the information emitted by kubeadm, join them to the cluster. The command will look like the following and must be executed with root privileges:

kubeadm join master.kube.vito.sh:6443 --token qn2gjj.e6y272oiqhqa2emu \
    --discovery-token-ca-cert-hash sha256:401e68dbd991b30bcad71ed2cc67c03c5e970e60ac6eb21db43679104431e678

Label Workers

Log in to the master node again and check that all nodes are visible using kubectl get nodes. Notice that the workers have no role assigned yet. Label them as workers:

kubectl label node k8s-worker1 node-role.kubernetes.io/worker=worker
kubectl label node k8s-worker2 node-role.kubernetes.io/worker=worker
kubectl label node k8s-worker3 node-role.kubernetes.io/worker=worker
kubectl label node k8s-worker4 node-role.kubernetes.io/worker=worker

Bootstrapping the Cluster: Part 2

Install MetalLB

MetalLB allows bare-metal clusters to have ingresses and load balancers without requiring a cloud provider.

The first step to install it is to enable strictARP in kube-proxy. This can be done by running the command below:

kubectl edit configmap kube-proxy -n kube-system

Locate the line strictARP: and change its value from false to true.
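
If you prefer a non-interactive approach, a one-liner along the lines of the one suggested in the MetalLB documentation should also work:

kubectl get configmap kube-proxy -n kube-system -o yaml | \
  sed -e "s/strictARP: false/strictARP: true/" | \
  kubectl apply -f - -n kube-system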

Installing it is also simple:

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.9/config/manifests/metallb-native.yaml

Finally, it needs to be configured so it knows which IPs it may hand out. In my case, the following is enough:

# metallb-ippool.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
    - 10.0.11.50-10.0.11.254
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system

Apply the manifest using kubectl apply:

kubectl apply -f metallb-ippool.yaml
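
Before moving on, you can check that the MetalLB controller and speaker pods came up:

kubectl get pods -n metallb-system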

Install k8s_gateway

k8s_gateway exposes a DNS server that resolves Ingress hostnames to their external IPs. Installation is done through Helm:

helm repo add k8s_gateway https://ori-edge.github.io/k8s_gateway/
helm install exdns --set domain=k.vito.sh --set service.loadBalancerIP=10.0.11.53 k8s_gateway/k8s-gateway

I want to expose 10.0.11.53 as the DNS service. My upstream DNS server forwards requests for k.vito.sh to this IP.
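
Once an Ingress exists (we will create test.k.vito.sh below), you can query the exposed DNS service directly to confirm resolution works, for example:

dig @10.0.11.53 test.k.vito.sh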

Install nginx-ingress

nginx-ingress is also installed through Helm:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx

Apply the chart using the following command:

helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx --create-namespace
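
Once the chart is installed, the controller's Service should be assigned an external IP from the MetalLB pool, which you can verify with:

kubectl get svc -n ingress-nginx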

Test the ingress and networking

The following manifest contains a pod, a service, and an ingress that serve NGINX's default welcome page:

# test.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: test-ingress
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: test-ingress
  labels:
    app: nginx
spec:
  containers:
    - name: nginx
      image: nginx:latest
      ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: test-ingress
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
  namespace: test-ingress
spec:
  ingressClassName: "nginx"
  rules:
    - host: test.k.vito.sh
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-service
                port:
                  number: 80

Apply it using kubectl apply -f test.yaml. Eventually, the page will be available at test.k.vito.sh.
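
If DNS is not wired up yet, you can also hit the ingress controller directly and set the Host header by hand. The IP below is just an example; use whatever kubectl get svc -n ingress-nginx reports:

curl -H "Host: test.k.vito.sh" http://10.0.11.50/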

Enabling support for PersistentVolumeClaims

For this part, a new machine will be provisioned. I'll use Alpine Linux as it is ultra lightweight, performs well, and installs quickly. I'll configure it with 4GB RAM, 4 CPUs, and 100GB of disk.

Once the VM is provisioned, let’s configure NFS on it:

First, install required packages:

apk update && apk add nfs-utils vim

Create the data directory:

mkdir /data

Update its owner to nobody:nogroup:

chown nobody:nogroup /data

And configure /etc/exports just like on any other Linux server. Note that each node that needs access to the NFS server must be listed in /etc/exports:

/data 10.0.11.1(rw,sync,no_subtree_check) 10.0.11.2(rw,sync,no_subtree_check) 10.0.11.3(rw,sync,no_subtree_check) 10.0.11.4(rw,sync,no_subtree_check)

After saving the file, reload settings:

exportfs -afv

Finally, make sure the NFS service starts on boot, and start it now:

rc-update add nfs
rc-service nfs start
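
You can confirm the export is visible with showmount (also part of nfs-utils):

showmount -e localhost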

Preparing Workers

Now, on each worker, install nfs-common through APT:

apt install -y nfs-common
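
Optionally, before installing the provisioner, you can sanity-check connectivity by mounting the export manually on one of the workers and unmounting it afterwards:

mount -t nfs nfs.kube.vito.sh:/data /mnt
ls /mnt
umount /mnt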

And finally, install the provisioner through Helm:

helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --create-namespace \
  --namespace nfs-system \
  --set nfs.server=nfs.kube.vito.sh \
  --set nfs.path=/data
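
The chart creates a StorageClass named nfs-client by default; confirm it is there:

kubectl get storageclass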

Then, try to create a PVC:

# pvc-test.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: nfs-client
  resources:
    requests:
      storage: 20Gi

Apply it using kubectl apply -f pvc-test.yaml.

Checking the PVC status, everything should look good:

$ kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
pvc-test   Bound    pvc-ff3b4825-3836-4661-9958-c2b8948135af   20Gi       RWX            nfs-client     <unset>                 7s
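
As a final check, you can mount the claim in a throwaway pod and write to it. A minimal sketch (the pod name and image are arbitrary choices):

# pvc-consumer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer
  namespace: default
spec:
  containers:
    - name: shell
      image: busybox:latest
      command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-test

After applying it, the file should show up under /data on the NFS server, inside the directory the provisioner created for the claim.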

TLS & Other Trinkets

What’s missing now is an Argo CD instance and automatic TLS for our services. This will be covered in the future!

Happy K8sing!