Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"cgroupfs\" is different from docker cgroup driver"


Recently I was using Vagrant to bootstrap a Kubernetes environment (version 1.21.1), and noticed that when I modified the Docker daemon to use the systemd cgroup driver instead of cgroupfs, requests to the API server were rejected.

Checking journalctl, I came across the error that I put in the title of this post.

journalctl | grep -i docker


But what are cgroups?

On Linux, control groups are used to constrain resources that are allocated to processes. Both the kubelet and the underlying container runtime need to interface with control groups to enforce resource management for pods and containers and to set resources such as CPU/memory requests and limits. To interface with control groups, the kubelet and the container runtime need to use a cgroup driver. It's critical that the kubelet and the container runtime use the same cgroup driver and are configured consistently.

In our case, the Docker container runtime was modified to use systemd, but the kubelet was still configured to use cgroupfs, which resulted in the error.
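A quick way to spot such a mismatch is to compare the two configured drivers. On a live node you would read /etc/docker/daemon.json and /var/lib/kubelet/config.yaml directly (or run `docker info -f '{{.CgroupDriver}}'`); in this sketch, sample copies of both files are created under /tmp so it is self-contained:

```shell
# Create sample copies of the two config files (stand-ins for
# /etc/docker/daemon.json and /var/lib/kubelet/config.yaml).
mkdir -p /tmp/cgroup-check
cat > /tmp/cgroup-check/daemon.json <<'EOF'
{ "exec-opts": ["native.cgroupdriver=systemd"] }
EOF
cat > /tmp/cgroup-check/config.yaml <<'EOF'
cgroupDriver: cgroupfs
EOF

# Extract the driver each component is configured to use.
docker_driver=$(grep -o 'native\.cgroupdriver=[a-z]*' /tmp/cgroup-check/daemon.json | cut -d= -f2)
kubelet_driver=$(grep '^cgroupDriver:' /tmp/cgroup-check/config.yaml | awk '{print $2}')

echo "docker:  $docker_driver"
echo "kubelet: $kubelet_driver"
if [ "$docker_driver" != "$kubelet_driver" ]; then
  echo "MISMATCH: kubelet will refuse to start"
fi
```

With the sample files above, the comparison reports a mismatch, which is exactly the situation that produced the error in the title.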

The Container runtimes page explains that the systemd driver is recommended for kubeadm-based setups instead of the cgroupfs driver, because kubeadm manages the kubelet as a systemd service.

When systemd is chosen as the init system for a Linux distribution (e.g. RedHat, CentOS and Fedora), the init process generates and consumes a root control group (cgroup) and acts as a cgroup manager.

systemd has a tight integration with cgroups and allocates a cgroup per systemd unit. As a result, if you use systemd as the init system with the cgroupfs driver, the system gets two different cgroup managers.

Two cgroup managers result in two views of the available and in-use resources in the system. In some cases, nodes that are configured to use cgroupfs for the kubelet and container runtime, but use systemd for the rest of the processes become unstable under resource pressure.

The approach to mitigate this instability is to use systemd as the cgroup driver for the kubelet and the container runtime when systemd is the selected init system.
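You can check which init system and cgroup setup a node is actually running with a couple of read-only commands (no cluster access required):

```shell
# Which process is PID 1? On systemd distros this prints "systemd".
cat /proc/1/comm

# Which filesystem backs the cgroup mount? "cgroup2fs" means cgroup v2;
# "tmpfs" typically indicates the cgroup v1 hierarchy.
stat -fc %T /sys/fs/cgroup
```

If the first command prints systemd, the advice above applies: configure both the kubelet and the container runtime to use the systemd driver.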

In v1.22 and above, if the user is not setting the cgroupDriver field under KubeletConfiguration, kubeadm will default it to systemd.
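Accordingly, on versions before 1.22 you can set the driver explicitly when bootstrapping a new cluster by passing a config file to kubeadm; a minimal KubeletConfiguration fragment looks like:

```yaml
# Pass with: kubeadm init --config kubeadm-config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```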

So now that we know the error was caused by the cgroup driver mismatch between the kubelet and the Docker container runtime, how do we fix it?

Migrating to the systemd driver

I. Modify the kubelet ConfigMap

kubectl edit cm kubelet-config -n kube-system

cgroupDriver: systemd

If the above command results in the error Error from server (NotFound): configmaps "kubelet-config" not found, the ConfigMap may have a versioned name on older kubeadm versions (e.g. kubelet-config-1.21); you can find it with

kubectl get cm -n kube-system

II. Update the cgroup driver on all nodes

For each node in the cluster:

  • Drain the node using
kubectl drain <node-name> --ignore-daemonsets
  • Stop the kubelet using
systemctl stop kubelet
  • Modify the container runtime cgroup driver to systemd
cat /etc/docker/daemon.json 
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
  • Set cgroupDriver: systemd in /var/lib/kubelet/config.yaml

  • Start the container runtime

systemctl start docker
  • Start the kubelet using
systemctl start kubelet
  • Uncordon the node using
kubectl uncordon <node-name>
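
Taken together, the per-node steps above can be sketched as a single shell function. The config edits are left as comments, and since kubectl and systemctl calls require a live node with cluster access, the function is only defined here, not run:

```shell
# Sketch of the per-node migration, assuming Docker as the runtime.
# Run on a machine with cluster access; <node-name> is the target node.
migrate_node() {
  node="$1"
  kubectl drain "$node" --ignore-daemonsets
  systemctl stop kubelet
  # edit /etc/docker/daemon.json      -> "native.cgroupdriver=systemd"
  # edit /var/lib/kubelet/config.yaml -> cgroupDriver: systemd
  systemctl restart docker   # restart so Docker picks up the new daemon.json
  systemctl start kubelet
  kubectl uncordon "$node"
}

# Example (requires a live cluster):
# migrate_node worker-1
```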