how to fix the 'Failed create pod sandbox' issue in k8s

You see a whole bunch of pods stuck in ContainerCreating, looping without ever reaching Running

★ kubectl get pods -n my-groovy-namespace
NAME                             READY     STATUS              RESTARTS   AGE
my-groovy-pod-tp2h9              0/1       ContainerCreating   0          9m
my-groovy-pod-xjhcc              1/1       Running             0          9m
my-groovy-pod-8d5b9              0/1       ContainerCreating   0          32s
my-groovy-pod-vfksp              1/1       Running             0          9m
my-groovy-pod-55szr              0/1       ContainerCreating   0          9m
my-groovy-pod-fmjnx              1/1       Running             0          9m
my-groovy-pod-5wkrq              0/1       ContainerCreating   0          52s
my-groovy-pod-rsghq              1/1       Running             0          9m
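
If the namespace has a lot of pods, you can narrow the view to just the unhappy ones. A small sketch, assuming your kubectl supports field selectors on pod phase; pods stuck in ContainerCreating report a phase of Pending, so filtering out Running leaves the stuck ones behind

★ kubectl get pods -n my-groovy-namespace --field-selector=status.phase!=Running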

You describe one of them and see a stream of events like the ones below

★ kubectl describe pod -n my-groovy-namespace my-groovy-pod-tp2h9
.
.
.
Events:
  Type     Reason                  Age                From                            Message
  ----     ------                  ----               ----                            -------
  Normal   Scheduled               38s                default-scheduler               Successfully assigned my-groovy-pod-tp2h9 to k8s-qapool-27652675-4
  Normal   SuccessfulMountVolume   37s                kubelet, k8s-qapool-27652675-4  MountVolume.SetUp succeeded for volume "default-token-zbpr5"
  Warning  FailedCreatePodSandBox  12s (x8 over 36s)  kubelet, k8s-qapool-27652675-4  Failed create pod sandbox.
  Warning  FailedSync              12s (x8 over 36s)  kubelet, k8s-qapool-27652675-4  Error syncing pod
  Normal   SandboxChanged          10s (x8 over 36s)  kubelet, k8s-qapool-27652675-4  Pod sandbox changed, it will be killed and re-created.

In the Kubernetes dashboard you see lots of red, with messages like

Failed create pod sandbox.
Error syncing pod
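
If you prefer the command line to the dashboard, the same events can be pulled with a field selector on the reason; a sketch, assuming the event reason matches the FailedCreatePodSandBox seen in the describe output above

★ kubectl get events -n my-groovy-namespace --field-selector reason=FailedCreatePodSandBox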

You notice that all the failing pods have landed on the same node, k8s-qapool-27652675-4
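
A quick way to double-check which node each pod landed on, straight from your workstation; -o wide adds a NODE column to the listing

★ kubectl get pods -n my-groovy-namespace -o wide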

★ ssh k8s-qapool-27652675-4
★ sudo su -
★ ps -ef | grep [d]ocker

root       7042      1  2 Mar19 ?        2-04:02:06 dockerd -H fd:// --storage-driver=overlay2 --bip=172.17.0.1/16
root       7746      1  0 Mar19 ?        00:28:17 docker-containerd-shim 9f07a121fb66d61285fb246b5157cb640f7df902965421603c0ad66800a00637 /var/run/docker/libcontainerd/9f07a121fb66d61285fb246b5157cb640f7df902965421603c0ad66800a00637 docker-runc
root       8157      1  0 Mar19 ?        00:00:05 docker-containerd-shim 9258d496ab97ce1e0736bc0521cf4146045fdfb30b835675433622e5c7ed7c67 /var/run/docker/libcontainerd/9258d496ab97ce1e0736bc0521cf4146045fdfb30b835675433622e5c7ed7c67 docker-runc
root       8208      1  0 Mar19 ?        00:00:16 docker-containerd-shim 0ae4b84fdf301965f047a9aa6275be1dfa1daa03f492bf5bd29c4020d9ce165a /var/run/docker/libcontainerd/0ae4b84fdf301965f047a9aa6275be1dfa1daa03f492bf5bd29c4020d9ce165a docker-runc
root      12741      1  0 Jun28 ?        00:00:00 docker-containerd-shim 858aeacbe1ee80724423aa24e14bbd0f984e3f98e6c1ba13958a48bd72f73628 /var/run/docker/libcontainerd/858aeacbe1ee80724423aa24e14bbd0f984e3f98e6c1ba13958a48bd72f73628 docker-runc
root      34870      1  0 Jun28 ?        00:00:00 docker-containerd-shim b005c6f1d189e77c48ac9573631689515389eb634666f63ca18835e0cda9be09 /var/run/docker/libcontainerd/b005c6f1d189e77c48ac9573631689515389eb634666f63ca18835e0cda9be09 docker-runc
root      41490      1  0 Jun30 ?        00:00:00 docker-containerd-shim bff891b51748ab7c963b5585669e5d79943e731960f13e4943f8fb1f7dc04822 /var/run/docker/libcontainerd/bff891b51748ab7c963b5585669e5d79943e731960f13e4943f8fb1f7dc04822 docker-runc
root      41562      1  0 May09 ?        00:00:02 docker-containerd-shim 13f6fad9ed621b5de2675db686be4a7b693c2cfe2446620860e601b94d131465 /var/run/docker/libcontainerd/13f6fad9ed621b5de2675db686be4a7b693c2cfe2446620860e601b94d131465 docker-runc
root      41613      1  0 May09 ?        00:00:02 docker-containerd-shim ae0f4e408bfee941c155832ed2cba934884dd770fac1cd0f0791eaca75416a1a /var/run/docker/libcontainerd/ae0f4e408bfee941c155832ed2cba934884dd770fac1cd0f0791eaca75416a1a docker-runc
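
Before restarting anything it is worth a quick sanity check that the daemon itself is the problem; on a wedged daemon these tend to hang, error out, or show a wall of errors in the log. The journalctl call assumes the node ships docker logs through systemd's journal

★ docker info
★ journalctl -u docker --since "1 hour ago"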

You restart the docker daemon running on that node

★ service docker restart
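
This node uses the service wrapper; on nodes where docker runs as a plain systemd unit the equivalent would be the usual systemctl pair (the unit name docker is the common default, an assumption worth verifying on your own distro)

★ systemctl restart docker
★ systemctl status docker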

You then delete all the faulty pods and watch them get reprovisioned successfully now that the docker daemon on that node is healthy again
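
One way to do the deleting in bulk; a sketch that greps for the stuck status and assumes the pods are owned by a controller that will spin up replacements

★ kubectl get pods -n my-groovy-namespace | grep ContainerCreating | awk '{print $1}' | xargs kubectl delete pod -n my-groovy-namespace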

★ kubectl get pods -n my-groovy-namespace
NAME                             READY     STATUS              RESTARTS   AGE
my-groovy-pod-75lh7              1/1       Running             0          1m
my-groovy-pod-xjhcc              1/1       Running             0          30m
my-groovy-pod-xxxp5              1/1       Running             0          1m
my-groovy-pod-vfksp              1/1       Running             0          30m
my-groovy-pod-pzjdh              1/1       Running             0          1m
my-groovy-pod-fmjnx              1/1       Running             0          30m
my-groovy-pod-sp6sg              1/1       Running             0          1m
my-groovy-pod-rsghq              1/1       Running             0          30m

Et voilà