Kubernetes: kube-proxy error and restart every 4 minutes

I installed Kubernetes + Docker following the standard guide (guide from here).

Ubuntu 24.04, Docker 24.0.7, kubectl 1.30.2

Everything seemed fine at first and all the pods appeared to start, but after a couple of minutes the kube-proxy pod goes into a restart loop.

kubectl get pods -n kube-system (everything except this pod is OK)

NAME                                  READY   STATUS             RESTARTS         AGE
coredns-7db6d8ff4d-6k7mj              1/1     Running            0                10h
coredns-7db6d8ff4d-xqcd4              1/1     Running            0                10h
etcd-master-node                      1/1     Running            1 (12h ago)      72m
kube-apiserver-master-node            1/1     Running            1 (12h ago)      10h
kube-controller-manager-master-node   1/1     Running            1 (12h ago)      10h
kube-proxy-ftns6                      0/1     CrashLoopBackOff   132 (4m21s ago)  10h
kube-scheduler-master-node            1/1     Running            1 (12h ago)      10h

kubectl describe pod kube-proxy-ftns6 -n kube-system while the pod is running

Name:                 kube-proxy-ftns6
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      kube-proxy
Node:                 master-node/192.168.0.104
Start Time:           Fri, 12 Jul 2024 18:53:30 +0000
Labels:               controller-revision-hash=669fc44fbc
                      k8s-app=kube-proxy
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   192.168.0.104
IPs:
  IP:           192.168.0.104
Controlled By:  DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:  containerd://eefa3b9daa97e025be84e7af7a75ac6d801b1b0cfccc8dd9976477ab6ac41593
    Image:         registry.k8s.io/kube-proxy:v1.30.2
    Image ID:      registry.k8s.io/kube-proxy@sha256:8a44c6e094af3dea3de57fa967e201608a358a3bd8b4e3f31ab905bbe4108aec
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/kube-proxy
      --config=/var/lib/kube-proxy/config.conf
      --hostname-override=$(NODE_NAME)
    State:          Running
      Started:      Fri, 12 Jul 2024 20:16:15 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 12 Jul 2024 20:10:01 +0000
      Finished:     Fri, 12 Jul 2024 20:11:07 +0000
    Ready:          True
    Restart Count:  17
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xrbwk (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  kube-api-access-xrbwk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Normal   Killing  17m (x15 over 82m)     kubelet  Stopping container kube-proxy
  Warning  BackOff  3m32s (x294 over 80m)  kubelet  Back-off restarting failed container kube-proxy in pod kube-proxy-ftns6_kube-system(60ac7788-9c3d-44f0-8cad-404e7097815d)

kubectl describe pod kube-proxy-ftns6 -n kube-system after the pod has crashed

Name:                 kube-proxy-ftns6
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      kube-proxy
Node:                 master-node/192.168.0.104
Start Time:           Fri, 12 Jul 2024 18:53:30 +0000
Labels:               controller-revision-hash=669fc44fbc
                      k8s-app=kube-proxy
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   192.168.0.104
IPs:
  IP:           192.168.0.104
Controlled By:  DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:  containerd://eefa3b9daa97e025be84e7af7a75ac6d801b1b0cfccc8dd9976477ab6ac41593
    Image:         registry.k8s.io/kube-proxy:v1.30.2
    Image ID:      registry.k8s.io/kube-proxy@sha256:8a44c6e094af3dea3de57fa967e201608a358a3bd8b4e3f31ab905bbe4108aec
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/kube-proxy
      --config=/var/lib/kube-proxy/config.conf
      --hostname-override=$(NODE_NAME)
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 12 Jul 2024 20:16:15 +0000
      Finished:     Fri, 12 Jul 2024 20:17:35 +0000
    Ready:          False
    Restart Count:  17
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xrbwk (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  kube-api-access-xrbwk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Normal   Killing  19m (x15 over 84m)   kubelet  Stopping container kube-proxy
  Warning  BackOff  38s (x311 over 82m)  kubelet  Back-off restarting failed container kube-proxy in pod kube-proxy-ftns6_kube-system(60ac7788-9c3d-44f0-8cad-404e7097815d)

kubectl logs kube-proxy-ftns6 -n kube-system

I0712 20:40:36.633174       1 server_linux.go:69] "Using iptables proxy"
I0712 20:40:36.645874       1 server.go:1062] "Successfully retrieved node IP(s)" IPs=["192.168.0.104"]
I0712 20:40:36.658069       1 conntrack.go:59] "Setting nf_conntrack_max" nfConntrackMax=917504
I0712 20:40:36.685069       1 server.go:659] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"
I0712 20:40:36.685100       1 server_linux.go:165] "Using iptables Proxier"
I0712 20:40:36.688152       1 server_linux.go:511] "Detect-local-mode set to ClusterCIDR, but no cluster CIDR for family" ipFamily="IPv6"
I0712 20:40:36.688172       1 server_linux.go:528] "Defaulting to no-op detect-local"
I0712 20:40:36.688204       1 proxier.go:243] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
I0712 20:40:36.688443       1 server.go:872] "Version info" version="v1.30.2"
I0712 20:40:36.688471       1 server.go:874] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0712 20:40:36.689749       1 config.go:101] "Starting endpoint slice config controller"
I0712 20:40:36.689799       1 shared_informer.go:313] Waiting for caches to sync for endpoint slice config
I0712 20:40:36.689870       1 config.go:192] "Starting service config controller"
I0712 20:40:36.689918       1 shared_informer.go:313] Waiting for caches to sync for service config
I0712 20:40:36.689966       1 config.go:319] "Starting node config controller"
I0712 20:40:36.690032       1 shared_informer.go:313] Waiting for caches to sync for node config
I0712 20:40:36.789984       1 shared_informer.go:320] Caches are synced for endpoint slice config
I0712 20:40:36.790056       1 shared_informer.go:320] Caches are synced for service config
I0712 20:40:36.790149       1 shared_informer.go:320] Caches are synced for node config
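
The log above is from the instance that is currently running, and it looks clean right up to the moment the container is killed. For completeness, the log of the previous, terminated instance of a crash-looping pod can be pulled with kubectl's standard --previous flag:

kubectl logs kube-proxy-ftns6 -n kube-system --previous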

kubectl get events

LAST SEEN   TYPE     REASON     OBJECT             MESSAGE
57m         Normal   Starting   node/master-node
50m         Normal   Starting   node/master-node
44m         Normal   Starting   node/master-node
38m         Normal   Starting   node/master-node
31m         Normal   Starting   node/master-node
25m         Normal   Starting   node/master-node
19m         Normal   Starting   node/master-node
12m         Normal   Starting   node/master-node
6m11s       Normal   Starting   node/master-node

*By this point in the debugging I realize that my problems are apparently not limited to kube-proxy, because sudo journalctl -u kubelet -f shows me this...*

июл 12 20:44:28 master-node kubelet[4899]: E0712 20:44:28.491882    4899 kuberuntime_container.go:822] "Kill container failed" err=<
июл 12 20:44:28 master-node kubelet[4899]:         rpc error: code = Unknown desc = failed to kill container "d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4": unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied
июл 12 20:44:28 master-node kubelet[4899]:         : unknown
июл 12 20:44:28 master-node kubelet[4899]:  > pod="kube-system/coredns-7db6d8ff4d-xqcd4" podUID="0c8a43fe-0f53-47dd-87e7-657fba332b02" containerName="coredns" containerID={"Type":"containerd","ID":"d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4"}
июл 12 20:44:28 master-node kubelet[4899]: E0712 20:44:28.500125    4899 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err=<
июл 12 20:44:28 master-node kubelet[4899]:         rpc error: code = Unknown desc = failed to stop container "d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4": failed to kill container "d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4": unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied
июл 12 20:44:28 master-node kubelet[4899]:         : unknown
июл 12 20:44:28 master-node kubelet[4899]:  > podSandboxID="17c02b47ffaa6b5ba19af13f1c28b4e7a9f1905151cac91b27aaa6bb3f29d902"
июл 12 20:44:28 master-node kubelet[4899]: E0712 20:44:28.500170    4899 kuberuntime_manager.go:1375] "Failed to stop sandbox" podSandboxID={"Type":"containerd","ID":"17c02b47ffaa6b5ba19af13f1c28b4e7a9f1905151cac91b27aaa6bb3f29d902"}
июл 12 20:44:28 master-node kubelet[4899]: E0712 20:44:28.500223    4899 kubelet.go:1878] "KillPod failed" err="[failed to \"KillContainer\" for \"coredns\" with KillContainerError: \"rpc error: code = Unknown desc = failed to kill container \\\"d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4\\\": unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied\\n: unknown\", failed to \"KillPodSandbox\" for \"0c8a43fe-0f53-47dd-87e7-657fba332b02\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to stop container \\\"d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4\\\": failed to kill container \\\"d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4\\\": unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied\\n: unknown\"]" pod="kube-system/coredns-7db6d8ff4d-xqcd4" podStatus={"ID":"0c8a43fe-0f53-47dd-87e7-657fba332b02","Name":"coredns-7db6d8ff4d-xqcd4","Namespace":"kube-system","IPs":["10.244.0.3"],"ContainerStatuses":[{"ID":"containerd://d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4","Name":"coredns","State":"running","CreatedAt":"2024-07-12T09:09:51.518855869Z","StartedAt":"2024-07-12T09:09:51.572624527Z","FinishedAt":"0001-01-01T00:00:00Z","ExitCode":0,"Image":"registry.k8s.io/coredns/coredns:v1.11.1","ImageID":"registry.k8s.io/coredns/coredns@sha256:1eeb4c7316bacb1d4c8ead65571cd92dd21e27359f0d4917f1a5822a73b75db1","ImageRef":"registry.k8s.io/coredns/coredns@sha256:1eeb4c7316bacb1d4c8ead65571cd92dd21e27359f0d4917f1a5822a73b75db1","ImageRuntimeHandler":"","Hash":1241243534,"HashWithoutResources":0,"RestartCount":0,"Reason":"","Message":"","Resources":null}],"SandboxStatuses":[{"id":"17c02b47ffaa6b5ba19af13f1c28b4e7a9f1905151cac91b27aaa6bb3f29d902","metadata":{"name":"coredns-7db6d8ff4d-xqcd4","uid":"0c8a43fe-0f53-47dd-87e7-657fba332b02","namespace":"kube-system"},"created_at":1720775391353421158,"network":{"ip":"10.244.0.3"},"linux":{"namespaces":{"options":{"pid":1}}},"labels":{"io.kubernetes.pod.name":"coredns-7db6d8ff4d-xqcd4","io.kubernetes.pod.namespace":"kube-system","io.kubernetes.pod.uid":"0c8a43fe-0f53-47dd-87e7-657fba332b02","k8s-app":"kube-dns","pod-template-hash":"7db6d8ff4d"},"annotations":{"kubernetes.io/config.seen":"2024-07-12T09:09:51.002186977Z","kubernetes.io/config.source":"api"}}],"TimeStamp":"0001-01-01T00:00:00Z"}
июл 12 20:44:41 master-node kubelet[4899]: I0712 20:44:41.502964    4899 scope.go:117] "RemoveContainer" containerID="1780e314dd4ac9032ba1bc9b34e42576c2afd3d7e820c98c8e022ed759da83ac"
июл 12 20:44:41 master-node kubelet[4899]: E0712 20:44:41.503390    4899 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-ftns6_kube-system(60ac7788-9c3d-44f0-8cad-404e7097815d)\"" pod="kube-system/kube-proxy-ftns6" podUID="60ac7788-9c3d-44f0-8cad-404e7097815d"
июл 12 20:44:57 master-node kubelet[4899]: I0712 20:44:57.481459    4899 scope.go:117] "RemoveContainer" containerID="1780e314dd4ac9032ba1bc9b34e42576c2afd3d7e820c98c8e022ed759da83ac"

Please don't judge too harshly, I'm only just getting started with Kubernetes. All the directories and files are owned by root, so as far as I can tell there shouldn't have been any permission problems...

ls -ld /var/lib/kubelet
drwxrwxr-x 9 root root 4096 июл 12 09:07 /var/lib/kubelet
ls -ld /var/lib/kube-proxy
drwxr-xr-x 2 root root 4096 июл 12 14:53 /var/lib/kube-proxy
ls -ld /run/xtables.lock
-rw-r--r-- 1 root root 0 июл 12 09:07 /run/xtables.lock
ls -ld /lib/modules
drwxr-xr-x 4 root root 4096 июл 11 11:46 /lib/modules
ls -l /lib/modules
total 8
drwxr-xr-x 5 root root 4096 июл  9 11:36 6.8.0-36-generic
drwxr-xr-x 5 root root 4096 июл 11 11:47 6.8.0-38-generic
ls -l /run/xtables.lock
-rw-r--r-- 1 root root 0 июл 12 09:07 /run/xtables.lock
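
Since the file ownership looks fine, the "permission denied" that runc reports presumably comes from something other than plain file permissions. One way to check whether an LSM such as AppArmor is denying the kill signal (only an assumption at this point, prompted by the wording of the runc error) is to look for denials in the kernel log and list the loaded profiles:

sudo journalctl -k | grep -i 'apparmor="DENIED"'
sudo dmesg | grep -i apparmor
sudo aa-status | grep -i runc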

Answers (1):

Answer by: Константин Сингалов

It took some digging, but the problems are solved.

  1. AppArmor was preventing the pods from being restarted (one way to check for and work around this is sketched right after this list).
  2. The pods kept needing restarts because of incorrectly set default resource limits.
  3. I never did figure out exactly why kube-proxy itself was restarting, but it was indirectly related to Flannel. I installed Calico with
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

and everything started working (Calico created its own 2 pods): 5 hours now without a single restart or error.
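
The answer does not spell out how exactly the AppArmor problem was dealt with. On Ubuntu 24.04 the "unable to signal init: permission denied" error from runc is commonly attributed to the AppArmor profile the distribution ships for runc, so one frequently suggested workaround (an assumption, not necessarily what the author did, and only applicable if the profile really exists at /etc/apparmor.d/runc) is to disable that profile:

# check whether a runc profile is loaded at all
sudo aa-status | grep -i runc
# disable the profile the standard AppArmor way and unload it from the kernel
sudo ln -s /etc/apparmor.d/runc /etc/apparmor.d/disable/
sudo apparmor_parser -R /etc/apparmor.d/runc
# restart the container runtime and kubelet afterwards (or simply reboot the node)
sudo systemctl restart containerd kubelet

If the goal is only to confirm the diagnosis rather than change anything, the journalctl -k / aa-status checks shown at the end of the question are sufficient.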
