Kubernetes: kube-proxy errors out and restarts every 4 minutes
I installed Kubernetes + Docker following a standard how-to (guide from here).
Ubuntu 24.04, Docker 24.0.7, kubectl 1.30.2
At first everything looks fine and all the pods seem to come up, but after a couple of minutes the kube-proxy pod goes into a restart loop.
kubectl get pods -n kube-system (everything is OK except this one pod)
NAME READY STATUS RESTARTS AGE
coredns-7db6d8ff4d-6k7mj 1/1 Running 0 10h
coredns-7db6d8ff4d-xqcd4 1/1 Running 0 10h
etcd-master-node 1/1 Running 1 (12h ago) 72m
kube-apiserver-master-node 1/1 Running 1 (12h ago) 10h
kube-controller-manager-master-node 1/1 Running 1 (12h ago) 10h
kube-proxy-ftns6 0/1 CrashLoopBackOff 132 (4m21s ago) 10h
kube-scheduler-master-node 1/1 Running 1 (12h ago) 10h
kubectl describe pod kube-proxy-ftns6 -n kube-system while the pod is running
Name: kube-proxy-ftns6
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: kube-proxy
Node: master-node/192.168.0.104
Start Time: Fri, 12 Jul 2024 18:53:30 +0000
Labels: controller-revision-hash=669fc44fbc
k8s-app=kube-proxy
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 192.168.0.104
IPs:
IP: 192.168.0.104
Controlled By: DaemonSet/kube-proxy
Containers:
kube-proxy:
Container ID: containerd://eefa3b9daa97e025be84e7af7a75ac6d801b1b0cfccc8dd9976477ab6ac41593
Image: registry.k8s.io/kube-proxy:v1.30.2
Image ID: registry.k8s.io/kube-proxy@sha256:8a44c6e094af3dea3de57fa967e201608a358a3bd8b4e3f31ab905bbe4108aec
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/kube-proxy
--config=/var/lib/kube-proxy/config.conf
--hostname-override=$(NODE_NAME)
State: Running
Started: Fri, 12 Jul 2024 20:16:15 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 12 Jul 2024 20:10:01 +0000
Finished: Fri, 12 Jul 2024 20:11:07 +0000
Ready: True
Restart Count: 17
Environment:
NODE_NAME: (v1:spec.nodeName)
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/kube-proxy from kube-proxy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xrbwk (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-proxy:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-proxy
Optional: false
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
kube-api-access-xrbwk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 17m (x15 over 82m) kubelet Stopping container kube-proxy
Warning BackOff 3m32s (x294 over 80m) kubelet Back-off restarting failed container kube-proxy in pod kube-proxy-ftns6_kube-system(60ac7788-9c3d-44f0-8cad-404e7097815d)
kubectl describe pod kube-proxy-ftns6 -n kube-system after the pod has crashed
Name: kube-proxy-ftns6
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: kube-proxy
Node: master-node/192.168.0.104
Start Time: Fri, 12 Jul 2024 18:53:30 +0000
Labels: controller-revision-hash=669fc44fbc
k8s-app=kube-proxy
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 192.168.0.104
IPs:
IP: 192.168.0.104
Controlled By: DaemonSet/kube-proxy
Containers:
kube-proxy:
Container ID: containerd://eefa3b9daa97e025be84e7af7a75ac6d801b1b0cfccc8dd9976477ab6ac41593
Image: registry.k8s.io/kube-proxy:v1.30.2
Image ID: registry.k8s.io/kube-proxy@sha256:8a44c6e094af3dea3de57fa967e201608a358a3bd8b4e3f31ab905bbe4108aec
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/kube-proxy
--config=/var/lib/kube-proxy/config.conf
--hostname-override=$(NODE_NAME)
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 12 Jul 2024 20:16:15 +0000
Finished: Fri, 12 Jul 2024 20:17:35 +0000
Ready: False
Restart Count: 17
Environment:
NODE_NAME: (v1:spec.nodeName)
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/kube-proxy from kube-proxy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xrbwk (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-proxy:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-proxy
Optional: false
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
kube-api-access-xrbwk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 19m (x15 over 84m) kubelet Stopping container kube-proxy
Warning BackOff 38s (x311 over 82m) kubelet Back-off restarting failed container kube-proxy in pod kube-proxy-ftns6_kube-system(60ac7788-9c3d-44f0-8cad-404e7097815d)
kubectl logs kube-proxy-ftns6 -n kube-system
I0712 20:40:36.633174 1 server_linux.go:69] "Using iptables proxy"
I0712 20:40:36.645874 1 server.go:1062] "Successfully retrieved node IP(s)" IPs=["192.168.0.104"]
I0712 20:40:36.658069 1 conntrack.go:59] "Setting nf_conntrack_max" nfConntrackMax=917504
I0712 20:40:36.685069 1 server.go:659] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"
I0712 20:40:36.685100 1 server_linux.go:165] "Using iptables Proxier"
I0712 20:40:36.688152 1 server_linux.go:511] "Detect-local-mode set to ClusterCIDR, but no cluster CIDR for family" ipFamily="IPv6"
I0712 20:40:36.688172 1 server_linux.go:528] "Defaulting to no-op detect-local"
I0712 20:40:36.688204 1 proxier.go:243] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
I0712 20:40:36.688443 1 server.go:872] "Version info" version="v1.30.2"
I0712 20:40:36.688471 1 server.go:874] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0712 20:40:36.689749 1 config.go:101] "Starting endpoint slice config controller"
I0712 20:40:36.689799 1 shared_informer.go:313] Waiting for caches to sync for endpoint slice config
I0712 20:40:36.689870 1 config.go:192] "Starting service config controller"
I0712 20:40:36.689918 1 shared_informer.go:313] Waiting for caches to sync for service config
I0712 20:40:36.689966 1 config.go:319] "Starting node config controller"
I0712 20:40:36.690032 1 shared_informer.go:313] Waiting for caches to sync for node config
I0712 20:40:36.789984 1 shared_informer.go:320] Caches are synced for endpoint slice config
I0712 20:40:36.790056 1 shared_informer.go:320] Caches are synced for service config
I0712 20:40:36.790149 1 shared_informer.go:320] Caches are synced for node config
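The log above comes from a freshly restarted container and looks clean right up to the moment it is killed, so it does not show why the previous instance exited with code 2. For that, kubectl can pull the log of the prior container instance (standard flag, shown here as a sketch):
# log of the previous, crashed container instance
kubectl logs kube-proxy-ftns6 -n kube-system --previous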
kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
57m Normal Starting node/master-node
50m Normal Starting node/master-node
44m Normal Starting node/master-node
38m Normal Starting node/master-node
31m Normal Starting node/master-node
25m Normal Starting node/master-node
19m Normal Starting node/master-node
12m Normal Starting node/master-node
6m11s Normal Starting node/master-node
*At this point in the debugging I realize that my problems are apparently not limited to kube-proxy, because sudo journalctl -u kubelet -f shows me this... *
июл 12 20:44:28 master-node kubelet[4899]: E0712 20:44:28.491882 4899 kuberuntime_container.go:822] "Kill container failed" err=<
июл 12 20:44:28 master-node kubelet[4899]: rpc error: code = Unknown desc = failed to kill container "d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4": unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied
июл 12 20:44:28 master-node kubelet[4899]: : unknown
июл 12 20:44:28 master-node kubelet[4899]: > pod="kube-system/coredns-7db6d8ff4d-xqcd4" podUID="0c8a43fe-0f53-47dd-87e7-657fba332b02" containerName="coredns" containerID={"Type":"containerd","ID":"d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4"}
июл 12 20:44:28 master-node kubelet[4899]: E0712 20:44:28.500125 4899 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err=<
июл 12 20:44:28 master-node kubelet[4899]: rpc error: code = Unknown desc = failed to stop container "d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4": failed to kill container "d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4": unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied
июл 12 20:44:28 master-node kubelet[4899]: : unknown
июл 12 20:44:28 master-node kubelet[4899]: > podSandboxID="17c02b47ffaa6b5ba19af13f1c28b4e7a9f1905151cac91b27aaa6bb3f29d902"
июл 12 20:44:28 master-node kubelet[4899]: E0712 20:44:28.500170 4899 kuberuntime_manager.go:1375] "Failed to stop sandbox" podSandboxID={"Type":"containerd","ID":"17c02b47ffaa6b5ba19af13f1c28b4e7a9f1905151cac91b27aaa6bb3f29d902"}
июл 12 20:44:28 master-node kubelet[4899]: E0712 20:44:28.500223 4899 kubelet.go:1878] "KillPod failed" err="[failed to \"KillContainer\" for \"coredns\" with KillContainerError: \"rpc error: code = Unknown desc = failed to kill container \\\"d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4\\\": unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied\\n: unknown\", failed to \"KillPodSandbox\" for \"0c8a43fe-0f53-47dd-87e7-657fba332b02\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to stop container \\\"d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4\\\": failed to kill container \\\"d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4\\\": unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied\\n: unknown\"]" pod="kube-system/coredns-7db6d8ff4d-xqcd4" podStatus={"ID":"0c8a43fe-0f53-47dd-87e7-657fba332b02","Name":"coredns-7db6d8ff4d-xqcd4","Namespace":"kube-system","IPs":["10.244.0.3"],"ContainerStatuses":[{"ID":"containerd://d34261e794c9355049541bb48030d02f25f86bffef3fdd006e2c194923e42ab4","Name":"coredns","State":"running","CreatedAt":"2024-07-12T09:09:51.518855869Z","StartedAt":"2024-07-12T09:09:51.572624527Z","FinishedAt":"0001-01-01T00:00:00Z","ExitCode":0,"Image":"registry.k8s.io/coredns/coredns:v1.11.1","ImageID":"registry.k8s.io/coredns/coredns@sha256:1eeb4c7316bacb1d4c8ead65571cd92dd21e27359f0d4917f1a5822a73b75db1","ImageRef":"registry.k8s.io/coredns/coredns@sha256:1eeb4c7316bacb1d4c8ead65571cd92dd21e27359f0d4917f1a5822a73b75db1","ImageRuntimeHandler":"","Hash":1241243534,"HashWithoutResources":0,"RestartCount":0,"Reason":"","Message":"","Resources":null}],"SandboxStatuses":[{"id":"17c02b47ffaa6b5ba19af13f1c28b4e7a9f1905151cac91b27aaa6bb3f29d902","metadata":{"name":"coredns-7db6d8ff4d-xqcd4","uid":"0c8a43fe-0f53-47dd-87e7-657fba332b02","namespace":"kube-system"},"created_at":1720775391353421158,"network":{"ip":"10.244.0.3"},"linux":{"namespaces":{"options":{"pid":1}}},"labels":{"io.kubernetes.pod.name":"coredns-7db6d8ff4d-xqcd4","io.kubernetes.pod.namespace":"kube-system","io.kubernetes.pod.uid":"0c8a43fe-0f53-47dd-87e7-657fba332b02","k8s-app":"kube-dns","pod-template-hash":"7db6d8ff4d"},"annotations":{"kubernetes.io/config.seen":"2024-07-12T09:09:51.002186977Z","kubernetes.io/config.source":"api"}}],"TimeStamp":"0001-01-01T00:00:00Z"}
июл 12 20:44:41 master-node kubelet[4899]: I0712 20:44:41.502964 4899 scope.go:117] "RemoveContainer" containerID="1780e314dd4ac9032ba1bc9b34e42576c2afd3d7e820c98c8e022ed759da83ac"
июл 12 20:44:41 master-node kubelet[4899]: E0712 20:44:41.503390 4899 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-ftns6_kube-system(60ac7788-9c3d-44f0-8cad-404e7097815d)\"" pod="kube-system/kube-proxy-ftns6" podUID="60ac7788-9c3d-44f0-8cad-404e7097815d"
июл 12 20:44:57 master-node kubelet[4899]: I0712 20:44:57.481459 4899 scope.go:117] "RemoveContainer" containerID="1780e314dd4ac9032ba1bc9b34e42576c2afd3d7e820c98c8e022ed759da83ac"
Go easy on me, I'm only just getting started with Kubernetes. All the directories and files are owned by root, so on the face of it there should not have been any permission problems (a check that goes beyond plain file permissions is sketched after the listings below)...
ls -ld /var/lib/kubelet
drwxrwxr-x 9 root root 4096 июл 12 09:07 /var/lib/kubelet
ls -ld /var/lib/kube-proxy
drwxr-xr-x 2 root root 4096 июл 12 14:53 /var/lib/kube-proxy
ls -ld /run/xtables.lock
-rw-r--r-- 1 root root 0 июл 12 09:07 /run/xtables.lock
ls -ld /lib/modules
drwxr-xr-x 4 root root 4096 июл 11 11:46 /lib/modules
ls -l /lib/modules
total 8
drwxr-xr-x 5 root root 4096 июл 9 11:36 6.8.0-36-generic
drwxr-xr-x 5 root root 4096 июл 11 11:47 6.8.0-38-generic
ls -l /run/xtables.lock
-rw-r--r-- 1 root root 0 июл 12 09:07 /run/xtables.lock
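Since the plain Unix permissions all look fine while runc still reports "permission denied", the denial may be coming from a mandatory access control layer rather than from file ownership. A quick way to check whether AppArmor is the one blocking the kill signal (a diagnostic sketch; exact profile names and log formats depend on the release):
# Is an AppArmor profile loaded for runc?
sudo aa-status | grep -i runc
# Any recent AppArmor denials in the kernel log?
sudo dmesg | grep -i 'apparmor="DENIED"'
sudo journalctl -k --since "1 hour ago" | grep -i apparmor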
Answers (1):
It took some digging, but the problems are solved.
- AppArmor was what kept the pods from being restarted (a possible workaround is sketched after this answer).
- The pods kept wanting to restart because of incorrectly configured default resource limits.
- I never did figure out exactly why kube-proxy kept restarting, but it was indirectly related to Flannel. I installed Calico with
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
and everything started working (Calico created its two pods); five hours now without a single restart or error.
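The answer does not spell out how AppArmor was dealt with. On Ubuntu 24.04 the apparmor package ships a profile for runc, and disabling just that profile is a commonly reported workaround for the "unable to signal init: permission denied" errors seen in the kubelet log above; this is a sketch under that assumption, so verify that the profile actually exists on your machine before applying it:
# Check whether a runc profile is currently loaded
sudo aa-status | grep runc
# Disable only the runc profile, leaving the rest of AppArmor enabled
sudo ln -s /etc/apparmor.d/runc /etc/apparmor.d/disable/
sudo apparmor_parser -R /etc/apparmor.d/runc
# Restart the container runtime and kubelet so they pick up the change
sudo systemctl restart containerd kubelet
After switching the CNI to Calico, one quick way to confirm the result is to watch the kube-system pods and, if kube-proxy is still stuck in CrashLoopBackOff, delete its pod so the DaemonSet recreates it:
kubectl get pods -n kube-system -w
kubectl delete pod kube-proxy-ftns6 -n kube-system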