Hi all,
- freshly installed Leap 15.6, “server” role, in a VM (if relevant: currently VMware Workstation, VM running in NAT mode, but this is only for testing)
- hostname set to myhost.localdomain
- matching entry in /etc/hosts: <VM IP> myhost.localdomain myhost
- absolutely NO firewall(d), no restrictive iptables rules, nothing beyond the installation defaults of Leap 15.6
- net.ipv4.ip_forward = 1 (found this as a hint somewhere, but it seems to be the default on Leap 15.6 anyway)
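For reference, the kind of checks I mean; the net.bridge.* sysctls are worth including here since they only exist once the br_netfilter module is loaded:

# IP forwarding (already 1 by default here)
sysctl net.ipv4.ip_forward
# bridged pod traffic must pass through iptables for the CNI/kube-proxy rules
# to apply; run "modprobe br_netfilter" first or these keys don't exist
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables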
→ Installed as basic and simple a Kubernetes cluster as possible ←
→ Tested whether 2 test pods can reach each other via ping / iperf3. Result: NO ←
(simple test script with output below)
I’m fairly sure it has something to do with the base setup of the host system, Leap, or maybe the VMware (or whatever virtualization) settings. But I’m running out of ideas…
E.g. I tried RKE(1) with this basic cluster.yml:
nodes:
  - address: 192.168.46.130
    user: docker
    role:
      - controlplane
      - etcd
      - worker
ssh_key_path: ~/.ssh/id_ed25519
ssh_agent_auth: true
services:
  kube-api: {}
  kube-controller: {}
  scheduler: {}
  kubelet:
    fail_swap_on: false
  kubeproxy: {}
network:
  plugin: flannel
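The cluster is then brought up with nothing but the RKE(1) binary, run from the directory containing cluster.yml:

# rke reads ./cluster.yml by default
rke up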
The logged result is always "cluster fine, all and everything running, have fun", never any error, no warnings. It doesn’t matter whether it’s a basic RKE(1) cluster as shown, a pure default K3s install, or an as-simple-as-possible RKE2:
→ ALWAYS same problem.
Tried flannel as shown, also calico, also canal. With flannel: both the VXLAN and the host-gw backend. With VXLAN, I checked all the MTU settings and tried adjusting them.
→ ALWAYS same problem.
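To illustrate the MTU checks I mean (cni0 and flannel.1 are the flannel default interface names; /run/flannel/subnet.env is where flannel publishes the MTU for the CNI plugin):

ip -d link show flannel.1     # VXLAN device; MTU should be the uplink MTU minus 50
ip link show cni0             # bridge the pod veths attach to
cat /run/flannel/subnet.env   # FLANNEL_MTU handed to the CNI plugin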
My test script:
+ kubectl run test-pod-1 --image=networkstatic/iperf3 --restart=Never -- iperf3 -s
pod/test-pod-1 created
+ kubectl wait --for=condition=Ready pod/test-pod-1 --timeout=60s
pod/test-pod-1 condition met
+ kubectl run test-pod-2 --image=networkstatic/iperf3 --restart=Never -- iperf3 -s
pod/test-pod-2 created
+ kubectl wait --for=condition=Ready pod/test-pod-2 --timeout=60s
pod/test-pod-2 condition met
+ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-pod-1 1/1 Running 0 9s 10.42.0.15 rke-suse <none> <none>
test-pod-2 1/1 Running 0 2s 10.42.0.16 rke-suse <none> <none>
++ kubectl get pod test-pod-1 -o 'jsonpath={.status.podIP}'
+ kubectl exec test-pod-2 -- iperf3 -c 10.42.0.15
iperf3: error - unable to connect to server: No route to host
command terminated with exit code 1
Please believe me that I’ve dug into "ip route" on the VM and inside the pods, and checked all the flannel, CNI, and CoreDNS logs: EVERYTHING looks as it should. Especially since I have an old CentOS 7 VM with an old RKE but a very similar cluster available for comparison, and all the routing, CIDR etc. settings are the same (defaults).
The overlay routing from the host’s perspective works fine, e.g. ping 10.42.0.<15|16> from the host to a pod works.
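In case the details help, these are the kinds of checks I mean (10.42.0.0/24 is the default pod CIDR here; the iperf3 image ships the ip tool, so routes can be inspected in-pod):

# on the host: the pod CIDR route points at the cni0 bridge
ip route show | grep 10.42
# inside a pod: default route via the cni0 gateway
kubectl exec test-pod-2 -- ip route
# watch the bridge while the iperf3 test runs; SYNs that show up on cni0
# but never get answered would point at filtering between the veths
sudo tcpdump -ni cni0 host 10.42.0.15 and host 10.42.0.16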
If I try the same thing with plain Docker instead of K8s, i.e. a user-defined bridge plus the iperf3 test run with "docker run" instead of "kubectl run": no issue, works fine.
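Roughly like this (container names made up for the example; the image is invoked the same way as in the kubectl run lines above):

docker network create mybridge
docker run -d --name test-c1 --network mybridge networkstatic/iperf3 iperf3 -s
# Docker's embedded DNS on the user-defined bridge resolves the name test-c1
docker run --rm --network mybridge networkstatic/iperf3 iperf3 -c test-c1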
But of course that is Docker’s own bridge networking, no flannel/calico/… involved.
Ah, and external traffic to and from the pods is routed fine; it’s only pod-to-pod that fails.
ANY HINTS WOULD BE GREAT, as I have no ideas left… this issue has been driving me mad after hours and hours of digging.
Regards,
Michael