CoreDNS does not start after host reboot #4892

etm-de opened this issue Feb 19, 2025 · 1 comment

etm-de commented Feb 19, 2025

Summary

I am running microk8s on an Ubuntu Amazon Workspace. I can get CoreDNS to start once, but if I reboot the machine, the CoreDNS pod no longer starts. I suspect this may be a peculiarity of Amazon Workspaces. I'm hoping you can give me tips to debug the problem.

Note that I have ha-cluster disabled in the examples below, but I have seen the same thing with it enabled.

$ microk8s kubectl -n kube-system get po
NAME                       READY   STATUS    RESTARTS      AGE
coredns-79b94494c7-qqqzn   0/1     Running   1 (28m ago)   33m

Here are the CoreDNS logs:

[INFO] 127.0.0.1:46655 - 23633 "HINFO IN 8185052869104737298.5565547917880098493. udp 57 false 512" - - 0 2.00032851s
[ERROR] plugin/errors: 2 8185052869104737298.5565547917880098493. HINFO: read udp 10.1.94.8:44798->172.31.254.165:53: i/o timeout
[INFO] 127.0.0.1:44082 - 18542 "HINFO IN 8185052869104737298.5565547917880098493. udp 57 false 512" - - 0 2.001442862s
[ERROR] plugin/errors: 2 8185052869104737298.5565547917880098493. HINFO: read udp 10.1.94.8:59424->172.31.253.215:53: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://10.152.183.1:443/version": dial tcp 10.152.183.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
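
The read udp errors above show CoreDNS timing out while forwarding to 172.31.254.165 and 172.31.253.215 (presumably the Workspace's VPC resolvers from the host's resolv.conf), and the kubernetes plugin cannot reach the API service at 10.152.183.1:443, which is why the readiness probe never passes. As a first sanity check (only a sketch of what I would try from the host, which is not exactly the pod's network path), the addresses from the errors can be probed directly:

# Upstream resolvers taken from the error lines above
dig @172.31.254.165 ubuntu.com +time=2
dig @172.31.253.215 ubuntu.com +time=2
# API service IP the kubernetes plugin is timing out on;
# any HTTP response at all would show the address is reachable
curl -k https://10.152.183.1:443/version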

I also see

$ microk8s kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.152.183.10   <none>        53/UDP,53/TCP,9153/TCP   5m57s

$ microk8s kubectl  get ep -n kube-system
NAME       ENDPOINTS   AGE
kube-dns               4m48s
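
The empty ENDPOINTS column is consistent with the pod output above: because the ready plugin is still waiting on "kubernetes", the readiness probe never succeeds and the pod is never added behind the kube-dns service. To look at the probe failures and restart reason in more detail (pod name copied from the output above):

$ microk8s kubectl -n kube-system describe pod coredns-79b94494c7-qqqzn
$ microk8s kubectl -n kube-system get events --sort-by=.lastTimestamp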

I have tried

microk8s kubectl -n kube-system rollout restart deploy

and re-enabling the DNS plugin, but it doesn't resolve the issue. I also tried

sudo ufw allow in on vxlan.calico && sudo ufw allow out on vxlan.calico
sudo ufw allow in on cali+ && sudo ufw allow out on cali+

and

sudo ufw default allow routed

That did not seem to help.
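
One thing I have not tried yet is checking whether the reboot leaves the iptables FORWARD policy at DROP, which I have seen suggested for microk8s pods that cannot reach the API server or the internet (I have not verified that it applies here):

sudo iptables -L FORWARD -n | head -n 1   # shows the default policy for the chain
sudo iptables -P FORWARD ACCEPT           # the commonly suggested workaround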

What Should Happen Instead?

I should be able to restart my machine and have CoreDNS continue to run.

Reproduction Steps

  1. Install microk8s: sudo snap install microk8s --classic
  2. Check that the CoreDNS pod is running: microk8s kubectl -n kube-system get po
  3. Reboot the machine
  4. The CoreDNS pod is no longer running: microk8s kubectl -n kube-system get po (a quick post-reboot service check is sketched below)
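
For what it's worth, a generic post-reboot check of the microk8s services themselves (the introspection report below suggests the daemons are running, so this is only a sketch of what I look at after step 3):

snap services microk8s
microk8s status --wait-ready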

Introspection Report

Inspecting system
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-kubelite is running
  Service snap.microk8s.daemon-flanneld is running
  Service snap.microk8s.daemon-etcd is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy openSSL information to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy asnycio usage and limits to the final report tarball
  Copy inotify max_user_instances and max_user_watches to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster

I did not attach the tarball because I do not know enough about the security implications.

Can you suggest a fix?

Are you interested in contributing with a fix?

eaudetcobello (Contributor) commented Feb 21, 2025

Hi @etm-de,

I don't know too much about Amazon Workspaces, unfortunately. Do they use AMIs? If so, are you deploying on an Ubuntu AMI or an Amazon Linux one? Can you check journalctl -u snap.microk8s.daemon-kubelite to see if there's anything interesting?
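
Something along these lines (adjust as needed) should capture the interesting bits from the current boot:

journalctl -u snap.microk8s.daemon-kubelite -b --no-pager | tail -n 200
journalctl -u snap.microk8s.daemon-containerd -b --no-pager | tail -n 200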

I tried reproducing your problem using LXD containers but everything was fine after reboot.
