
feat: find node by uuid #142

Merged

sergelogvinov merged 1 commit into main from uuid on Sep 14, 2024

Conversation

sergelogvinov (Owner) commented Sep 13, 2024

In some setups, the Proxmox VM name may differ from the Linux hostname. To reliably identify a VM within a Proxmox cluster, we can use the system's UUID.

refs #140
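
For reference, a quick way to double-check the value being matched on (a hedged sketch: VMID 100 is a placeholder, and reading the DMI file may require root):

# On the Kubernetes node: the SMBIOS/DMI product UUID that QEMU exposes to the guest
sudo cat /sys/class/dmi/id/product_uuid

# On the Proxmox host: the VM's SMBIOS settings should show the same UUID
qm config 100 | grep smbios1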

Pull Request

What? (description)

Why? (reasoning)

Acceptance

Please use the following checklist:

  • you linked an issue (if applicable)
  • you included tests (if applicable)
  • you linted your code (make lint)
  • you ran the unit tests (make unit)

See make help for a description of the available targets.

sergelogvinov (Owner, Author)

Hi @fibbs, can you check this PR?

I've already run tests in my lab; it can find the VM by UUID. Can you check the Rancher setup?
You can build your own image, or use this one: ghcr.io/sergelogvinov/proxmox-cloud-controller-manager:uuid

git clone ...
make images USERNAME=$GITHUB-ACC PLATFORM=linux/amd64 TAG=uuid PUSH=true

In some setups, the Proxmox VM name may differ from the Linux hostname.
To reliably identify a VM within a Proxmox cluster, we can use the system's UUID

Signed-off-by: Serge Logvinov <[email protected]>
sergelogvinov merged commit 5876cd4 into main on Sep 14, 2024
3 checks passed
sergelogvinov (Owner, Author)

I've run some tests, and I hope it solves the Rancher case.

You can check the edge version now.

sergelogvinov deleted the uuid branch on September 14, 2024, 08:40
fibbs commented Sep 15, 2024

I will test this within the next few days, now that I have the variant running where the node name and VM name are equal. It's very much appreciated that you're considering support for such a more dynamic configuration.

fibbs commented Sep 16, 2024

I have now tested this successfully:

I installed the cluster the same way as described in the other issue, with the difference that my three nodes are called vm-small-07, vm-small-08, and vm-small-09 in Proxmox, while the Kubernetes node names are as follows:

3f8afec0-51bc-48d3-88b0-28991082f7b7   Ready    control-plane,etcd,master,worker   115s   v1.30.4+rke2r1
5fabaa96-359b-4ce9-bf8c-853012466b7f   Ready    control-plane,etcd,master,worker   10m    v1.30.4+rke2r1
d5b00005-10ee-4981-97c3-2ed22b60f507   Ready    control-plane,etcd,master,worker   80s    v1.30.4+rke2r1

So, there is no way to get the reference to the VMID by matching the K8S node name to the VM name.
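
(For anyone reproducing this: a quick, hedged way to confirm that these node names really are the machines' system UUIDs is to compare them with the kubelet-reported value.)

kubectl get nodes -o custom-columns=NAME:.metadata.name,SYSTEM_UUID:.status.nodeInfo.systemUUID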

I have used the following values.yaml to install the CCM on the cluster:

existingConfigSecret: proxmox-cloud-controller-manager
# -- Proxmox cluster config stored in secrets key.
existingConfigSecretKey: config.yaml
logVerbosityLevel: 2
image:
  tag: edge
  pullPolicy: Always
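
(For completeness, a sketch of how these values would typically be applied; the chart path here is an assumption based on the repository layout, adjust to however you normally install the chart.)

helm upgrade -i proxmox-cloud-controller-manager charts/proxmox-cloud-controller-manager \
  -n kube-system -f values.yaml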

CCM starts and does its thing:

I0916 14:20:37.794805       1 reflector.go:368] Caches populated for *v1.ConfigMap from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:243
I0916 14:20:37.795102       1 reflector.go:368] Caches populated for *v1.ConfigMap from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:243
I0916 14:20:37.795340       1 reflector.go:368] Caches populated for *v1.ConfigMap from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:243
I0916 14:20:37.803018       1 leaderelection.go:268] successfully acquired lease kube-system/cloud-controller-manager-proxmox
I0916 14:20:37.803766       1 event.go:389] "Event occurred" object="kube-system/cloud-controller-manager-proxmox" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="proxmox-cloud-controller-manager-674cbd8684-v2j74_5c307636-fff3-493b-9254-9a87f672953b became leader"
I0916 14:20:37.813305       1 cloud.go:82] "clientset initialized"
I0916 14:20:37.818946       1 cloud.go:101] "proxmox initialized"
W0916 14:20:37.818968       1 controllermanager.go:306] "service-lb-controller" is disabled
W0916 14:20:37.818976       1 controllermanager.go:306] "node-route-controller" is disabled
I0916 14:20:37.818982       1 controllermanager.go:310] Starting "cloud-node-controller"
I0916 14:20:37.820425       1 controllermanager.go:329] Started "cloud-node-controller"
I0916 14:20:37.820441       1 controllermanager.go:310] Starting "cloud-node-lifecycle-controller"
I0916 14:20:37.820591       1 node_controller.go:176] Sending events to api server.
I0916 14:20:37.820657       1 node_controller.go:185] Waiting for informer caches to sync
I0916 14:20:37.821796       1 controllermanager.go:329] Started "cloud-node-lifecycle-controller"
I0916 14:20:37.821941       1 node_lifecycle_controller.go:112] Sending events to api server
I0916 14:20:37.823806       1 reflector.go:368] Caches populated for *v1.Node from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:243
I0916 14:20:37.893712       1 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0916 14:20:37.893769       1 shared_informer.go:320] Caches are synced for RequestHeaderAuthRequestController
I0916 14:20:37.893790       1 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0916 14:20:37.894197       1 tlsconfig.go:181] "Loaded client CA" index=0 certName="client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" certDetail="\"rke2-request-header-ca@1726496043\" [] issuer=\"<self>\" (2024-09-16 14:14:03 +0000 UTC to 2034-09-14 14:14:03 +0000 UTC (now=2024-09-16 14:20:37.894179842 +0000 UTC))"
I0916 14:20:37.894363       1 tlsconfig.go:203] "Loaded serving cert" certName="Generated self signed cert" certDetail="\"localhost@1726496436\" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer=\"localhost-ca@1726496435\" (2024-09-16 13:20:35 +0000 UTC to 2025-09-16 13:20:35 +0000 UTC (now=2024-09-16 14:20:37.894353651 +0000 UTC))"
I0916 14:20:37.894499       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1726496437\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1726496437\" (2024-09-16 13:20:37 +0000 UTC to 2025-09-16 13:20:37 +0000 UTC (now=2024-09-16 14:20:37.894490091 +0000 UTC))"
I0916 14:20:37.894760       1 tlsconfig.go:181] "Loaded client CA" index=0 certName="client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" certDetail="\"rke2-client-ca@1726496042\" [] issuer=\"<self>\" (2024-09-16 14:14:02 +0000 UTC to 2034-09-14 14:14:02 +0000 UTC (now=2024-09-16 14:20:37.894735757 +0000 UTC))"
I0916 14:20:37.894850       1 tlsconfig.go:181] "Loaded client CA" index=1 certName="client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" certDetail="\"rke2-request-header-ca@1726496043\" [] issuer=\"<self>\" (2024-09-16 14:14:03 +0000 UTC to 2034-09-14 14:14:03 +0000 UTC (now=2024-09-16 14:20:37.894843914 +0000 UTC))"
I0916 14:20:37.895054       1 tlsconfig.go:203] "Loaded serving cert" certName="Generated self signed cert" certDetail="\"localhost@1726496436\" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer=\"localhost-ca@1726496435\" (2024-09-16 13:20:35 +0000 UTC to 2025-09-16 13:20:35 +0000 UTC (now=2024-09-16 14:20:37.895045193 +0000 UTC))"
I0916 14:20:37.895523       1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1726496437\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1726496437\" (2024-09-16 13:20:37 +0000 UTC to 2025-09-16 13:20:37 +0000 UTC (now=2024-09-16 14:20:37.89551211 +0000 UTC))"
I0916 14:20:37.920847       1 node_controller.go:429] Initializing node 5fabaa96-359b-4ce9-bf8c-853012466b7f with cloud provider
I0916 14:20:37.920972       1 node_controller.go:271] Update 1 nodes status took 71.629µs.
I0916 14:20:37.937513       1 node_controller.go:512] Adding node label from cloud provider: beta.kubernetes.io/instance-type=4VCPU-8GB
I0916 14:20:37.937528       1 node_controller.go:513] Adding node label from cloud provider: node.kubernetes.io/instance-type=4VCPU-8GB
I0916 14:20:37.937535       1 node_controller.go:524] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=pve1
I0916 14:20:37.937541       1 node_controller.go:525] Adding node label from cloud provider: topology.kubernetes.io/zone=pve1
I0916 14:20:37.937548       1 node_controller.go:535] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=inqbeo-ca-home
I0916 14:20:37.937555       1 node_controller.go:536] Adding node label from cloud provider: topology.kubernetes.io/region=inqbeo-ca-home
I0916 14:20:37.952449       1 node_controller.go:474] Successfully initialized node 5fabaa96-359b-4ce9-bf8c-853012466b7f with cloud provider
I0916 14:20:37.952666       1 event.go:389] "Event occurred" object="5fabaa96-359b-4ce9-bf8c-853012466b7f" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully"

The first node correctly gets its taint removed and its labels and providerID set:

k get node 5fabaa96-359b-4ce9-bf8c-853012466b7f -o yaml | grep -A 10 labels:
  labels:
    CPUModel: QEMU-Virtual-CPU-version-2-5
    CPUVendor: GenuineIntel
    CPUVendorTotalCPUCores: "4"
    VMFamily: small
    VMHostName: rancher-25753
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: 4VCPU-8GB
    beta.kubernetes.io/os: linux
    failure-domain.beta.kubernetes.io/region: inqbeo-ca-home
    failure-domain.beta.kubernetes.io/zone: pve1
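
(The grep above only shows labels; the providerID set by the CCM can be checked separately, e.g. with the command below, which should reference the Proxmox region and VMID.)

kubectl get node 5fabaa96-359b-4ce9-bf8c-853012466b7f -o jsonpath='{.spec.providerID}'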

After this first node is up and running, Rancher continues to set up the other nodes, which also get their labels etc. correctly:

k get node -l topology.kubernetes.io/zone=pve1,topology.kubernetes.io/region=inqbeo-ca-home
NAME                                   STATUS   ROLES                              AGE     VERSION
3f8afec0-51bc-48d3-88b0-28991082f7b7   Ready    control-plane,etcd,master,worker   3m59s   v1.30.4+rke2r1
5fabaa96-359b-4ce9-bf8c-853012466b7f   Ready    control-plane,etcd,master,worker   12m     v1.30.4+rke2r1
d5b00005-10ee-4981-97c3-2ed22b60f507   Ready    control-plane,etcd,master,worker   3m24s   v1.30.4+rke2r1

All fine! Great stuff!
