DNS discovery fails when RELEASE_NODE is FQDN #14

Open
j4nk3e opened this issue Jan 24, 2025 · 4 comments

j4nk3e commented Jan 24, 2025

I tried to get dns_cluster working with FQDNs, but discovery always fails when I set an FQDN in the `RELEASE_NODE` env variable. The error message, which comes from Erlang, is `Cannot get connection id for node app_name@prod-01.<redacted>.com`. When I call `Node.connect :"app_name@prod-01.<redacted>.com"` manually, it connects and everything works fine.
I have an A record for <redacted>.com with the IP addresses of prod-01 and prod-02, the two machines that should connect to each other, and they have full TCP access to each other. The hostname of each machine is set to its own FQDN, and I created an A record for each of those, too.

The documentation clearly states:

export RELEASE_NODE="myapp@fully-qualified-host-or-ip"

so I assume an FQDN as the node name is supported by dns_cluster.

When I replace the FQDN in `RELEASE_NODE` with the public IP address, DNS discovery starts working. However, I would prefer to use domain names in the config to make the setup more robust in case a machine gets a new IP address.

Am I doing something wrong, or might this be an issue somewhere in the dns_cluster lib? Unfortunately, I can't figure out where the `Cannot get connection id for node` error is coming from.

josevalim (Member) commented

Can you resolve the DNS if you run Elixir on that node and simulate the code in this library? The DNS queries it runs are small, so you can try reproducing it.

j4nk3e (Author) commented Jan 24, 2025

Yes, I tried running `:inet_res.getbyname("<redacted>.com" |> String.to_charlist, :a)` and got back the IPs of both machines, as configured in the DNS entry for my cluster.
Edit: I'll try running everything line by line again; I haven't been able to reproduce the error that way yet.
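
For anyone repeating this check, here is a minimal sketch of the lookup wrapped in a helper. The `DnsCheck` module and its function names are hypothetical and not part of dns_cluster; only `:inet_res.getbyname/2` and `:inet.ntoa/1` are standard Erlang calls.

```elixir
defmodule DnsCheck do
  # Hypothetical helper for reproducing the lookup by hand; not dns_cluster code.

  # Returns the IPv4 addresses behind `query` as strings, or [] if the lookup fails.
  def a_records(query) when is_binary(query) do
    case :inet_res.getbyname(String.to_charlist(query), :a) do
      {:ok, {:hostent, _name, _aliases, _type, _len, addrs}} ->
        # addrs is a list of IP tuples such as {203, 0, 113, 10}
        Enum.map(addrs, &format_ip/1)

      {:error, _reason} ->
        []
    end
  end

  # Turns an IP tuple into its dotted string form.
  def format_ip(ip_tuple), do: ip_tuple |> :inet.ntoa() |> to_string()
end
```

Calling `DnsCheck.a_records("<redacted>.com")` on each machine should list both peer IPs if the cluster record resolves there.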

j4nk3e (Author) commented Jan 24, 2025

It seems that `Node.connect` works fine with the hostname, but fails when trying to connect using an IP address instead.
dns_cluster always tries to connect using the IPs which it got from the `resolver.lookup` call.
If I understand the problem correctly, we either have to find out the hostname for the other machine (we only have the IP returned by the DNS), or convince Erlang to allow connecting to a node by IP which has the `RELEASE_NODE` set to a hostname.
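
The mismatch described above can be illustrated with a small sketch. It assumes discovery builds each target as the local basename plus a resolved IP, which is my reading of the behavior and not code copied from dns_cluster; `NodeNameSketch`, the `example.com` host, and the IPs are made up for illustration.

```elixir
defmodule NodeNameSketch do
  # Hypothetical illustration of the name mismatch; not dns_cluster's actual code.

  # Splits :"app_name@host" into {"app_name", "host"}.
  def split(node_name) when is_atom(node_name) do
    [basename, host] = node_name |> Atom.to_string() |> String.split("@", parts: 2)
    {basename, host}
  end

  # Builds the node names that would be tried for a list of resolved IP strings.
  def candidates(own_node, ips) do
    {basename, _host} = split(own_node)
    Enum.map(ips, fn ip -> :"#{basename}@#{ip}" end)
  end
end

own_node = :"app_name@prod-01.example.com"
targets = NodeNameSketch.candidates(own_node, ["203.0.113.10", "203.0.113.11"])
# Every target has the form app_name@<ip>, while the peers were started
# as app_name@<fqdn>, so none of the targets equals a peer's real node name.
IO.inspect(targets)
```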

josevalim (Member) commented

The node and the name have to match. So if you name your machine `--name foo@hostname`, then you need to connect using `--name foo@hostname`. If you name it using an IP, then you connect using the IP.
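
One possible way to satisfy this rule when only IPs are known is a reverse DNS lookup, sketched below. This is an untested idea, not something dns_cluster is confirmed to do: it assumes PTR records exist for each peer IP and that they return the same FQDN used in `RELEASE_NODE`. The `ReverseLookupSketch` module is hypothetical; `:inet_res.gethostbyaddr/1` is the standard Erlang call.

```elixir
defmodule ReverseLookupSketch do
  # Hypothetical helper; assumes a PTR record exists for each peer IP.

  # Resolves an IP tuple back to a hostname and builds the matching node name.
  def node_for_ip(basename, ip_tuple) do
    case :inet_res.gethostbyaddr(ip_tuple) do
      {:ok, {:hostent, hostname, _aliases, _type, _len, _addrs}} ->
        {:ok, node_name(basename, to_string(hostname))}

      {:error, reason} ->
        {:error, reason}
    end
  end

  # Joins basename and host into a node name atom such as :"app_name@host".
  def node_name(basename, host), do: :"#{basename}@#{host}"
end
```

The result of `node_for_ip/2` could then be passed to `Node.connect/1`, so that the name used to connect matches the name the peer was started with.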
