Upgrading Rancher Desktop from 1.16.0 to 1.17.0: cannot pull Docker images from internal Docker repository when connected to company VPN #8055

Closed
jwhitmore-fleetresponse opened this issue Jan 8, 2025 · 9 comments
Labels: area/networking, kind/bug (Something isn't working), platform/windows, regression (Functionality was working in a previous release and is now broken), runtime/moby

Comments

@jwhitmore-fleetresponse

Actual Behavior

After upgrading to Rancher Desktop 1.17.0, my team and I get the following error when trying to connect to our internal Docker repository:
Error response from daemon: Get "https://ourlocalrepositoryname/v2/": dial tcp: lookup ourlocalrepositoryname on 192.168.127.1:53: no such host
Login failed.
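A minimal diagnostic sketch (not from the thread), assuming Python 3 is available inside the rancher-desktop WSL distro or Lima VM: it sends one A-record query directly to 192.168.127.1, the resolver named in the error, so that resolver's answer can be compared with the host's own. The helper names build_query, parse_header and query are hypothetical.

```python
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, txid: Optional[int] = None) -> bytes:
    """Build a minimal DNS query packet (RFC 1035); qtype 1 is an A record."""
    if txid is None:
        txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(response: bytes) -> dict:
    """Extract the ID, response code and answer count from a DNS reply."""
    txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return {"id": txid, "rcode": flags & 0x000F, "answers": ancount}


def query(name: str, server: str, port: int = 53, timeout: float = 2.0) -> dict:
    """Send one UDP query to `server` and return the parsed reply header."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # raises socket.timeout if nothing answers
        sock.sendto(packet, (server, port))
        data, _addr = sock.recvfrom(512)
    result = parse_header(data)
    if result["id"] != struct.unpack("!H", packet[:2])[0]:
        raise ValueError("reply ID does not match query ID")
    return result
```

For example, query("ourlocalrepositoryname", "192.168.127.1"): an rcode of 3 (NXDOMAIN) or zero answers would be consistent with the "no such host" error above, while a timeout would suggest the forwarder is not answering at all.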

Steps to Reproduce

Upgrade Rancher Desktop from 1.16.0 to 1.17.0, connect to the company VPN, and log in to or pull from the internal Docker repository.

Result

The same "no such host" error shown under Actual Behavior, followed by "Login failed."

Expected Behavior

Connecting to the repository works, just as it did in version 1.16.0.

Additional Information

When my team and I downgrade to 1.16.0, the issue goes away and everything works fine.

Rancher Desktop Version

1.17.0

Rancher Desktop K8s Version

1.31.4

Which container engine are you using?

moby (docker cli)

What operating system are you using?

Windows

Operating System / Build Version

Windows 11 Enterprise, version 10.0.26100

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

N/A

Windows User Only

Using OpenVPN v2.6.12.

jwhitmore-fleetresponse added the kind/bug label Jan 8, 2025
jwhitmore-fleetresponse changed the title from "Upgrading to from Rancher 1.16.0 => 1.17.0 cannot pull docker images from internal when connected to company VPN" to "Upgrading to from Rancher 1.16.0 => 1.17.0 cannot pull docker images from internal docker repository when connected to company VPN" Jan 8, 2025
jandubois added the regression label Jan 9, 2025
@Nino-K
Member

Nino-K commented Jan 9, 2025

@jwhitmore-fleetresponse thanks for reporting this. Are you able to attach the logs from the guestAgent?

@evilhamsterman

I think the issue is more than just not working with a VPN. DNS seems badly broken, and it breaks in different ways depending on whether dnsTunneling=true and whether networkMode=mirrored.

While trying to get DNS working, I tried every combination of:

  • dnsTunneling true || false
  • networkMode mirrored || NAT (default)
  • Whether the rancher-desktop WSL instance uses the default nameserver 192.168.127.1 or the nameserver for tunneling mode
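A sketch (not part of the original comment) that enumerates the 2 x 2 x 2 = 8 combinations in the list above and renders each one as a .wslconfig fragment plus a resolv.conf nameserver line, so results can be recorded per combination. It assumes the WSL keys are dnsTunneling and networkingMode under [wsl2] (the comment writes "networkMode"); TUNNEL_NAMESERVER is a hypothetical placeholder to replace with whatever address WSL actually writes when tunneling is enabled.

```python
from itertools import product

DEFAULT_NAMESERVER = "192.168.127.1"  # default nameserver named in the comment
# Hypothetical placeholder: copy the real value from /etc/resolv.conf
# inside the distro after enabling dnsTunneling.
TUNNEL_NAMESERVER = "<tunnel-nameserver>"


def combinations() -> list:
    """Return every combination tried in the comment (2 x 2 x 2 = 8)."""
    return [
        {"dnsTunneling": tunneling, "mirrored": mirrored, "nameserver": ns}
        for tunneling, mirrored, ns in product(
            (True, False), (True, False), (DEFAULT_NAMESERVER, TUNNEL_NAMESERVER)
        )
    ]


def render(combo: dict) -> tuple:
    """Render one combination as (.wslconfig text, resolv.conf line).

    Assumes the [wsl2] keys dnsTunneling and networkingMode; NAT is the
    WSL default when networkingMode is not "mirrored".
    """
    wslconfig = "\n".join([
        "[wsl2]",
        "dnsTunneling=" + ("true" if combo["dnsTunneling"] else "false"),
        "networkingMode=" + ("mirrored" if combo["mirrored"] else "NAT"),
    ])
    return wslconfig, "nameserver " + combo["nameserver"]
```

After editing %UserProfile%\.wslconfig, WSL generally needs a "wsl --shutdown" before the new settings take effect.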

Ultimately, I did not find a combination where all of the DNS functionality works. Sometimes Docker can pull images, sometimes not. Sometimes a container can resolve Internet addresses even when Docker can't pull an image, but it can't resolve VPN addresses. Sometimes the Rancher Desktop WSL instance can resolve only Internet names, sometimes both Internet and VPN names, and sometimes nothing at all.

The one consistent result is that I could never get a container to resolve a DNS name on the VPN, even when the underlying Rancher Desktop instance could.
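One hedged way to narrow down the container-only failure is to compare the resolver configuration the distro sees with the one a container gets, for example the output of "cat /etc/resolv.conf" in the distro versus "docker run --rm busybox cat /etc/resolv.conf". The sketch below (hypothetical helpers parse_resolv_conf and compare, not from the thread) reports nameserver and search entries present in the distro but missing in the container.

```python
def parse_resolv_conf(text: str) -> dict:
    """Collect nameserver and search entries from resolv.conf text."""
    result = {"nameserver": [], "search": []}
    for raw in text.splitlines():
        # Simplification: treat '#' and ';' as comment markers anywhere on
        # the line (resolv.conf(5) only guarantees this in the first column).
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            result["nameserver"].append(fields[1])
        elif fields[0] == "search":
            result["search"] = fields[1:]  # the last search line wins
    return result


def compare(distro_text: str, container_text: str) -> dict:
    """Return entries the distro has that the container lacks."""
    distro = parse_resolv_conf(distro_text)
    container = parse_resolv_conf(container_text)
    return {
        key: [value for value in distro[key] if value not in container[key]]
        for key in ("nameserver", "search")
    }
```

A missing search list could matter for a single-label name such as ourlocalrepositoryname, since resolvers normally try such names with each search domain appended.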

I documented everything and collected logs from each combination; I can attach them if you'd like.

jandubois added this to the 1.17.1 milestone Jan 13, 2025
@luca-ballerini

We are seeing the same issue on macOS when upgrading from 1.16.0 to 1.17.0

@Nino-K
Member

Nino-K commented Jan 14, 2025

@evilhamsterman could you kindly attach your logs? Many thanks!

@isalminen

We are facing this same problem. Likely related to this Lima issue: lima-vm/lima#3101
I see this message in lima.ha.stderr.log:
{"level":"debug","msg":"Error during DNS Exchange: dial udp: missing address","time":"2025-01-15T16:55:59+02:00"}
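A small sketch for counting the quoted message across a whole lima.ha.stderr.log, assuming (as the quoted line suggests) one JSON object per line with a "msg" field; dns_exchange_errors is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Iterable


def dns_exchange_errors(lines: Iterable[str]) -> Counter:
    """Count log entries whose msg starts with "Error during DNS Exchange"."""
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue  # valid JSON, but not a log object
        msg = entry.get("msg", "")
        if isinstance(msg, str) and msg.startswith("Error during DNS Exchange"):
            counts[msg] += 1
    return counts
```

Usage: open the log file and pass the file object, e.g. dns_exchange_errors(open("lima.ha.stderr.log")), since iterating a file yields its lines.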

@Nino-K
Member

Nino-K commented Jan 15, 2025

I can confirm that the issue is caused by the recent bump of the gvisor-tap-vsock library to v0.8.1 in the v1.17.0 release. To resolve it, we will need to downgrade back to v0.7.5 and prepare a patch release.

Thank you for bringing this to our attention. I'll keep you updated with any further developments.

@Anutrix

Anutrix commented Jan 20, 2025

The fix seems to be in gvisor-tap-vsock 0.8.2 now (containers/gvisor-tap-vsock#450).
Maybe someone can test it out.

@jandubois
Member

> Fix seems to be in gvisor-tap-vsock 0.8.2 now (containers/gvisor-tap-vsock#450).
> Maybe someone can test it out.

Unfortunately that release seems to only fix the issues on macOS and Linux, but not the Windows problems.

In order to get 1.17.1 out we'll have to downgrade to 0.7.5 on Windows (but will upgrade to 0.8.2 on macOS and Linux). Hopefully the root cause on Windows can be found and fixed in time for 1.18.0.
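The per-platform plan above amounts to a small pin table. The sketch below is hypothetical (it is not Rancher Desktop's real dependency manifest, and GVISOR_TAP_VSOCK_PINS, pinned_version and likely_affected are invented names) and encodes only what this thread reports: v0.8.1 broke DNS, and v0.8.2 fixed macOS and Linux but not Windows.

```python
from typing import Dict, Tuple

# Hypothetical pin table mirroring the plan in this thread; it is NOT
# Rancher Desktop's actual dependency manifest. Keys follow the
# sys.platform / process.platform naming ("win32", "darwin", "linux").
GVISOR_TAP_VSOCK_PINS: Dict[str, str] = {
    "win32": "0.7.5",   # downgraded for 1.17.1
    "darwin": "0.8.2",  # includes containers/gvisor-tap-vsock#450
    "linux": "0.8.2",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "v0.8.1" or "0.8.1" into (0, 8, 1) for tuple comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def pinned_version(platform: str) -> str:
    return GVISOR_TAP_VSOCK_PINS[platform]


def likely_affected(platform: str, version: str) -> bool:
    """Apply only what the thread reports about the DNS regression.

    0.8.1 broke DNS; 0.8.2 fixed macOS and Linux but not Windows, and no
    Windows fix is known here, so every version >= 0.8.1 is treated as
    suspect on Windows.
    """
    v = parse_version(version)
    if platform == "win32":
        return v >= (0, 8, 1)
    return v == (0, 8, 1)
```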

@Nino-K
Member

Nino-K commented Jan 21, 2025

We are planning to release patch version 1.17.1 to address this issue. The root cause was an upgrade of gvisor-tap-vsock to v0.8.1. As part of the fix, we have downgraded it to v0.7.5, which should resolve the problem.

Nino-K closed this as completed Jan 21, 2025
7 participants