[SURE-8882] extend our testing to ssh downloads with keys #2751

Open
kkaempf opened this issue Aug 16, 2024 · 3 comments

@kkaempf
Collaborator

kkaempf commented Aug 16, 2024

SURE-8882

Issue description:

The customer upgraded from Rancher 2.8.2 to Rancher 2.8.5 and some of their upstream fleet jobs are getting this error:

time=2024-07-30 15:16:18.000000 level=fatal msg="error downloading 'ssh://[email protected]/xxxxx/fleet-platform.git?sshkey=redacted': /usr/bin/git exited with 128: Cloning into '/tmp/getter624252719/temp'...\nNo user exists for uid 1000\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"
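The log line above mixes two messages: `No user exists for uid 1000` is what the OpenSSH client prints when it cannot find a passwd entry for its own UID, while the "correct access rights" text is git's generic follow-up once the transport has died. A minimal sketch of separating the two when triaging such logs follows. The function name and the hint strings are hypothetical and only illustrate the idea; they are not part of Fleet or go-getter.

```python
import re

# Pattern taken from the log line in this issue.
UID_PATTERN = re.compile(r"No user exists for uid (\d+)")


def classify_clone_failure(message: str) -> str:
    """Return a short hint for a failed `git clone` over SSH (hypothetical helper)."""
    match = UID_PATTERN.search(message)
    if match:
        # ssh exited before authenticating, so credentials were never tried.
        return f"ssh could not resolve uid {match.group(1)} in the passwd database"
    if "Could not read from remote repository" in message:
        return "remote rejected or unreachable; check credentials and URL"
    return "unclassified"


log = (
    "Cloning into '/tmp/getter624252719/temp'...\n"
    "No user exists for uid 1000\r\n"
    "fatal: Could not read from remote repository.\n"
)
print(classify_clone_failure(log))
```

Checking for the UID message first matters, because git's generic "access rights" text appears in both cases and would otherwise point towards the credentials.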

Troubleshooting steps:

The customer tried changing the credentials and still gets the same error.
They are able to clone the repository locally using the same credentials supplied to Fleet. This also happens on most of the configured repositories, not just one or two git repos.
They are able to exec into the gitjob pod and manually clone the repo with success.
Checked from inside the GitJob pod:

 > kubectl exec -n cattle-fleet-system gitjob-7889c69f49-5kq8r -it -- cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
sshd:x:499:486:SSH daemon:/var/lib/sshd:/usr/sbin/nologin
gitjob:x:1000:1000::/home/gitjob:/bin/bash

I reviewed two of their GitRepo manifests (one working, one not working) and they point to the same repo; the only difference is the path used.
The customer downgraded Fleet from 0.9.5 to 0.9.0 and the problem repos started to sync again.

Repro steps:

unable to repro in-house

Workaround:

Is a workaround available and implemented? yes/no
What is the workaround: Downgrade fleet

Actual behavior:

After upgrade, some gitrepos fail with error:

time=2024-07-30 15:16:18.000000 level=fatal msg="error downloading 'ssh://[email protected]/xxxxx/fleet-platform.git?sshkey=redacted': /usr/bin/git exited with 128: Cloning into '/tmp/getter624252719/temp'...\nNo user exists for uid 1000\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"

Expected behavior:
All gitrepos continue to sync with no error

Files, logs, traces:
 

Additional notes:

@kkaempf kkaempf added this to Fleet Aug 16, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Aug 16, 2024
@kkaempf kkaempf modified the milestones: v2.9-Next1, v2.8-Next1 Aug 16, 2024
@kkaempf kkaempf added the JIRA Must shout label Aug 21, 2024
@kkaempf kkaempf modified the milestones: v2.8.8, v2.10.0 Oct 2, 2024
@manno manno modified the milestones: v2.10.0, v2.11.0 Oct 23, 2024
@cienijr

cienijr commented Nov 6, 2024

We're facing the same issue here. SSH doesn't work at all when pulling helm charts.
It might be related to 326ad93; a quick Google search suggests that OpenSSH depends on the user's entry existing in /etc/passwd for some obscure reason.

Amusingly, using SSH for the GitRepo itself does work (the initcontainer runs fleet gitcloner and exits with success). It only seems to be an issue when running fleet apply.

We're using Rancher 2.9.3 with fleet 0.10.4

@manno manno moved this from To Triage to 📋 Backlog in Fleet Nov 13, 2024
@manno
Member

manno commented Nov 15, 2024

> We're facing the same issue here. SSH doesn't work at all when pulling helm charts. It might be related to 326ad93; a quick Google search suggests that OpenSSH depends on the user's entry existing in /etc/passwd for some obscure reason.
>
> Amusingly, using SSH for the GitRepo itself does work (the initcontainer runs fleet gitcloner and exits with success). It only seems to be an issue when running fleet apply.

If downloading a helm chart via ssh fails, that would be a separate issue. The fleet apply CLI uses go-getter to download charts.
Cloning the git repository is done by fleet gitcloner and it uses go-git.

@cienijr

cienijr commented Nov 15, 2024

I looked up the source code for go-getter, and its support for fetching from Git repos indeed relies on running git commands directly, which would trigger the OpenSSH error related to the missing /etc/passwd entry.

From what I could gather, go-git has a Git implementation of its own and uses crypto/ssh for transport instead of OpenSSH. I'm pretty sure that it does not perform this validation, which seems to be the culprit behind this weird requirement.

That's probably why this problem happens when fleet downloads a chart through Git+SSH but not when it fetches a GitRepo using Git+SSH - in our case, even when using the same repo and same credentials.
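To check this theory in a given container, one could test whether the UID the process runs as resolves in the passwd database, since that is the lookup the OpenSSH client performs before connecting. A minimal sketch follows, assuming a Python interpreter is available in the container (which may well not be the case for the Fleet images); the helper name `passwd_entry_for` is hypothetical.

```python
import os
import pwd


def passwd_entry_for(uid: int):
    """Return the passwd entry for uid, or None if the lookup fails."""
    try:
        # pwd.getpwuid queries the system passwd database (e.g. /etc/passwd).
        return pwd.getpwuid(uid)
    except KeyError:
        # Raised when no entry exists for this uid.
        return None


if __name__ == "__main__":
    uid = os.getuid()
    entry = passwd_entry_for(uid)
    if entry is None:
        print(f"uid {uid}: no passwd entry; an ssh child process would likely refuse to start")
    else:
        print(f"uid {uid}: resolves to {entry.pw_name!r}")
```

The check only says something about the container it runs in, so it would have to be run where `fleet apply` executes git, not in the gitjob controller pod.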

(I apologize if any of my messages come across as confusing or maybe rude, as English is not my native language.)
