[SURE-8882] extend our testing to ssh downloads with keys #2751

kkaempf · 2024-08-16T15:34:22Z

SURE-8882

Issue description:

The customer upgraded from Rancher 2.8.2 to Rancher 2.8.5 and some of their upstream fleet jobs are getting this error:

time=2024-07-30 15:16:18.000000 level=fatal msg="error downloading 'ssh://[email protected]/xxxxx/fleet-platform.git?sshkey=redacted': /usr/bin/git exited with 128: Cloning into '/tmp/getter624252719/temp'...\nNo user exists for uid 1000\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"

Troubleshooting steps:

The customer tried changing the credentials but still gets the same error.
They are able to clone the repository locally using the same credentials supplied to Fleet. This also happens on most of the configured repositories, not just one or two git repos.
They are able to exec into the gitjob pod and manually clone the repo with success.
Checked from inside the GitJob pod:

 > kubectl exec -n cattle-fleet-system gitjob-7889c69f49-5kq8r -it -- cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
sshd:x:499:486:SSH daemon:/var/lib/sshd:/usr/sbin/nologin
gitjob:x:1000:1000::/home/gitjob:/bin/bash
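
The same check can be scripted. This is a minimal sketch, assuming the `/etc/passwd` content has already been captured as text (for example from the `kubectl exec` call above). The helper name `find_passwd_user` is hypothetical and not part of Fleet.

```python
from typing import Optional


def find_passwd_user(passwd_text: str, uid: int) -> Optional[str]:
    """Return the login name that maps to `uid`, or None if no entry exists."""
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # passwd(5) layout: name:password:UID:GID:GECOS:home:shell
        fields = line.split(":")
        if len(fields) < 7 or not fields[2].isdigit():
            continue
        if int(fields[2]) == uid:
            return fields[0]
    return None


# Content copied from the gitjob pod output above.
GITJOB_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
sshd:x:499:486:SSH daemon:/var/lib/sshd:/usr/sbin/nologin
gitjob:x:1000:1000::/home/gitjob:/bin/bash
"""

print(find_passwd_user(GITJOB_PASSWD, 1000))  # gitjob
```

The gitjob pod resolves uid 1000, so the check would also need to run in whichever container actually executes the failing git command.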

I reviewed two of their GitRepo manifests (one working, one not working). Both point to the same repo; the only difference is the path used.
The customer downgraded Fleet from 0.9.5 to 0.9.0 and the problem repos started to sync again.

Repro steps:

unable to repro in-house

Workaround:

Is a workaround available and implemented? yes
What is the workaround: Downgrade fleet

Actual behavior:

After upgrade, some gitrepos fail with error:

time=2024-07-30 15:16:18.000000 level=fatal msg="error downloading 'ssh://[email protected]/xxxxx/fleet-platform.git?sshkey=redacted': /usr/bin/git exited with 128: Cloning into '/tmp/getter624252719/temp'...\nNo user exists for uid 1000\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"

Expected behavior:
All gitrepos continue to sync with no error

Files, logs, traces:

Additional notes:


cienijr · 2024-11-06T03:01:19Z

We're facing the same issue here. SSH doesn't work at all when pulling helm charts.
It might be related to 326ad93. A quick Google search suggests that OpenSSH depends on an entry for the current user existing in /etc/passwd, for some obscure reason.

Amusingly, using SSH for the GitRepo itself does work (the initcontainer runs fleet gitcloner and exits with success). It only seems to be an issue when running fleet apply.

We're using Rancher 2.9.3 with fleet 0.10.4

manno · 2024-11-15T10:30:32Z

> We're facing the same issue here. SSH doesn't work at all when pulling helm charts. It might be related to 326ad93, a quick Google search suggests that OpenSSH depends on the entry existing at /etc/passwd for some obscure reason.
>
> Amusingly, using SSH for the GitRepo itself does work (the initcontainer runs fleet gitcloner and exits with success). It only seems to be an issue when running fleet apply.

If downloading a helm chart via ssh fails, that would be a separate issue. The fleet apply CLI uses go-getter to download charts.
Cloning the git repository is done by fleet gitcloner and it uses go-git.
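
For test fixtures, a URL of the kind shown in the error log could be assembled as follows. This is a sketch, assuming go-getter's documented `sshkey` query parameter, which expects the private key as a base64-encoded string. The function names, the repository URL, and the key material are placeholders, not Fleet or go-getter code.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_sshkey(repo_url: str, private_key_pem: bytes) -> str:
    """Append a base64-encoded private key as the `sshkey` query parameter."""
    parts = urlsplit(repo_url)
    encoded_key = base64.b64encode(private_key_pem).decode("ascii")
    # urlencode percent-escapes '+', '/' and '=' so the key survives URL parsing.
    extra = urlencode({"sshkey": encoded_key})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def extract_sshkey(getter_url: str) -> bytes:
    """Recover the raw key bytes from a URL produced by with_sshkey."""
    values = parse_qs(urlsplit(getter_url).query)["sshkey"]
    return base64.b64decode(values[0])


# Placeholder key material; a real test would generate a throwaway key pair.
FAKE_KEY = b"-----BEGIN OPENSSH PRIVATE KEY-----\nplaceholder\n-----END OPENSSH PRIVATE KEY-----\n"
url = with_sshkey("ssh://git@git.example.com/org/charts.git", FAKE_KEY)
assert extract_sshkey(url) == FAKE_KEY
```

A round trip like this only validates the fixture itself. Exercising the failing path would still require running the download against a real SSH endpoint.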

cienijr · 2024-11-15T17:27:01Z

I looked up the source code for go-getter, and its support for fetching from Git repos indeed relies on running git commands directly, which would trigger the OpenSSH error related to the missing /etc/passwd entry.

From what I could gather, go-git has a Git implementation of its own and uses crypto/ssh for transport instead of OpenSSH. I'm pretty sure that it does not perform this validation, which seems to be the culprit behind this weird requirement.

That's probably why this problem happens when fleet downloads a chart through Git+SSH but not when it fetches a GitRepo using Git+SSH - in our case, even when using the same repo and same credentials.
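
Because git appends its generic "Could not read from remote repository" hint, this failure is easy to mistake for a credentials problem. Below is a small triage sketch, assuming the log format quoted in the issue description. The function name is hypothetical.

```python
import re
from typing import Optional

# Message reported by the SSH client when the current UID has no passwd entry.
_MISSING_USER = re.compile(r"No user exists for uid (\d+)")


def missing_passwd_uid(log_line: str) -> Optional[int]:
    """Return the UID that could not be resolved, or None for any other failure."""
    match = _MISSING_USER.search(log_line)
    return int(match.group(1)) if match else None


# Excerpt of the error from the issue; the log contains literal "\n" and "\r" sequences.
sample = (
    "/usr/bin/git exited with 128: Cloning into '/tmp/getter624252719/temp'..."
    "\\nNo user exists for uid 1000\\r\\nfatal: Could not read from remote repository."
)
print(missing_passwd_uid(sample))  # 1000
```

A line that carries only the "Could not read from remote repository" text returns `None`, which leaves real access-rights failures to be handled separately.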

(I apologize if any of my messages come across as confusing or rude; English is not my native language.)

kkaempf added kind/bug area/gitjob labels Aug 16, 2024

kkaempf added this to Fleet Aug 16, 2024

github-project-automation bot moved this to 🆕 New in Fleet Aug 16, 2024

kkaempf modified the milestones: v2.9-Next1, v2.8-Next1 Aug 16, 2024

kkaempf added the JIRA Must shout label Aug 21, 2024

kkaempf modified the milestones: v2.8.8, v2.10.0 Oct 2, 2024

manno modified the milestones: v2.10.0, v2.11.0 Oct 23, 2024

manno moved this from To Triage to 📋 Backlog in Fleet Nov 13, 2024
