Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support pulling by a specific commit or a date #48

Open
OmarIthawi opened this issue Jan 10, 2024 · 6 comments
Open

feat: support pulling by a specific commit or a date #48

OmarIthawi opened this issue Jan 10, 2024 · 6 comments

Comments

@OmarIthawi
Copy link
Member

Currently the atlas pull supports a --branch argument which accepts only git branches and tags.

atlas pull could be more verstile if it supported a more generic --revision argument which could accept the following revisions:

  • Git branch e.g. git checkout open-release/quince.master
  • Git tag e.g. git checkout open-release/quince.1
  • Git commit SHA git checkout f098hab11
  • Git branch at a specific date git checkout 'main@{2020-11-06}' as supported by https://linux.die.net/man/1/git-rev-parse
@timmc-edx
Copy link

Currently, atlas performs the following sequence:

  1. A shallow git clone of a specific branch or tag (using --branch, which doesn't support commit hashes)
  2. git sparse-checkout configuration
  3. git checkout HEAD
  4. Delete the .git dir

I think it would be as simple as changing the git clone to do a full clone, then change the checkout to git checkout "$pull_revision". The only difference in performance would be pulling down ~75 MB rather than 8 MB (based on local experimentation just now) -- the resulting size on disk would stay the same, as the .git dir would still be cleared.

I might have found a way to keep the reduced download size, but it would require some experimentation, and I'm not sure yet which versions of git support it.

timmc-edx added a commit to timmc-edx/openedx-atlas that referenced this issue Dec 20, 2024
By switching from clone to an explicit init+fetch we can get better control
of what commits are fetched. clone only allows targeting a branch or a
tag, but fetch can ask for any commit. The caveat is that the server needs
to allow this via `uploadpack.allowReachableSHA1InWant` and similar.

Minor changes:

- checkout is now also quiet by default

See openedx#48
timmc-edx added a commit to timmc-edx/openedx-atlas that referenced this issue Dec 21, 2024
By switching from clone to an explicit init+fetch we can get better control
of what commits are fetched. clone only allows targeting a branch or a
tag, but fetch can ask for any commit. The caveat is that the server needs
to allow this via `uploadpack.allowReachableSHA1InWant` and similar.

Minor changes:

- checkout is now also quiet by default

See openedx#48
@timmc-edx
Copy link

I ended up opening a PR for the second approach: #61

@OmarIthawi
Copy link
Member Author

Hey @timmc-edx!!

Thanks a ton for your contribution!! I'm glad you chose this feature to implement.
I'll review and check about git-compatiblity.

Additionaly, I need to review in details the two options you've mentioned to ensure I fully understand the performance impact.

Download Size issues

If we go with the worst option: full git clone, the performance impact might not be as minimal as it looks like.

The 24MB is a three times as large as 8MB at the moment, but difference (repo size vs commit size) grows exponentially. So in the future it's expected that the difference is more of 20 or even 100x (1000MB vs 10MB).

The translation pull, happens multiple times during a server rebuild, so 24MB pull in 8 images would be more like 192MB instead of 64MB.

Atlas is used by both developers on their local machine and CI servers with sometimes choppy or slow internet connection.

Sometimes the developer only needs to pull iOS app which is a few kilobytes of data and in such cases a tx pull would be much more efficient.

@timmc-edx
Copy link

Yeah, I wasn't envisioning the "pull everything and then delete" as a long-term solution since it looked like the only other option might require a newer minimum version of git. But I think the other solution I came up with is better anyhow.

I'm not sure where I got 8 MB from in that previous comment; edx-platform currently gets 14 MB of translations. I put some logging lines into the version of atlas that currently gets installed into atlas -- each line calls du -hs . from inside the translations_TEMP dir:

After clone: 352K .
After sparse checkout rules set: 360K .
After checkout: 16M .
After deletion: 14M .

With the version I submitted as a PR, also pulling main:

After init: 124K .
After remote-add: 124K .
After fetch: 336K .
After set sparse checkout rules: 344K .
After checkout: 16M .
After deletion: 14M .

As to git compatibility... I got it to pass integration tests with 2.25.1 in the GitHub CI, but when I run that version of git locally with the latest atlas, it downloads 115 MB of stuff. Something about sparse checkouts or shallow cloning not working?

fatal: Unable to create '/home/timmc/edx-repos/edx-platform/translations_TEMP/.git/info/sparse-checkout.lock': No such file or directory

...but maybe it's OK that the oldest supported version of git does this, as long as it otherwise works. (Interestingly, with the version in my PR it only downloads 65 MB. I haven't checked why.)

@OmarIthawi
Copy link
Member Author

We can require a newer version of git if that's helpful. Alternatively, we can use a server 😅 and atlas would be a client. Not a bad option, but we need to reduce the github.com/openedx repos and not increase them.

Anyway, I will review the solution in details and get back to you.

@timmc-edx
Copy link

Just to be clear, the 2.25.1 issue is not that it doesn't work, but that it pulls more than necessary (both with the current atlas and my proposed change). At most, I would recommend adding a warning that older versions of git may be less efficient. It should be fine to keep supporting it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants