Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

squash refs/convert/... branch history after each series of commits #3131

Merged
merged 7 commits into from
Feb 3, 2025

Conversation

severo
Copy link
Collaborator

@severo severo commented Jan 22, 2025

fixes #1774

@severo
Copy link
Collaborator Author

severo commented Jan 27, 2025

Sorry to bother @Wauplin: do you know why we cannot squash the refs/convert/{parquet,duckdb} branches yet on hub-ci? (see errors in the CI)

@Wauplin
Copy link

Wauplin commented Jan 27, 2025

do you know why we cannot squash the refs/convert/{parquet,duckdb} branches yet on hub-ci?

cc @Pierrci @coyotte508 for that (dunno myself)

What's sure is that squashing normal branches works correctly on hub-ci

@coyotte508
Copy link
Member

@Kakulukian
Copy link
Member

API works fine on hub-ci maybe refs/convert/parquet needs to be urlencoded maybe not supported with the current method of hfh

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ! feel free to merge once it's all good for you

@severo
Copy link
Collaborator Author

severo commented Jan 28, 2025

I'll try to pass the tests, otherwise we will have the same issue in prod without a doubt.

@coyotte508
Copy link
Member

API works fine on hub-ci maybe refs/convert/parquet needs to be urlencoded maybe not supported with the current method of hfh

cc @hanouticelina maybe

@severo
Copy link
Collaborator Author

severo commented Jan 28, 2025

The error is:

E           huggingface_hub.errors.HfHubHTTPError: 404 Client Error: Not Found for url: https://hub-ci.huggingface.co/api/datasets/DVUser/csv-17379738584545/super-squash/refs/convert/parquet (Request ID: Root=1-67976585-22683a2a39a5eeb41f7b288a;94d07c7c-ce88-40f3-994b-239988927311)

https://github.com/huggingface/dataset-viewer/actions/runs/12911690116/job/36214768824?pr=3131#step:8:9086

@Wauplin
Copy link

Wauplin commented Jan 28, 2025

(...)csv-17379738584545/super-squash/refs/convert/parquet 

=> indeed a url-encoding issue yes. Will fix that

@severo
Copy link
Collaborator Author

severo commented Jan 28, 2025

hmmm it was not an issue :)

@severo severo reopened this Jan 28, 2025
@severo
Copy link
Collaborator Author

severo commented Jan 28, 2025

let's wait for the next hfh release, upgrade here, and we should be good

@Wauplin
Copy link

Wauplin commented Jan 28, 2025

Yep, sorry about the closing 😬

huggingface_hub 0.28.0 is out! https://github.com/huggingface/huggingface_hub/releases/tag/v0.28.0

@severo
Copy link
Collaborator Author

severo commented Jan 28, 2025

yes, thanks! I'm upgrading the deps

@severo
Copy link
Collaborator Author

severo commented Jan 28, 2025

Something strange, maybe related to #3092, @lhoestq while upgrading in webhook service. Could you have a special look at this part?

➜  webhook git:(squash_references) ✗ poetry update huggingface-hub libcommon libapi
Updating dependencies
Resolving dependencies... (44.1s)

Package operations: 1 install, 1 update, 0 removals

  - Installing pyarrow-hotfix (0.6)
  - Downgrading datasets (3.0.3.dev0 1946182 -> 2.19.1 1946182)

Writing lock file

Plus: the operation in the git diff is the opposite:

-version = "2.19.1"
+version = "3.0.3.dev0"

@severo
Copy link
Collaborator Author

severo commented Jan 28, 2025

OK, a lot of type errors, related to huggingface/huggingface_hub#2752 I guess. Not very important, but let's try to fix them here too

@severo
Copy link
Collaborator Author

severo commented Jan 28, 2025

Still failing. I made it work when using the repository user token, but it fails when using the bot user token. So: it's an issue in moon-landing: the bot should be able to squash refs/convert/... branches. I opened an issue: https://github.com/huggingface-internal/moon-landing/issues/12355

cc @coyotte508 @Kakulukian @Pierrci FYI (I could open a PR, but I don't feel super confident for auth-related stuff)

@coyotte508
Copy link
Member

PR open

@severo
Copy link
Collaborator Author

severo commented Jan 31, 2025

OK, the squash works. But we have another issue, possibly due to the hfh upgrade

@Wauplin
Copy link

Wauplin commented Jan 31, 2025

Happy to investigate if it's hfh-related but looks like a credential issue to me:

E               huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-679cfee5-239b138677e055ab65d1ee0e;f1978d0d-1f12-422d-9c57-f73597e6a3e5)
E               
E               Repository Not Found for url: https://huggingface.co/api/datasets/severo/winogavil.
E               Please make sure you specified the correct `repo_id` and `repo_type`.
E               If you are trying to access a private or gated repo, make sure you are authenticated.
E               Invalid credentials in Authorization header

@coyotte508
Copy link
Member

coyotte508 commented Jan 31, 2025

note that it needs to be an app token associated to a user (eg like the app token that pushes the commits to the parquet ref currently)

@severo
Copy link
Collaborator Author

severo commented Feb 3, 2025

OK, the 401 error was already present in a previous commit: https://github.com/huggingface/dataset-viewer/actions/runs/12828524481/job/36584381335. So... I think we can merge this one and then fix the issue with 401.

Can it be linked with https://huggingface.slack.com/archives/C02EMARJ65P/p1738582122666929?thread_ts=1672852567.996239&cid=C02EMARJ65P (internal)?

cc @lhoestq wdyt

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure feel free to merge already, it should work directly once the issue is fixed

@severo severo merged commit c598858 into main Feb 3, 2025
26 of 27 checks passed
@severo severo deleted the squash_references branch February 3, 2025 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Squash the commits in the refs/convert/parquet and refs/convert/duckdb branches
6 participants