Reading Files from Git LFS Repo #1438

Open
johko opened this issue Nov 24, 2023 · 4 comments

johko commented Nov 24, 2023

Hey,

I'm trying to read files from a GitHub LFS Repo (https://github.com/openai/dalle3-eval-samples/tree/main) but only get the pointers to the actual large files (the images in the repo), instead of the binaries.

Is there any way of reading these files from an LFS repo with fsspec?

My current testing code is:

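import fsspec

# Map the repository (org:repo@ref) to a dict-like store of path -> bytes
github_repo = fsspec.get_mapper("github://openai:dalle3-eval-samples@main")
for file_name in github_repo:
    # For LFS-tracked files this returns the pointer text, not the image bytes
    data = github_repo[file_name]
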
@martindurant
Member

We don't have such an integration. I don't know how LFS works in detail, but I imagine it's not too complex, if you would like to implement it. I know of some git-based data services that already integrate with fsspec (dvc, lakefs, xet, maybe others).

@johko
Author

johko commented Nov 24, 2023

Thanks for the really quick response, @martindurant.

I have to admit I don't know too much about the inner workings of LFS myself. But it sounds like a fun project to investigate, so if I find the time I'll implement it 🙂

@martindurant
Member

It doesn't look terrible:

>>> print(github_repo["t2i_compbench/sdxl/complex_val/The%20black%20camera%20was%20next%20to%20the%20white%20tripod._000160.png"].decode())
version https://git-lfs.github.com/spec/v1
oid sha256:ac618aaf4f05d1a938323f4d37d78877d03b5afb2d4f04af183f298d60e33b55
size 1133715
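
The pointer above is a short key/value text file, so recognizing and parsing one could look roughly like the sketch below. The function name `parse_lfs_pointer` and the constant `LFS_SPEC_PREFIX` are hypothetical and not part of fsspec; the 1024-byte limit follows the Git LFS pointer spec, and only `sha256` oids are handled here.

```python
# Hypothetical helper, not part of fsspec: recognize and parse a Git LFS pointer.
LFS_SPEC_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(data: bytes):
    """Return {'oid': ..., 'size': ...} if data looks like an LFS pointer, else None."""
    # Pointer files are small text files that start with the spec version line.
    if not data.startswith(LFS_SPEC_PREFIX) or len(data) > 1024:
        return None
    fields = {}
    for line in data.decode("utf-8", errors="replace").splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or not fields.get("size", "").isdigit():
        return None
    return {"oid": digest, "size": int(fields["size"])}


pointer = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:ac618aaf4f05d1a938323f4d37d78877d03b5afb2d4f04af183f298d60e33b55\n"
    b"size 1133715\n"
)
info = parse_lfs_pointer(pointer)
```

Anything that does not start with the version line (for example, real PNG bytes) returns `None`, so a caller could fall back to treating the content as the actual file.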

@thomasgilgenast
Contributor

I have recently encountered a situation where this feature would be useful to me.

I have opened #1810 as one possible proposal for how to add Git LFS support to the github implementation and would welcome feedback on it @johko @martindurant
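
For context, one way to resolve a pointer to its content is the Git LFS batch API: POST the `oid` and `size` to the repository's LFS endpoint and read a download URL from the response. The sketch below only builds the request and extracts the URL from a response; it is not necessarily the approach taken in #1810. The helper names are hypothetical, the endpoint pattern `https://github.com/<org>/<repo>.git/info/lfs/objects/batch` is assumed to apply to this repository, and anonymous access is assumed to be allowed.

```python
import json
import urllib.request


def build_lfs_batch_request(org: str, repo: str, oid: str, size: int) -> urllib.request.Request:
    """Build (but do not send) a Git LFS batch 'download' request. Hypothetical helper."""
    # Assumed endpoint: the repository's git URL followed by /info/lfs/objects/batch
    url = f"https://github.com/{org}/{repo}.git/info/lfs/objects/batch"
    payload = {
        "operation": "download",
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size}],
    }
    headers = {
        "Accept": "application/vnd.git-lfs+json",
        "Content-Type": "application/vnd.git-lfs+json",
    }
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )


def download_href(batch_response: dict) -> str:
    """Extract the download URL for the first object in a batch API response."""
    obj = batch_response["objects"][0]
    if "error" in obj:
        raise FileNotFoundError(obj["error"].get("message", "LFS object unavailable"))
    return obj["actions"]["download"]["href"]


request = build_lfs_batch_request(
    "openai",
    "dalle3-eval-samples",
    "ac618aaf4f05d1a938323f4d37d78877d03b5afb2d4f04af183f298d60e33b55",
    1133715,
)
```

Sending the request with `urllib.request.urlopen(request)`, decoding the JSON body, and passing it to `download_href` would then yield a URL from which the binary can be fetched with a plain GET.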
