Support IPFS, DAT, or other distributed storage #6777
-
Hi @remram44! Great idea, thank you! If you wish to implement support for some of those, please feel free to take a look at … A current workaround for that would be to just pack your … Thanks,
-
Indeed, this looks easy to add. I don't have the cycles to attempt this now, but I might try in the future.
-
The idea of storing data in p2p/blockchain storage is very appealing. We develop DVC based mostly on our experience with industrial data science, where p2p does not play a big role. But that may soon change! Recently, I got another request (not on GitHub) about using the denet.pro dApp to store DVC data. It would be great to understand this landscape of p2p datasets: …
If there is demand, we can definitely implement this. @remram44, please let me know if you use these kinds of storage. I would really like to discuss the use cases and your thoughts, or, if you can, please connect us with other users or with the creators of the tools/protocols.
-
I'm thinking about the case where you make an analysis public, e.g. publish it on GitHub. Having everyone download from your S3 bucket would incur charges, and hosting it on some box in your lab would provide very limited bandwidth. Peer-to-peer solutions would scale nicely.
-
Are there any plans to implement this? @remram44 @dmpetrov I can also see further applications in scientific reproducibility and in public data sharing in general.
-
@icks No such plans from the core team, at least for now. We would appreciate it if you could share your thoughts on this and the scenarios in which you would like to use it. Contributions are always welcome, so feel free to give it a shot. Ping us here or on Discord if you need any help 🙂
-
@icks It would be a good addition indeed. Unfortunately, as @efiop mentioned, it would take a while for the core team to prioritize this :( We would really love for the community to contribute here, and we can provide all the support and help needed.
-
For the record: if anyone is interested in contributing support for any of these, I would highly recommend starting by writing a filesystem class compatible with https://github.com/intake/filesystem_spec/ (fsspec), as that's what dvc is migrating to.
-
This is unlikely to look like a normal fsspec backend, because with content addressing you cannot choose the name of the destination (it includes a hash of the content).
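To make the mismatch concrete, here is a minimal, stdlib-only sketch of a content-addressed store. It is not IPFS: real IPFS CIDs are multihash-based identifiers over a chunked DAG, so the plain SHA-256 hex keys below are a simplifying assumption, and `ContentAddressedStore`, `index`, and the example path are hypothetical names. The point is that `put` takes only the bytes and hands back the key, whereas a filesystem-style upload expects the caller to pick the destination path, so a backend would have to keep its own mapping from requested names to assigned identifiers.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical in-memory store whose keys are derived from the content.

    Simplification (assumption): keys are SHA-256 hex digests, not real IPFS CIDs.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The caller cannot choose the destination name: it is a hash of the bytes.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = ContentAddressedStore()

# Hypothetical index from the name a backend was asked to write
# to the identifier the store actually assigned.
index = {}
index["data/train.csv"] = store.put(b"a,b\n1,2\n")

# Identical content always maps to the same key (deduplication for free).
same = store.put(b"a,b\n1,2\n")
assert same == index["data/train.csv"]
print(index)
```

The extra `index` is exactly the piece a normal fsspec backend never needs, which is why such a backend would not be a drop-in filesystem.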
-
@remram44 That was one of the major problems with #4736. I'm hoping we can find some way to handle that. With fsspec or without, the fact that after each …
-
Moving to discussion since this is not actionable yet. |
-
Using Discussions for feature requests is pretty unusual. Or should I understand that whether this would be accepted is still under discussion? People are unlikely to pick up items to work on once they have been moved out of the issue tracker.
-
I have a use case where I've collected about 20 GB of data (so far) that I'd like to publish as a free-to-use open dataset. IPFS seems like a good way to accomplish that. Note: I am new to IPFS, so I might not fully appreciate the limitations and challenges. I see that there is a proof-of-concept fsspec implementation for read-only IPFS access (https://github.com/fsspec/ipfsspec). Perhaps using something like DNSLink would be the "right way" to handle a read/write dataset?
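One way to picture the DNSLink idea: every time the dataset changes, its root identifier changes, so readers need a stable, mutable name that points at the latest root. DNSLink provides this with a DNS TXT record on `_dnslink.<domain>` whose value has the form `dnslink=/ipfs/<CID>`. The sketch below is a stdlib-only simulation under loud assumptions: the "root" is a SHA-256 over the sorted file names and file hashes (real IPFS builds a UnixFS DAG and encodes CIDs differently), the TXT records live in a plain dict rather than real DNS, and `dataset.example.org`, `publish`, and `resolve` are hypothetical names.

```python
import hashlib


def file_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def root_hash(files: dict) -> str:
    """Hypothetical Merkle-style root: hash over sorted (name, file hash) pairs."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode())
        h.update(b"\0")
        h.update(file_hash(files[name]).encode())
        h.update(b"\n")
    return h.hexdigest()


# Simulated DNS (assumption): record name -> TXT value, no real resolver involved.
txt_records = {}


def publish(domain: str, files: dict) -> str:
    # Recompute the root and repoint the stable name at it.
    root = root_hash(files)
    txt_records["_dnslink." + domain] = "dnslink=/ipfs/" + root
    return root


def resolve(domain: str) -> str:
    # Follow the stable name to whatever root it currently points at.
    value = txt_records["_dnslink." + domain]
    prefix = "dnslink=/ipfs/"
    if not value.startswith(prefix):
        raise ValueError("not a DNSLink record: " + value)
    return value[len(prefix):]


files = {"part-000.csv": b"x,y\n1,2\n"}
v1 = publish("dataset.example.org", files)

# Adding data produces a new root, but the DNS name stays the same.
files["part-001.csv"] = b"x,y\n3,4\n"
v2 = publish("dataset.example.org", files)

print(v1 != v2, resolve("dataset.example.org") == v2)
```

In this model, readers who only know the domain always get the latest root, while anyone who recorded `v1` still refers to the exact earlier snapshot, as long as that content remains available.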
-
Having an option to share data files in a peer-to-peer way is probably a good idea. It eliminates the need to pay for external services, and it scales much better in the "public open project" situation (where a lot of cloners would mean substantial S3 costs).
IPFS is probably the easiest to support here, together with DAT. Using BitTorrent directly seems complicated.