How to add custom compression / decompression? #1784
Hey @ziadomalik, we would love to accept a contribution that adds support for your compression method. Most of our compression code lives here: https://github.com/activeloopai/Hub/blob/main/hub/core/compression.py. Feel free to join our Slack community at https://slack.activeloop.ai (#develop channel) to discuss in more detail how to complete the contribution. Looking forward to it.
Hey @ziadomalik. The list of supported compressions can be found in hub/compression.py, so make sure to add your new format there. The decompression code can be found in hub/core/compression.py. Import the required libraries and write your decompression function (something like
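The two steps above amount to "register the format name, then provide a function that turns compressed bytes back into raw bytes". The following is a minimal, self-contained sketch of that pattern. It is not Hub's actual code: `COMPRESSIONS`, `register_compression` and `decompress_bytes` are hypothetical names, and `zlib` stands in for a proprietary codec such as Jetraw, whose API is not shown in this thread.

```python
import zlib
from typing import Callable, Dict, Tuple

# Hypothetical registry (NOT Hub's API): compression name -> (compress, decompress).
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]
COMPRESSIONS: Dict[str, Codec] = {}


def register_compression(name: str,
                         compress: Callable[[bytes], bytes],
                         decompress: Callable[[bytes], bytes]) -> None:
    """Step 1: add the new format to the list of supported compressions."""
    if name in COMPRESSIONS:
        raise ValueError(f"compression {name!r} is already registered")
    COMPRESSIONS[name] = (compress, decompress)


def decompress_bytes(buffer: bytes, compression: str) -> bytes:
    """Step 2: look up the registered decompressor and apply it to the buffer."""
    try:
        _, decompress = COMPRESSIONS[compression]
    except KeyError:
        raise ValueError(f"unsupported compression: {compression!r}") from None
    return decompress(buffer)


# zlib is only a stand-in; a real contribution would wrap the vendor's codec here.
register_compression("zlib-demo", zlib.compress, zlib.decompress)
```

For example, `decompress_bytes(zlib.compress(b"pixels"), "zlib-demo")` gives back `b"pixels"`. Keeping compressors and decompressors behind a single name lookup means the rest of the loader never has to know which codec produced a given buffer.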
@ziadomalik I would like to work on this. Kindly assign it to me.
Hey @h20200051, we can only assign one issue per contributor. Which one would you like to take on?
Hey @mikayelh @davidbuniat, is this issue still open? I would like to try contributing to it.
Hi all, I've been assigned to work on other projects, so this issue is on hold for now. It's still something we are actively discussing, though, and if appropriate we could close this issue and reopen it once it becomes relevant again. I still need to get the green light from my Project Manager. I hope you understand, and thank you for your patience.
@ziadomalik if you'd like, we can assign this issue to @Hussain0520 to work on in the meantime, but if you want to be the one who writes this particular code, I'm not against putting it on hold. Maybe the best solution would be to let someone else take a stab at it and then improve on their contribution later on?
@mikayelh Sounds like a plan! As a starting point, you could check out our Python documentation here and learn about the technology itself here. We also have a C++ API. Whenever you have questions, code review requests or anything else I could help with, feel free to ping me!
That's awesome! Hey @Hussain0520! I've just assigned you this issue. Feel free to check in with us and @ziadomalik in case you need any help! Thanks for following up, @ziadomalik :)
Hi, I spoke with my project manager. Originally, we wanted to postpone this until January because we are still experimenting internally with the cloud and how our compression fits best into that context. If you'd like to discuss, we could hop on a call to figure out the best way to integrate Jetraw into Activeloop Hub.
Thank you @mikayelh @ziadomalik. I'll be sure to contact you for help.
Please allow me to sneak in here... I was looking for a way to compress my 1 million nifti stacks. On one hand, "nifti" is not yet supported by Deep Lake (but DICOM is). On the other hand, I was looking for a way to use the general dtype and add a custom compression on top. I was even more surprised to see someone from Dotphoton here (@ziadomalik). You guys have been on my list for more than a year. The stars seem to align :-)
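The idea of "general dtype plus custom compression on top" can be pictured as storing each volume as opaque bytes that carry their own shape header and a compressed voxel payload. Below is a minimal sketch under several stated assumptions: the volume has already been flattened into unsigned 16-bit voxels, `encode_volume` and `decode_volume` are hypothetical helpers, the byte layout is invented for illustration, and `zlib` stands in for a domain-specific codec. It does not show how Deep Lake itself stores data.

```python
import struct
import sys
import zlib
from array import array
from typing import Tuple

# Hypothetical layout: three little-endian uint32 dimensions, then zlib-compressed
# little-endian uint16 voxels. Invented for illustration only.
HEADER = struct.Struct("<3I")


def encode_volume(voxels: array, shape: Tuple[int, int, int], level: int = 6) -> bytes:
    """Pack a flattened uint16 volume into a self-describing compressed blob."""
    if voxels.typecode != "H":
        raise TypeError("expected uint16 voxels (array typecode 'H')")
    if len(voxels) != shape[0] * shape[1] * shape[2]:
        raise ValueError("shape does not match the number of voxels")
    data = array("H", voxels)  # copy so the caller's array is never modified
    if sys.byteorder == "big":
        data.byteswap()  # store voxels little-endian regardless of the host
    return HEADER.pack(*shape) + zlib.compress(data.tobytes(), level)


def decode_volume(blob: bytes) -> Tuple[array, Tuple[int, int, int]]:
    """Inverse of encode_volume: read the header, then decompress the voxels."""
    shape = HEADER.unpack_from(blob, 0)
    voxels = array("H")
    voxels.frombytes(zlib.decompress(blob[HEADER.size:]))
    if sys.byteorder == "big":
        voxels.byteswap()
    return voxels, shape
```

Putting the shape inside the blob means a generic byte container is enough to reconstruct the volume. The trade-off is that any tool reading the data must know this private layout and codec.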
Hi @St3V0Bay. Thanks for following up on this thread! Adding custom compression is quite tricky: even if it's implemented in Deep Lake OSS, it won't work in our visualizer or in the optimized C++ dataloader, since those are not in the OSS repo. We're also happy to add support for your nifti data directly. Are you working with DICOM files that are combined into nifti stacks? If you're able to provide us with example data, we can implement support for it across our stack. Regarding Dotphoton, are you currently using any of their compressions, or is this something you're excited about for future work?
Hi @istranic, re nifti: in the medical imaging domain, most open-source data is offered as nifti (bioimaging has its own preferences, however). That's the best format for data scientists to get started with. However, DICOM is the true standard actually used in the clinic (you have that integrated already, which is great). To pool DICOM data with open-source nifti files, the DICOM files are converted (e.g. https://github.com/rordenlab/dcm2niix). The other direction (nifti > DICOM) is a lot more complicated. Example nifti files can be pulled using this repo (https://github.com/neheller/kits19); after installation it is just a one-liner. You can look at the data using ITKSnap, for example (http://www.itksnap.org/pmwiki/pmwiki.php), and it can be opened in Python with the nibabel library (https://nipy.org/nibabel/). Another huge nifti repository is here: http://medicaldecathlon.com/ Re Dotphoton: we are not using it, but their value proposition is really appealing: lower storage costs and faster data transfer. In projects of a certain size this really starts to matter, because costs add up quickly when you have literally millions of data points.
Thanks for the info @St3V0Bay. We'll keep you in the loop regarding our decision-making around nifti support.
Hi all! I have all my data stored in an S3 bucket and I would like to use Hub to load it from S3. However, my data in the cloud is compressed with Jetraw by Dotphoton, and I would like the images to be decompressed as I pull them from the cloud. I am ready to write the code to make this happen, but since I am new to the codebase, I would like to know where this would fit best and where I should start. Thank you all in advance!
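"Decompress as I pull them from the cloud" can be expressed as a generator that fetches one object, decompresses it, and yields it before moving on to the next. Below is a minimal sketch under stated assumptions: `iter_decompressed` is a hypothetical helper, `fetch` is any callable that returns an object's raw bytes, and `zlib.decompress` stands in for a Jetraw decoder, whose API is not shown in this thread. With boto3, `fetch` could be `lambda k: s3.get_object(Bucket=bucket, Key=k)["Body"].read()`.

```python
import zlib
from typing import Callable, Iterable, Iterator


def iter_decompressed(keys: Iterable[str],
                      fetch: Callable[[str], bytes],
                      decompress: Callable[[bytes], bytes] = zlib.decompress) -> Iterator[bytes]:
    """Yield decompressed objects one at a time, fetching each only when requested.

    `fetch` and `decompress` are injected, so the same loop works with an S3
    client, a local cache, or an in-memory dict during testing.
    """
    for key in keys:
        compressed = fetch(key)        # e.g. an S3 GET in a real deployment
        yield decompress(compressed)   # only this object is decoded at this point


# Usage with an in-memory "bucket" standing in for S3.
fake_bucket = {"img-0": zlib.compress(b"image zero"), "img-1": zlib.compress(b"image one")}
images = list(iter_decompressed(["img-0", "img-1"], fake_bucket.__getitem__))
```

Because nothing is decoded until the consumer asks for the next item, memory use stays roughly at one decompressed object at a time. Where such a hook would live inside Hub is exactly the open question in this issue.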