WIP Python streaming #29
…dler in hs_core.signals.py. This update enables us to track data downloads
…go_irods into python-streaming
@alvacouch Let me know what you think. I think we're on the same page with respect to simplifying what's in django_irods. My general principles are: 1) this should be usable by other projects and should not have any HS-specific code in it; 2) python-irodsclient is likely a better basis for a package than icommands (or fuse); 3) this should act like a regular Django storage and not include (at least not too much) extra stuff. I realize that these tenets might be overly optimistic at this stage in the game, given the impact they may have on the overall HS code base, but I'm interested to hear your thoughts on the matter. I also realize that I have likely removed a bit too much, and I am interested in potentially reimplementing any significant functionality that is needed for HS and doesn't go against these principles.
TODOs:
@alexlemann I am not 100% sure about the purpose of this work or where it is aimed, but I want to make sure the intent is not to rely solely on the python irods client for file transfer between the HydroShare Django server and iRODS. The reason is that the python irods client does not support parallel file transfer, so you can use it for file listing, adding metadata, etc., but not for file transfer, especially of big files. The current implementation uses icommands underneath for file transfer, which can leverage iRODS parallel file transfers for performance.
@hyi Here's the reference to STDOUT and threading in iRODS I was talking about: You can check on Linux by checking
Thanks for the feedback here, @hyi and those who joined the conversation on the call. As we discussed on the call, the question I had was whether threading (or the lack of it) would be the bottleneck, or whether the end user's internet connection is more likely to be the bottleneck. In the case of the streaming here, a complete temporary copy does not have to be made before starting to write out to the end HS client, which could be a performance win for the end user. I also offered that the only place that the number of threads is limited in … @alvacouch had other concerns about multi-threading and overall server resource usage; he will need to elaborate on them and on how they are relevant to the changes proposed here, either in the code or in conversation.
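To make the "no complete temporary copy" point concrete, here is a minimal sketch of chunked streaming from any readable binary file-like object. It is not the code in this PR: `iter_chunks` and `CHUNK_SIZE` are hypothetical names, and the assumption is that the iRODS data object can be opened as a file-like object supporting `read()` and `close()`. An in-memory buffer stands in for the remote object so the sketch runs on its own.

```python
import io
from typing import BinaryIO, Iterator

# Hypothetical chunk size, chosen only for illustration.
CHUNK_SIZE = 64 * 1024


def iter_chunks(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks read from a binary file-like object.

    Only one chunk is held in memory at a time, so the consumer can start
    sending bytes to the client before the whole object has been read.
    The source is closed once iteration finishes or is abandoned.
    """
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            yield chunk
    finally:
        source.close()


if __name__ == "__main__":
    # Stand-in for a remote data object (assumption: the real handle is file-like).
    fake_remote = io.BytesIO(b"x" * 150_000)
    total = sum(len(chunk) for chunk in iter_chunks(fake_remote))
    print(total)
```

In a Django view, a generator like this could be passed to `django.http.StreamingHttpResponse`, which accepts an iterator of bytes. Whether this beats a parallel `iget` to a temporary file followed by a download depends on which side is the bottleneck, which is the open question above.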
@alexlemann Thanks for the pointer on iget piping to stdout not supporting multiple threads. I tested it out and confirmed that this is indeed the case. When I transferred a big file using iget, it used 17 threads, but when I used