Notes about DataLad usage details #1
Hi, thanks for the questions/comments.

For now, the 'video' to which I refer in the main README is a placeholder, and ultimately that pointer can point wherever you prefer. Regarding the videos: each of us creates one, and we upload it to the OHBM site. But since each is 'our' own creation, I think (though I should double-check the policy) we can also use those videos as we want, i.e. post them to our favorite YouTube locations. So there may indeed be (at least) two places these videos get put, and where this README points, like I mentioned, is up to us. I agree that pointing to the more publicly accessible one, such as the YouTube version, would be preferable in some ways for the long-term utility of this repo. And yes, your assumption is correct: each of these videos also needs to go to the OHBM platform.

> Have you received instructions for OHBM and/or Fourwaves?

Actually, I haven't, but there may not have been a 'slot' for the organizers, just for the 'presenters'; I'm trying to resolve that at the moment. All presenters, let me know if you have OHBM upload instructions.

Re installing data: the evolving workflow in the Exercise README was guided by @yarikoptic and featured the `datalad install` commands in a YODA-style design. The nuance of 'clone' versus 'install' is over my head, and I'll leave the decisions about what and how to best do this to your collective advice. Y'all know what we're trying to demonstrate; feel free to update my hacky way of doing this to something better.

I was using 'publish' more conceptually than as the specific command. I would value guidance on how to approach the details of this; I've just not personally gotten to that step to muddle through it yet. I was indeed expecting something 'create-sibling'-ish, and had not decided on a service (GIN appears in some other exercises I've been through). I just don't want the students to have to spend too much time struggling through authentication details, which, while important, would be a distraction from the mission...
Thanks for the info @dnkennedy!
Sounds good, thanks 👍
OK. I think GIN is perhaps the lesser evil in this case because it's free and open, even if students will have to configure SSH keys. I'm guessing the output from the containerized workflow will be large in terms of file size? That would make publishing to a GitHub sibling difficult, which would otherwise be the easiest alternative if workflow outputs are small enough and/or text-based. For the DataLad-based RDM training workshops that we've been running lately (also using JupyterHub), we decided on GIN for these (and probably other) reasons. Here's a detailed walk-through of that content (which we can repurpose for this educational session if needed): https://psychoinformatics-de.github.io/rdm-course/03-remote-collaboration/index.html#publishing-datasets-to-gin
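For reference, the core of that GIN flow boils down to something like the following (a sketch only: `username/dataset` is a hypothetical repository created beforehand via the GIN web UI, and it assumes an SSH key is already registered with GIN):

```bash
# Register the pre-created GIN repository as a sibling of the local dataset:
datalad siblings add --dataset . --name gin \
    --url git@gin.g-node.org:/username/dataset.git

# Push the dataset, including annexed file content, to GIN:
datalad push --to gin
```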
In the hope of reducing students' confusion over "why do I fork/clone on/from GitHub but upload to GIN", you could avoid GIN, and/or add it at the end as "you can distribute storage to multiple locations", by utilizing GitHub's LFS, per http://handbook.datalad.org/en/latest/basics/101-139-gitlfs.html. A figure showing the flow here (the remote services above, with arrows depicting the flow to/from the local clone below) could help establish a mental picture for the students.
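If we go that route, a rough sketch of the handbook recipe might look like this (the repository name `ohbm-exercise` and user `example` are placeholders; the handbook chapter linked above is the authoritative walk-through):

```bash
# Create a GitHub repository and register it as a sibling named "github":
datalad create-sibling-github ohbm-exercise

# Register GitHub's LFS endpoint as a git-annex special remote:
git annex initremote github-lfs type=git-lfs encryption=none \
    url=https://github.com/example/ohbm-exercise

# Ensure annexed content lands in LFS whenever we push to GitHub:
datalad siblings configure --name github --publish-depends github-lfs
datalad push --to github
```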
I like the idea of git LFS, thanks @yarikoptic! @dnkennedy, do you know more or less what the size of the combined output from the run operation will be? AFAICT, LFS has a storage limit of 1 GB for the free tier. Hopefully the outputs are less than that?
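One way to gauge that before committing to LFS (a sketch, run from the dataset root; `outputs/` is a hypothetical results directory):

```bash
# Summarize the size of annexed content tracked in the dataset:
datalad status --annex all

# Or just measure the results directory on disk:
du -sh outputs/
```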
OK, I took a shot at the 'git lfs' version. The 'token' that is necessary is the same one needed in order for 'push' to work anyway, so we have to work through that regardless.
I'm going to close this issue. We 'think' git lfs will work, but there is still some whining (by me) going on over at #10 |
@dnkennedy thanks for the summary of the course given in the readme. I have a couple of notes and questions about the various steps where DataLad will be applicable:

- `clone`, i.e. "cloning data": where previously we used `datalad install -r` to install all subdatasets recursively, we now suggest first cloning the dataset, and then using `get` with the `-n`/`--no-data` flag (see the sketch after this list).
- `publish` is deprecated; can I suggest that we use `push` together with the `create-sibling[-*]` functionality? Have you already decided which service we'll use to push the output data/results to?
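To make that concrete, here is a minimal sketch of the suggested flow (the dataset URL, sibling name, and paths below are all hypothetical placeholders):

```bash
# Clone the top-level dataset without fetching any file content:
datalad clone https://github.com/example/course-dataset.git course-dataset
cd course-dataset

# Install all subdatasets recursively, still skipping the data (-n / --no-data):
datalad get -n -r .

# Fetch actual content only where needed:
datalad get inputs/sub-01/

# Later, publish results: create a sibling (GitHub here, as one example) and push:
datalad create-sibling-github course-results
datalad push --to github
```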