Initial Draft of Automated Data Munging #1
kyleabeauchamp
commented
Sep 12, 2014
- Convert Bzipped XTC files to all-atom HDF5 files with extra meta containing already processed filenames
- Strip water from all-atom HDF5 files to create protein HDF5 files
- Do this periodically on all local FAH datasets
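The bookkeeping behind the first and third steps can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code in this pull request: `find_unprocessed`, `decompress_bz2`, `munge_new_files`, and the `append_frames` callback are hypothetical names, the `*.xtc.bz2` file pattern is an assumption, and the actual XTC-to-HDF5 conversion (presumably done with MDTraj) is left to the callback.

```python
import bz2
import shutil
import tempfile
from pathlib import Path


def find_unprocessed(run_dir, processed):
    """List bzipped XTC files in run_dir whose names are not in `processed`.

    Sorted by name only to make the order deterministic; real FAH file
    naming may call for a numeric sort instead (assumption).
    """
    return sorted(
        p for p in Path(run_dir).glob("*.xtc.bz2") if p.name not in processed
    )


def decompress_bz2(src, dst):
    """Stream-decompress one .bz2 file to dst without loading it into memory."""
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def munge_new_files(run_dir, processed, append_frames):
    """Decompress each new file, hand it to `append_frames`, and record its name.

    `append_frames` is a hypothetical callback standing in for the step that
    reads the XTC and appends its frames to the all-atom HDF5 file.
    Returns the updated set of processed filenames.
    """
    processed = set(processed)
    with tempfile.TemporaryDirectory() as tmp:
        for src in find_unprocessed(run_dir, processed):
            xtc = Path(tmp) / src.name[: -len(".bz2")]
            decompress_bz2(src, xtc)
            append_frames(xtc)
            processed.add(src.name)
    return processed
```

Because the set of processed filenames is passed in and returned, a periodic job can call `munge_new_files` repeatedly and only newly arrived files are converted on each pass.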
What if we called this fah-tools? Or do you really want a separate repo?
Tools is pretty general; right now, the code is just for munging. We can change the name if the scope of the code expands in the future.
This looks pretty good! The only thing I'd ask for is more documentation in the code about what the various "munging" steps do.
What about periodic image issues?
I suppose we'll have to add that later, as AFAIK we don't have Right now, the key issue is automating the
@schwancr I just adjusted the stripping function to keep the unit cell information in the protein HDF5, which should allow us to perform downstream PBC changes.
Yea that sounds like a good idea. Ideally
Has anyone looked at the PBC-whole code in gromacs or ambertools? It might
-Robert
But it doesn't work that well. Their (gromacs) recipe for doing it involves several calls of the same command-line script, and even then they admit it doesn't work in all cases.
@kyleabeauchamp: what's the appropriate forum to discuss the provenance metadata storage (e.g. I'm not sure that storing extra attributes on the HDF5 files is the best way to go -- if we really want to do that, we should consider simply adding that field to the MDTraj HDF5 format spec. We could also do something more akin to the MSMBuilder 2 design, where a separate metadata file is stored which contains the provenance info. It might be nice, also, not to irreversibly tie this data munging step to the use of HDF5 files for the output. It would be helpful to get to some consensus on these design choices, especially as we start pushing mixtape for end users.
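To make the separate-metadata-file option concrete, here is a minimal sketch of a sidecar file that stores the provenance info next to the trajectory rather than inside it. Everything here is an assumption for illustration: the `.provenance.json` suffix, the `processed_filenames` key, and the function names are hypothetical and do not reflect the MSMBuilder 2 format or the MDTraj HDF5 spec.

```python
import json
import os
from pathlib import Path


def sidecar_path(traj_path):
    """Hypothetical convention: provenance lives next to the trajectory file."""
    return Path(str(traj_path) + ".provenance.json")


def load_processed(traj_path):
    """Return the list of source filenames already merged into traj_path."""
    path = sidecar_path(traj_path)
    if not path.exists():
        return []
    with open(path) as f:
        return json.load(f)["processed_filenames"]


def record_processed(traj_path, filename):
    """Add one source filename to the sidecar, writing via a temp file.

    os.replace swaps the new file in as a single step, so an interrupted
    run does not leave a half-written sidecar behind.
    """
    names = load_processed(traj_path)
    if filename not in names:
        names.append(filename)
    path = sidecar_path(traj_path)
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w") as f:
        json.dump({"processed_filenames": names}, f, indent=2)
    os.replace(tmp, path)
    return names
```

Since the sidecar only depends on the trajectory's path, the same bookkeeping would work unchanged if the output format were something other than HDF5.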
This is working well enough for now; we will discuss future iterations in issue #2.