-
Notifications
You must be signed in to change notification settings - Fork 23
WIP: Added json conversion of NWChem to workflow #557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@cjh1 So this is basically working. I have added to the NWChem taskflow and it nows runs the json conversion script, generates the json file and uploads it to Girder along with the rest of the files. There are a couple of issues I think need to be worked out but they could wait until after we have added the ability to do visualization with WebGL. Here is a summary of them.
Do you want me to address any of these issues or should we move on to getting the visualization working? I am thinking we may want to merge this work once we are happy with it since I am guessing the visualization will be more involved. |
Can we keep TODO's out of code and file them as issues or items on issues even if temporary |
Current coverage is 61.80% (diff: 100%)@@ master #557 diff @@
==========================================
Files 59 59
Lines 2788 2788
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
Hits 1723 1723
Misses 1065 1065
Partials 0 0
|
On Wed, Dec 7, 2016 at 3:51 PM, Chet Nieter ***@***.***> wrote:
@cjh1 <https://github.com/cjh1> So this is basically working. I have
added to the NWChem taskflow and it nows runs the json conversion script,
generates the json file and uploads it to Girder along with the rest of the
files. There are a couple of issues I think need to be worked out but they
could wait until after we have added the ability to do visualization with
WebGL. Here is a summary of them.
- Since the relevant data is in the standard output of NWChem I made
some assumptions about the filename that holds the output based on the
results on ulmus. This is dependent on the job scheduler so right now it
may only work for SGE.
We should resolve this, can you just add a redirect in the submission
script. So something like nwchem input &> nwchem_output?
- I feel like we might want to rename the json file after it is
generated since the conversion script just appends '.json' the output
filename.
Sounds like a good idea
- Right now the location of the conversion script is hard coded for
its location on ulmus. This should probably be part of the cluster
configuration. We could consider doing the conversion on the server by
including the conversion module but that would require copying the output
file from the cluster to the server.
Is the conversion script just a single file? Could we just download it as
part of the taskflow, if it doesn't already exist? Do we know what the license is? May be we could check it into our repo ....
- I added the json conversion to the upload_output task. Do we want it
to live in its own task?
I think that probably make sense.
Do you want me to address any of these issues or should we move on to
getting the visualization working? I am thinking we may want to merge this
work once we are happy with it since I am guessing the visualization will
be more involved.
I would say let resolve these issue before moving forward.
… —
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#557 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AA-7jVj03Ku-L_xiSBcT-Y30IwsRe87Nks5rFxxQgaJpZM4LHBRC>
.
|
TODO's removed, at least from the file I touched. List of issues now lives at #558 |
It is two files. One holds a python module that does all the work and the other is just a simple python script the calls the module. |
@chetnieter One thing to keep in mind here is that we are going to have to run this JSON through another tool (Avogadro ) ( I will get the details ) which is a C++ application. We might want to look at how we could host some of these environments in a docker container which is what girder_worker seems todo |
We can reuse parts of this, calculate_mo(...) is the function we want. @cryos Does avogadrolibs master support python wrapping? |
Added method to the NWChem workflow that calls the json conversion script. Currently calling it from the update_output method.
1ee58ac
to
cf24e7c
Compare
Added redirect to nwchem submission script. This means the file holding the standard output from nwchem will have a name that is independent of the job scheduler used making easier to run the json converter on it.
Moved the json conversion from the upload_output task to its own task.
@cjh1 yes, the way it is configured is documented in the ansible code we developed in the Phase I DOE project, mongochemdeploy. I want to move it to use PyBind11 soon, but you should be able to use this for now. It even has the workaround I used to find the right Python 3 on an Ubuntu host. |
I am new Docker so I have some questions:
I can start working on a Dockerfile that creates an image with the json conversion script. |
Yes, when we say server we mean the machine or machines hosting the cumulus stack
Are we not already pull it off the cluster to upload it into GIrder. When is the conversion to JSON currently taking place?
Yes, when we have something working
I would think just downloading in the container?
Sure, if you would like to experiment with this. Otherwise, we can punt on using a container and continue to install things directly on the server to get the end to end use case work i.e. build avogdralibs on the server. |
Initial Dockerfile for container to hold the nwchem json converter and eventually avogadro. I was able to run the json conversion by running docker run with -v to mount the directory with the nwchem output file.
Switched to running the json conversion script for nwchem in a Docker container. The nwchem tasks now copies the output file to the server, runs the docker container to generate the json, and then copies the json output back to the cluster. Several things need to be cleaned up like the hard-coded settings for the docker container.
Now passing in the nwchem output filename to the docker run command rather than having it hard-coded in the docker image.
Some minor clean up including fixing a comment and removing a python module that is not being used.
|
||
# Run docker container to post-process results - need to add docker image to upstream_result | ||
command = ['docker', 'run', '--rm', '-v', '%s:/hpccloud' % tmp_dir, | ||
'chetnieter/nwchem-postprocess', out_file] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we have a kitware account?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a kitware user on Dockerhub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chetnieter I have created a hpccloud user account, we can use that for our images.
if p.returncode != 0: | ||
print('Error running Docker container.') | ||
print('STDOUT: ' + stdout) | ||
print('STDERR: ' + stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be logged to the task logger.
task.logger.error(stdout)
task.logger.error(stderr)
local_path = os.path.join(tmp_dir, out_file + '.json') | ||
with get_connection(task.taskflow.girder_token, cluster) as conn: | ||
with open(local_path, 'r') as local_fp: | ||
conn.put(local_fp, cluster_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need the JSON back on the cluster, it should just be uploaded to Girder
|
||
RUN git clone https://github.com/wadejong/NWChemOutputToJson.git /opt/NWChemOutputToJson | ||
|
||
ENTRYPOINT ["python", "/opt/NWChemOutputToJson/NWChemJsonConversion.py"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want this in a workflow specific path, we already have one for the taskflows, so may be:
server/docker/nwchem/Dockerfile ?
|
||
try: | ||
# Copy the nwchem output to server | ||
tmp_dir = tempfile.mkdtemp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be 'safer' to use a context manager here to ensure things get cleaned up.
with tempfile.TemporaryDirectory(...) as tmp_dir:
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chetnieter Sorry I didn't realize TemporaryDirectory was introduced in Python 3.
Sone changes from code review. This includes moving the docker file to a more appropriate location and in a sub-folder that reflects the associated taskflow. Also passing stderr and stdout from any failed calls to docker in nwchem task flow. Using context manager for temporary directory which adds dependency on backports module. Uploading json output directly to girder rather than copying it back to the cluster still needs to be done.
print('Error running Docker container.') | ||
print('STDOUT: ' + stdout) | ||
print('STDERR: ' + stderr) | ||
task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look like this is a typo?
local_path = os.path.join(tmp_dir, out_file + '.json') | ||
with get_connection(task.taskflow.girder_token, cluster) as conn: | ||
with open(local_path, 'r') as local_fp: | ||
conn.put(local_fp, cluster_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chetnieter We should factor out the docker run code into a set of function that can be reused by other taskflows.
Changes needed to add the conversion of the ascii output of NWChem to json format so it can be parsed to visualization with WebGL.