Skip to content

WIP: Added json conversion of NWChem to workflow #557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

chetnieter
Copy link
Member

Changes needed to add the conversion of the ascii output of NWChem to json format so it can be parsed to visualization with WebGL.

@chetnieter chetnieter added the WIP label Dec 7, 2016
@chetnieter
Copy link
Member Author

@cjh1 So this is basically working. I have added to the NWChem taskflow and it nows runs the json conversion script, generates the json file and uploads it to Girder along with the rest of the files. There are a couple of issues I think need to be worked out but they could wait until after we have added the ability to do visualization with WebGL. Here is a summary of them.

  • Since the relevant data is in the standard output of NWChem I made some assumptions about the filename that holds the output based on the results on ulmus. This is dependent on the job scheduler so right now it may only work for SGE.
  • I feel like we might want to rename the json file after it is generated since the conversion script just appends '.json' the output filename.
  • Right now the location of the conversion script is hard coded for its location on ulmus. This should probably be part of the cluster configuration. We could consider doing the conversion on the server by including the conversion module but that would require copying the output file from the cluster to the server.
  • I added the json conversion to the upload_output task. Do we want it to live in its own task?

Do you want me to address any of these issues or should we move on to getting the visualization working? I am thinking we may want to merge this work once we are happy with it since I am guessing the visualization will be more involved.

@TristanWright
Copy link
Contributor

Can we keep TODO's out of code and file them as issues or items on issues even if temporary

@codecov-io
Copy link

codecov-io commented Dec 7, 2016

Current coverage is 61.80% (diff: 100%)

Merging #557 into master will not change coverage

@@             master       #557   diff @@
==========================================
  Files            59         59          
  Lines          2788       2788          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           1723       1723          
  Misses         1065       1065          
  Partials          0          0          

Powered by Codecov. Last update 99ea1e8...a7bd2df

@cjh1
Copy link

cjh1 commented Dec 7, 2016 via email

@chetnieter
Copy link
Member Author

Can we keep TODO's out of code and file them as issues or items on issues even if temporary

TODO's removed, at least from the file I touched. List of issues now lives at #558

@chetnieter
Copy link
Member Author

Is the conversion script just a single file? Could we just download it as
part of the taskflow, if it doesn't already exist? Do we know what the license is? May be
we could check it into our repo ....

It is two files. One holds a python module that does all the work and the other is just a simple python script the calls the module.

@cjh1
Copy link

cjh1 commented Dec 7, 2016

@chetnieter One thing to keep in mind here is that we are going to have to run this JSON through another tool (Avogadro ) ( I will get the details ) which is a C++ application. We might want to look at how we could host some of these environments in a docker container which is what girder_worker seems todo

@cjh1
Copy link

cjh1 commented Dec 7, 2016

We can reuse parts of this, calculate_mo(...) is the function we want. @cryos Does avogadrolibs master support python wrapping?

Added method to the NWChem workflow that calls the json conversion
script. Currently calling it from the update_output method.
@chetnieter chetnieter force-pushed the nwchem-json-generation branch from 1ee58ac to cf24e7c Compare December 8, 2016 14:13
Added redirect to nwchem submission script. This means the file
holding the standard output from nwchem will have a name that is
independent of the job scheduler used making easier to run the
json converter on it.
Moved the json conversion from the upload_output task to its own
task.
@cryos
Copy link

cryos commented Dec 8, 2016

@cjh1 yes, the way it is configured is documented in the ansible code we developed in the Phase I DOE project, mongochemdeploy. I want to move it to use PyBind11 soon, but you should be able to use this for now. It even has the workaround I used to find the right Python 3 on an Ubuntu host.

@chetnieter
Copy link
Member Author

One thing to keep in mind here is that we are going to have to run this JSON through another tool (Avogadro ) ( I will get the details ) which is a C++ application. We might want to look at how we could host some of these environments in a docker container which is what girder_worker seems todo

I am new Docker so I have some questions:

  • I assume that we would run the Docker container on the server.
  • This would mean we would have to copy the data from the cluster to the server.
  • Would we be pulling the image from Dockerhub?
  • I assume mounting a the server with the nwchem data would be the best way to give the container access to the data?

I can start working on a Dockerfile that creates an image with the json conversion script.

@cjh1
Copy link

cjh1 commented Dec 8, 2016

I am new Docker so I have some questions:

  • I assume that we would run the Docker container on the server.

Yes, when we say server we mean the machine or machines hosting the cumulus stack

  • This would mean we would have to copy the data from the cluster to the server.

Are we not already pull it off the cluster to upload it into GIrder. When is the conversion to JSON currently taking place?

  • Would we be pulling the image from Dockerhub?

Yes, when we have something working

  • I assume mounting a the server with the nwchem data would be the best way to give the container access to the data?

I would think just downloading in the container?

I can start working on a Dockerfile that creates an image with the json conversion script.

Sure, if you would like to experiment with this. Otherwise, we can punt on using a container and continue to install things directly on the server to get the end to end use case work i.e. build avogdralibs on the server.

Initial Dockerfile for container to hold the nwchem json converter
and eventually avogadro. I was able to run the json conversion by
running docker run with -v to mount the directory with the nwchem
output file.
Switched to running the json conversion script for nwchem in a Docker
container. The nwchem tasks now copies the output file to the server,
runs the docker container to generate the json, and then copies the
json output back to the cluster. Several things need to be cleaned
up like the hard-coded settings for the docker container.
Now passing in the nwchem output filename to the docker run command
rather than having it hard-coded in the docker image.
Some minor clean up including fixing a comment and removing a
python module that is not being used.

# Run docker container to post-process results - need to add docker image to upstream_result
command = ['docker', 'run', '--rm', '-v', '%s:/hpccloud' % tmp_dir,
'chetnieter/nwchem-postprocess', out_file]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we have a kitware account?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a kitware user on Dockerhub.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetnieter I have created a hpccloud user account, we can use that for our images.

if p.returncode != 0:
print('Error running Docker container.')
print('STDOUT: ' + stdout)
print('STDERR: ' + stderr)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be logged to the task logger.

task.logger.error(stdout)
task.logger.error(stderr)

local_path = os.path.join(tmp_dir, out_file + '.json')
with get_connection(task.taskflow.girder_token, cluster) as conn:
with open(local_path, 'r') as local_fp:
conn.put(local_fp, cluster_path)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need the JSON back on the cluster, it should just be uploaded to Girder


RUN git clone https://github.com/wadejong/NWChemOutputToJson.git /opt/NWChemOutputToJson

ENTRYPOINT ["python", "/opt/NWChemOutputToJson/NWChemJsonConversion.py"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want this in a workflow specific path, we already have one for the taskflows, so may be:

server/docker/nwchem/Dockerfile ?


try:
# Copy the nwchem output to server
tmp_dir = tempfile.mkdtemp()
Copy link

@cjh1 cjh1 Dec 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be 'safer' to use a context manager here to ensure things get cleaned up.

with tempfile.TemporaryDirectory(...) as tmp_dir:
....

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetnieter Sorry I didn't realize TemporaryDirectory was introduced in Python 3.

Sone changes from code review. This includes moving the docker file
to a more appropriate location and in a sub-folder that reflects
the associated taskflow. Also passing stderr and stdout from any
failed calls to docker in nwchem task flow. Using context manager
for temporary directory which adds dependency on backports module.
Uploading json output directly to girder rather than copying it
back to the cluster still needs to be done.
print('Error running Docker container.')
print('STDOUT: ' + stdout)
print('STDERR: ' + stderr)
task
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like this is a typo?

local_path = os.path.join(tmp_dir, out_file + '.json')
with get_connection(task.taskflow.girder_token, cluster) as conn:
with open(local_path, 'r') as local_fp:
conn.put(local_fp, cluster_path)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetnieter We should factor out the docker run code into a set of function that can be reused by other taskflows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants