WIP: Added json conversion of NWChem to workflow #557

chetnieter · 2016-12-07T19:44:45Z

Changes needed to convert the ASCII output of NWChem to JSON format so it can be used for visualization with WebGL.

chetnieter · 2016-12-07T20:51:28Z

@cjh1 So this is basically working. I have added it to the NWChem taskflow, and it now runs the json conversion script, generates the json file, and uploads it to Girder along with the rest of the files. There are a couple of issues I think need to be worked out, but they could wait until after we have added the ability to do visualization with WebGL. Here is a summary of them:

- Since the relevant data is in the standard output of NWChem, I made some assumptions about the filename that holds the output based on the results on ulmus. This is dependent on the job scheduler, so right now it may only work for SGE.
- I feel like we might want to rename the json file after it is generated, since the conversion script just appends '.json' to the output filename.
- Right now the location of the conversion script is hard-coded for its location on ulmus. This should probably be part of the cluster configuration. We could consider doing the conversion on the server by including the conversion module, but that would require copying the output file from the cluster to the server.
- I added the json conversion to the upload_output task. Do we want it to live in its own task?

Do you want me to address any of these issues or should we move on to getting the visualization working? I am thinking we may want to merge this work once we are happy with it since I am guessing the visualization will be more involved.

TristanWright · 2016-12-07T20:51:44Z

Can we keep TODOs out of the code and file them as issues, or items on issues, even if temporary?

codecov-io · 2016-12-07T21:01:07Z

Current coverage is 61.80% (diff: 100%)

Merging #557 into master will not change coverage

@@             master       #557   diff @@
==========================================
  Files            59         59          
  Lines          2788       2788          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           1723       1723          
  Misses         1065       1065          
  Partials          0          0

Powered by Codecov. Last update 99ea1e8...a7bd2df

cjh1 · 2016-12-07T21:01:08Z

On Wed, Dec 7, 2016 at 3:51 PM, Chet Nieter wrote:

- Since the relevant data is in the standard output of NWChem, I made some assumptions about the filename that holds the output based on the results on ulmus. This is dependent on the job scheduler, so right now it may only work for SGE.

We should resolve this. Can you just add a redirect in the submission script, something like nwchem input &> nwchem_output?
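A minimal sketch of that redirect, assuming the submission script line is generated from Python; the helper name render_nwchem_command is hypothetical. It writes "> file 2>&1", the POSIX form of bash's "&>", in case a scheduler runs the script with /bin/sh rather than bash.

```python
import shlex  # Python 3.3+; Python 2 has pipes.quote instead


def render_nwchem_command(input_file, output_file="nwchem_output"):
    """Return the nwchem line for a (hypothetical) generated submission script.

    stdout and stderr both go to a fixed file name, so the converter no longer
    depends on how a particular scheduler (e.g. SGE) names its output files.
    """
    # Quote the names in case they contain spaces or shell metacharacters.
    return "nwchem {} > {} 2>&1".format(shlex.quote(input_file),
                                        shlex.quote(output_file))


print(render_nwchem_command("input.nw"))
# → nwchem input.nw > nwchem_output 2>&1
```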

- I feel like we might want to rename the json file after it is generated, since the conversion script just appends '.json' to the output filename.

Sounds like a good idea.
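A sketch of the rename, assuming the converter writes <output>.json next to the output file (as described above) and that dropping the original extension is the naming scheme we want; both the function rename_json_output and that scheme are hypothetical.

```python
import os


def rename_json_output(output_path, json_name=None):
    """Rename the '<output_path>.json' file written by the converter.

    By default any extension on the NWChem output name is dropped first,
    e.g. 'job.out' -> converter writes 'job.out.json' -> renamed 'job.json'.
    Pass json_name to choose the new file name explicitly.
    """
    generated = output_path + ".json"
    if json_name is None:
        stem, _ = os.path.splitext(os.path.basename(output_path))
        json_name = stem + ".json"
    target = os.path.join(os.path.dirname(output_path), json_name)
    os.rename(generated, target)
    return target
```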

- Right now the location of the conversion script is hard coded for its location on ulmus. This should probably be part of the cluster configuration. We could consider doing the conversion on the server by including the conversion module but that would require copying the output file from the cluster to the server.

Is the conversion script just a single file? Could we just download it as part of the taskflow, if it doesn't already exist? Do we know what the license is? Maybe we could check it into our repo...
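One way to do the "download it if it doesn't already exist" step, sketched for a local path only; on the cluster the same existence check and write would have to go through the taskflow's connection, which is not shown. The function name ensure_script and its injectable fetch argument are illustrative assumptions, and no real download URL is filled in.

```python
import os
import urllib.request


def ensure_script(path, url, fetch=None):
    """Make sure a script exists at `path`, downloading it from `url` if not.

    Returns True if a download happened, False if the file was already there.
    `fetch` maps a URL to the file contents as bytes; by default it is a
    plain urllib download, but it can be swapped out (e.g. in tests).
    """
    if os.path.exists(path):
        return False  # already present, nothing to do
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u) as resp:
                return resp.read()
    data = fetch(url)
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # Write under a temporary name first so a failed download never leaves
    # a truncated script at `path`.
    tmp_path = path + ".part"
    with open(tmp_path, "wb") as fp:
        fp.write(data)
    os.rename(tmp_path, path)
    return True
```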

- I added the json conversion to the upload_output task. Do we want it to live in its own task?

I think that probably makes sense.

Do you want me to address any of these issues or should we move on to getting the visualization working? I am thinking we may want to merge this work once we are happy with it since I am guessing the visualization will be more involved.

I would say let's resolve these issues before moving forward.


chetnieter · 2016-12-07T21:09:09Z

Can we keep TODOs out of the code and file them as issues, or items on issues, even if temporary?

TODOs removed, at least from the file I touched. The list of issues now lives at #558.

chetnieter · 2016-12-07T21:16:34Z

Is the conversion script just a single file? Could we just download it as part of the taskflow, if it doesn't already exist? Do we know what the license is? Maybe we could check it into our repo...

It is two files. One holds a python module that does all the work, and the other is just a simple python script that calls the module.

cjh1 · 2016-12-07T21:21:09Z

@chetnieter One thing to keep in mind here is that we are going to have to run this JSON through another tool (Avogadro; I will get the details), which is a C++ application. We might want to look at how we could host some of these environments in a Docker container, which is what girder_worker seems to do.

cjh1 · 2016-12-07T21:31:42Z

We can reuse parts of this; calculate_mo(...) is the function we want. @cryos Does avogadrolibs master support Python wrapping?

Added method to the NWChem workflow that calls the json conversion script. Currently calling it from the update_output method.

Added redirect to nwchem submission script. This means the file holding the standard output from nwchem will have a name that is independent of the job scheduler used, making it easier to run the json converter on it.

Moved the json conversion from the upload_output task to its own task.

cryos · 2016-12-08T19:28:51Z

@cjh1 yes, the way it is configured is documented in the ansible code we developed in the Phase I DOE project, mongochemdeploy. I want to move it to use PyBind11 soon, but you should be able to use this for now. It even has the workaround I used to find the right Python 3 on an Ubuntu host.

chetnieter · 2016-12-08T20:24:49Z

One thing to keep in mind here is that we are going to have to run this JSON through another tool (Avogadro; I will get the details), which is a C++ application. We might want to look at how we could host some of these environments in a Docker container, which is what girder_worker seems to do.

I am new to Docker, so I have some questions:

- I assume that we would run the Docker container on the server.
- This would mean we would have to copy the data from the cluster to the server.
- Would we be pulling the image from Dockerhub?
- I assume mounting a directory on the server with the nwchem data would be the best way to give the container access to the data?

I can start working on a Dockerfile that creates an image with the json conversion script.

cjh1 · 2016-12-08T20:37:34Z

I am new to Docker, so I have some questions:

I assume that we would run the Docker container on the server.

Yes, when we say server we mean the machine or machines hosting the cumulus stack

This would mean we would have to copy the data from the cluster to the server.

Are we not already pulling it off the cluster to upload it into Girder? When is the conversion to JSON currently taking place?

Would we be pulling the image from Dockerhub?

Yes, when we have something working.

I assume mounting a directory on the server with the nwchem data would be the best way to give the container access to the data?

I would think just downloading it in the container?

I can start working on a Dockerfile that creates an image with the json conversion script.

Sure, if you would like to experiment with this. Otherwise, we can punt on using a container and continue to install things directly on the server to get the end-to-end use case working, i.e. build avogadrolibs on the server.

Initial Dockerfile for container to hold the nwchem json converter and eventually avogadro. I was able to run the json conversion by running docker run with -v to mount the directory with the nwchem output file.

Switched to running the json conversion script for nwchem in a Docker container. The nwchem task now copies the output file to the server, runs the docker container to generate the json, and then copies the json output back to the cluster. Several things need to be cleaned up, like the hard-coded settings for the docker container.

Now passing in the nwchem output filename to the docker run command rather than having it hard-coded in the docker image.

Some minor clean up including fixing a comment and removing a python module that is not being used.

cjh1 · 2016-12-16T21:38:26Z

server/taskflows/hpccloud/taskflow/nwchem/__init__.py

+
+        # Run docker container to post-process results - need to add docker image to upstream_result
+        command = ['docker', 'run', '--rm', '-v', '%s:/hpccloud' % tmp_dir,
+                'chetnieter/nwchem-postprocess', out_file]


I wonder if we have a Kitware account?

There is a kitware user on Dockerhub.

@chetnieter I have created an hpccloud user account; we can use that for our images.

cjh1 · 2016-12-16T21:38:29Z

server/taskflows/hpccloud/taskflow/nwchem/__init__.py

+        if p.returncode != 0:
+            print('Error running Docker container.')
+            print('STDOUT: ' + stdout)
+            print('STDERR: ' + stderr)


These should be logged to the task logger.

task.logger.error(stdout)
task.logger.error(stderr)
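A minimal sketch of the reviewer's suggestion, assuming task.logger behaves like a standard library logging.Logger; here any logger can be passed in, and the helper name run_logged is hypothetical. Raising CalledProcessError after logging is an added assumption (not in the diff) so that the task fails instead of continuing without output.

```python
import logging
import subprocess


def run_logged(command, logger):
    """Run `command`; on a non-zero exit, log its output and raise.

    Returns the captured stdout (as text) when the command succeeds.
    """
    p = subprocess.Popen(command, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, universal_newlines=True)
    stdout, stderr = p.communicate()
    if p.returncode != 0:
        # Send everything to the logger instead of print() so it shows up
        # in the task log.
        logger.error('Error running %s (exit code %d).', command[0], p.returncode)
        logger.error('STDOUT: %s', stdout)
        logger.error('STDERR: %s', stderr)
        raise subprocess.CalledProcessError(p.returncode, command)
    return stdout
```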

cjh1 · 2016-12-16T21:39:28Z

server/taskflows/hpccloud/taskflow/nwchem/__init__.py

+        local_path = os.path.join(tmp_dir, out_file + '.json')
+        with get_connection(task.taskflow.girder_token, cluster) as conn:
+            with open(local_path, 'r') as local_fp:
+                conn.put(local_fp, cluster_path)


We don't need the JSON back on the cluster; it should just be uploaded to Girder.

cjh1 · 2016-12-16T21:42:07Z

docker/Dockerfile

+
+RUN git clone https://github.com/wadejong/NWChemOutputToJson.git /opt/NWChemOutputToJson
+
+ENTRYPOINT ["python", "/opt/NWChemOutputToJson/NWChemJsonConversion.py"]


We probably want this in a workflow-specific path (we already have one for the taskflows), so maybe:

server/docker/nwchem/Dockerfile?

cjh1 · 2016-12-16T21:45:19Z

server/taskflows/hpccloud/taskflow/nwchem/__init__.py

+
+    try:
+        # Copy the nwchem output to server
+        tmp_dir = tempfile.mkdtemp()


Might be 'safer' to use a context manager here to ensure things get cleaned up.

with tempfile.TemporaryDirectory(...) as tmp_dir:
    ...

@chetnieter Sorry, I didn't realize TemporaryDirectory was only introduced in Python 3.
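If pulling in a backports package is undesirable, a small fallback built from tempfile.mkdtemp and shutil.rmtree gives the same cleanup on Python 2. This is a sketch: the stand-in only covers the with-statement use and lacks the real class's cleanup() method.

```python
import shutil
import tempfile
from contextlib import contextmanager

try:
    from tempfile import TemporaryDirectory  # Python 3.2+
except ImportError:  # Python 2: minimal stand-in for the with-statement case
    @contextmanager
    def TemporaryDirectory(suffix='', prefix='tmp', dir=None):
        path = tempfile.mkdtemp(suffix=suffix, prefix=prefix, dir=dir)
        try:
            yield path
        finally:
            # Remove the directory even if the body raised.
            shutil.rmtree(path, ignore_errors=True)
```

Either way, "with TemporaryDirectory() as tmp_dir:" yields the directory path as a string, and the directory is gone once the block exits.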

Some changes from code review. This includes moving the Dockerfile to a more appropriate location, in a sub-folder that reflects the associated taskflow. Also passing stderr and stdout from any failed calls to docker in the nwchem taskflow. Using a context manager for the temporary directory, which adds a dependency on the backports module. Uploading the json output directly to Girder, rather than copying it back to the cluster, still needs to be done.

cjh1 · 2017-01-02T14:20:30Z

server/taskflows/hpccloud/taskflow/nwchem/__init__.py

-            print('Error running Docker container.')
-            print('STDOUT: ' + stdout)
-            print('STDERR: ' + stderr)
+            task


Looks like this is a typo?

cjh1 · 2017-01-03T13:57:24Z

server/taskflows/hpccloud/taskflow/nwchem/__init__.py

+        local_path = os.path.join(tmp_dir, out_file + '.json')
+        with get_connection(task.taskflow.girder_token, cluster) as conn:
+            with open(local_path, 'r') as local_fp:
+                conn.put(local_fp, cluster_path)


@chetnieter We should factor out the docker run code into a set of functions that can be reused by other taskflows.
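A sketch of how the command construction could be split into a generic helper plus a thin NWChem wrapper; docker_run_command and nwchem_postprocess_command are hypothetical names, while the image name and the /hpccloud mount point are taken from the diff above. For a bind mount, docker's -v expects an absolute host path (a bare name such as tmp is treated as a named volume), which is why the host directory is made absolute.

```python
import os

# Image name taken from the current nwchem taskflow; it may move to the
# hpccloud Docker Hub account.
NWCHEM_POSTPROCESS_IMAGE = 'chetnieter/nwchem-postprocess'


def docker_run_command(image, host_dir, args=(), mount_point='/hpccloud',
                       remove=True):
    """Return a `docker run` argument list with `host_dir` bind-mounted."""
    command = ['docker', 'run']
    if remove:
        command.append('--rm')  # delete the container once it exits
    command += ['-v', '%s:%s' % (os.path.abspath(host_dir), mount_point),
                image]
    command.extend(args)  # arguments handed to the image's entrypoint
    return command


def nwchem_postprocess_command(tmp_dir, out_file,
                               image=NWCHEM_POSTPROCESS_IMAGE):
    """NWChem-specific wrapper: the entrypoint takes the output filename."""
    return docker_run_command(image, tmp_dir, [out_file])
```

The returned list can be passed directly to subprocess, as the existing taskflow does with its hard-coded command; other taskflows would only need their own small wrapper.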

chetnieter added the WIP label Dec 7, 2016

Added method to call json conversion script.

cf24e7c

chetnieter force-pushed the nwchem-json-generation branch from 1ee58ac to cf24e7c on December 8, 2016 14:13

chetnieter added 2 commits December 8, 2016 10:02

Added redirect to nwchem submission script.

70e29e3

Moved json conversion to its own task.

dc65096

chetnieter added 4 commits December 12, 2016 16:35

Initial Dockerfile for nwchem-json container.

1d35945

Passing in nwchem output filename on command line.

e8a050c

Clean up before pausing work.

0407b9e

cjh1 reviewed Dec 16, 2016

chetnieter added 2 commits December 18, 2016 15:14

Merge branch 'master' into nwchem-json-generation

c57d0ab

cjh1 reviewed Jan 2, 2017

cjh1 reviewed Jan 3, 2017
