finished refactor of workflow examples to how tos
jrudz committed Jan 8, 2025
1 parent 3e2b9b4 commit 5fb8597
Showing 19 changed files with 3,100 additions and 2,190 deletions.
5 changes: 5 additions & 0 deletions docs/examples/workflow_proof_of_concept.md
@@ -1,10 +1,15 @@
# 3 step linear proof of concept workflow

> See also: [Explanation > Workflows](../explanation/workflows.md)
This is not a working example but a template: it demonstrates the overall approach to generating NOMAD's custom workflow files and showcases some options that may not be used in the working examples.

```python
import gravis as gv
from nomad_utility_workflows.utils.workflows import build_nomad_workflow, nodes_to_graph
```

We have a workflow
```python
node_attributes = {
0: {
8 changes: 7 additions & 1 deletion docs/explanation/workflows.md
@@ -10,13 +10,17 @@ While effective, the creation of this yaml file requires some a priori knowledge

`nomad-utility-workflows` attempts to simplify the creation of the custom workflow yaml files by allowing users instead to supply a networkx graph with a set of minimal node attributes that are then used to create the appropriate connections within the yaml file automatically.

## NetworkX DiGraphs

A NetworkX directed graph is instantiated as follows:
```python
import networkx as nx
workflow_graph = nx.DiGraph()
```

See [NetworkX Docs > DiGraph](https://networkx.org/documentation/stable/reference/classes/digraph.html) for more information.
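
As a minimal sketch of how such a graph is populated, nodes can carry attributes as keyword arguments and directed edges define the order of the steps. The node names and attribute values below are placeholders chosen for illustration, not part of a real workflow:

```python
import networkx as nx

workflow_graph = nx.DiGraph()

# Attribute values here are placeholders chosen for illustration
workflow_graph.add_node(0, name='input parameters', type='input')
workflow_graph.add_node(1, name='first task', type='task')

# A directed edge from node 0 to node 1: node 0 feeds into node 1
workflow_graph.add_edge(0, 1)

# Node attributes are stored in a dict-like view keyed by node id
first_task_name = workflow_graph.nodes[1]['name']
successors_of_input = list(workflow_graph.successors(0))
```

Because the edges are directed, `successors()` and `predecessors()` can be used to inspect which nodes follow or precede a given step.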

## Node Attributes

The following attributes can be added to each node in the graph:

@@ -176,6 +180,8 @@ workflow_graph.add_edge(
)
```

## Generating the initial workflow graph

Alternatively, `nomad-utility-workflows` provides the function `nodes_to_graph()`, which automatically creates an initial workflow graph from a dictionary of node attributes as defined above. In this case, the edges are specified with the following additional attributes (duplicate edges have no effect):

```python
296 changes: 296 additions & 0 deletions docs/how_to/add_custom_tasks.md
@@ -0,0 +1,296 @@
# How to add custom tasks to workflows using NOMAD's ELN Functionalities

This how-to covers how to add a custom NOMAD entry for cases where some tasks or inputs/outputs of your workflow are not automatically parsed and stored within a NOMAD archive, and therefore cannot be referenced within your workflow.


## Example Overview

Consider the following setup, simulation, and analysis protocol:

![NOMAD workflow graph](images/solute_in_bilayer_workflow_graph_NOMAD.png){.screenshot}

The minimize, equilibrate, and production (workflow) tasks are analogous to those described in [How to > Create Custom Workflows](./create_custom_workflows.md). The remaining tasks (green boxes) correspond to steps in the simulation protocol that are not supported by the NOMAD simulation parsers, e.g., creation of the initial configuration or model parameter files, or post-simulation analysis.

## Create an ELN entry with ElnBaseSection

The basic strategy described here is to use NOMAD's existing ELN functionality to add (meta)data for the "non-recognized" steps of the workflow. Here we only provide a description quantity, but more advanced support is available; see [NOMAD Docs > How to > Use ELNs](https://nomad-lab.eu/prod/v1/docs/howto/manage/eln.html){:target="_blank"}.

Before we can add these as tasks in the workflow, we need to create mainfiles that NOMAD recognizes with the corresponding metadata. We can achieve this by creating an archive.yaml file as follows:

```yaml
# insert_solute_in_box.archive.yaml
data:
  m_def: nomad.datamodel.metainfo.eln.ElnBaseSection
  name: 'insert_solute_in_box'
  description: 'This is a description of the method performed to insert the solute into the simulation box...'
```
This utilizes the `ElnBaseSection` class to create the following overview page upon upload:

![ELN overview page](images/ELN_overview_page.png){.screenshot}
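
Since every non-recognized task needs its own mainfile, it can be convenient to generate these files in a loop. The following is a minimal sketch using only the Python standard library. The helper name `eln_archive_text` and the placeholder descriptions are assumptions made for illustration and are not part of `nomad-utility-workflows`. The sketch relies on JSON strings generally also being valid YAML double-quoted scalars, so `json.dumps` is used for quoting:

```python
import json
import tempfile
from pathlib import Path


def eln_archive_text(name: str, description: str) -> str:
    """Return the text of a minimal ElnBaseSection archive.yaml (hypothetical helper)."""
    return (
        f'# {name}.archive.yaml\n'
        'data:\n'
        '  m_def: nomad.datamodel.metainfo.eln.ElnBaseSection\n'
        f'  name: {json.dumps(name)}\n'
        f'  description: {json.dumps(description)}\n'
    )


# Placeholder descriptions; replace them with the actual protocol details
custom_tasks = {
    'insert_solute_in_box': 'Description of how the solute is inserted into the box...',
    'convert_box_to_gro': 'Description of the conversion to the gro format...',
}

# Written to a temporary directory here; use your upload directory instead
output_dir = Path(tempfile.mkdtemp())
for task_name, task_description in custom_tasks.items():
    archive_file = output_dir / f'{task_name}.archive.yaml'
    archive_file.write_text(eln_archive_text(task_name, task_description))

written_files = sorted(path.name for path in output_dir.iterdir())
```

Each generated file has the same structure as the `insert_solute_in_box.archive.yaml` example above, differing only in `name` and `description`.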

## Link the ELN entries to your workflow

Now that we have a mainfile for each task, we can specify the graph structure and node attributes as described in [Create Custom Workflows > Create an input graph with nodes_to_graph()](./create_custom_workflows.md#create-an-input-graph-with-nodes_to_graph):

```python
path_to_job = ''
node_attributes = {
    0: {
        'name': 'Solute in bilayer workflow parameters',
        'type': 'input',
        'entry_type': 'other',
        'path_info': {
            'archive_path': 'data',
            'mainfile_path': f'{path_to_job}workflow_parameters.archive.yaml',
        },
        'out_edge_nodes': [1, 3],
    },
    1: {
        'name': 'insert_solute_in_box',
        'type': 'task',
        'entry_type': 'other',
        'path_info': {
            'mainfile_path': f'{path_to_job}insert_solute_in_box.archive.yaml',
            'archive_path': 'data',
        },
        'inputs': [
            {
                'name': 'data from workflow parameters',
                'path_info': {
                    'archive_path': 'data',
                    'mainfile_path': f'{path_to_job}workflow_parameters.archive.yaml',
                },
            }
        ],
        'outputs': [
            {
                'name': 'data from insert_solute_in_box',
                'path_info': {
                    'archive_path': 'data',
                    'mainfile_path': f'{path_to_job}insert_solute_in_box.archive.yaml',
                },
            }
        ],
    },
    2: {
        'name': 'convert_box_to_gro',
        'type': 'task',
        'entry_type': 'other',
        'path_info': {
            'mainfile_path': f'{path_to_job}convert_box_to_gro.archive.yaml'
        },
        'in_edge_nodes': [1],
        'inputs': [
            {
                'name': 'data from insert_solute_in_box',
                'path_info': {
                    'archive_path': 'data',
                    'mainfile_path': f'{path_to_job}insert_solute_in_box.archive.yaml',
                },
            }
        ],
        'outputs': [
            {
                'name': 'data from convert_box_to_gro',
                'path_info': {
                    'archive_path': 'data',
                    'mainfile_path': f'{path_to_job}convert_box_to_gro.archive.yaml'
                },
            }
        ],
    },
    3: {
        'name': 'update_topology_file',
        'type': 'task',
        'entry_type': 'other',
        'path_info': {
            'mainfile_path': f'{path_to_job}update_topology_file.archive.yaml'
        },
        'inputs': [
            {
                'name': 'data from workflow parameters',
                'path_info': {
                    'archive_path': 'data',
                    'mainfile_path': f'{path_to_job}workflow_parameters.archive.yaml',
                },
            }
        ],
        'outputs': [
            {
                'name': 'data from update_topology_file',
                'path_info': {
                    'archive_path': 'data',
                    'mainfile_path': f'{path_to_job}update_topology_file.archive.yaml'
                },
            }
        ],
    },
    4: {
        'name': 'minimize',
        'type': 'workflow',
        'entry_type': 'simulation',
        'path_info': {
            'mainfile_path': f'{path_to_job}solute_in_bilayer_minimize.log'
        },
        'in_edge_nodes': [2, 3],
        'inputs': [
            {
                'name': 'data from convert_box_to_gro',
                'path_info': {
                    'archive_path': 'data',
                    'mainfile_path': f'{path_to_job}convert_box_to_gro.archive.yaml',
                },
            },
            {
                'name': 'data from update_topology_file',
                'path_info': {
                    'archive_path': 'data',
                    'mainfile_path': f'{path_to_job}update_topology_file.archive.yaml',
                },
            }
        ],
    },
    5: {
        'name': 'equilibrate',
        'type': 'workflow',
        'entry_type': 'simulation',
        'path_info': {
            'mainfile_path': f'{path_to_job}solute_in_bilayer_equilibrate.log'
        },
        'in_edge_nodes': [4],
    },
    6: {
        'name': 'production',
        'type': 'workflow',
        'entry_type': 'simulation',
        'path_info': {
            'mainfile_path': f'{path_to_job}solute_in_bilayer_production.log'
        },
        'in_edge_nodes': [5],
    },
    7: {
        'name': 'compute_wham',
        'type': 'task',
        'entry_type': 'other',
        'path_info': {
            'mainfile_path': f'{path_to_job}compute_wham.archive.yaml'
        },
        'in_edge_nodes': [6],
        'outputs': [
            {
                'name': 'data from compute_wham',
                'path_info': {
                    'archive_path': 'data',
                    'mainfile_path': f'{path_to_job}compute_wham.archive.yaml'
                },
            }
        ],
    },
}
```
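
Most of the `inputs` and `outputs` entries above follow the same pattern: a name of the form `data from <task>` and a `path_info` pointing to the `data` section of that task's `archive.yaml`. As a sketch, a small helper can build these dictionaries and reduce the repetition. Note that `eln_data_reference` is a hypothetical name introduced here, not a function of `nomad-utility-workflows`:

```python
def eln_data_reference(task_name: str, path_to_job: str = '') -> dict:
    """Build an input/output entry referencing the data section of an ELN entry.

    Hypothetical helper: it only reproduces the dictionary pattern used above.
    """
    return {
        'name': f'data from {task_name}',
        'path_info': {
            'archive_path': 'data',
            'mainfile_path': f'{path_to_job}{task_name}.archive.yaml',
        },
    }


# Example: node 2 from the dictionary above, written with the helper
path_to_job = ''
convert_box_to_gro_node = {
    'name': 'convert_box_to_gro',
    'type': 'task',
    'entry_type': 'other',
    'path_info': {
        'mainfile_path': f'{path_to_job}convert_box_to_gro.archive.yaml'
    },
    'in_edge_nodes': [1],
    'inputs': [eln_data_reference('insert_solute_in_box', path_to_job)],
    'outputs': [eln_data_reference('convert_box_to_gro', path_to_job)],
}
```

Entries whose name does not match the mainfile, such as `data from workflow parameters`, still need to be written out explicitly.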

## Generate the input workflow graph and workflow yaml

As in `Create Custom Workflows >` [Create an input graph with nodes_to_graph()](./create_custom_workflows.md#create-an-input-graph-with-nodes_to_graph) and [Generate the workflow yaml](./create_custom_workflows.md#generate-the-workflow-yaml), we simply apply the `nodes_to_graph()` and `build_nomad_workflow()` functions:

```python
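import networkx as nx
from nomad_utility_workflows.utils.workflows import build_nomad_workflow, nodes_to_graph

# Build the input graph from the node attributes defined above
workflow_graph_input = nodes_to_graph(node_attributes)

workflow_metadata = {
    'destination_filename': 'solute_in_bilayer.workflow.archive.yaml',
    'workflow_name': 'Solute in bilayer workflow',
}

# Build the workflow from a copy of the input graph and write the workflow yaml
workflow_graph_output = build_nomad_workflow(
    workflow_metadata=workflow_metadata,
    workflow_graph=nx.DiGraph(workflow_graph_input),
    write_to_yaml=True,
)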
```

which produces the following workflow yaml:


```yaml
'workflow2':
  'name': 'Solute in bilayer workflow'
  'inputs':
  - 'name': 'Solute in bilayer workflow parameters'
    'section': '../upload/archive/mainfile/workflow_parameters.archive.yaml#/data'
  'outputs':
  - 'name': 'data from compute_wham'
    'section': '../upload/archive/mainfile/compute_wham.archive.yaml#/data'
  'tasks':
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'insert_solute_in_box'
    'task': '../upload/archive/mainfile/insert_solute_in_box.archive.yaml#/data'
    'inputs':
    - 'name': 'data from workflow parameters'
      'section': '../upload/archive/mainfile/workflow_parameters.archive.yaml#/data'
    'outputs':
    - 'name': 'data from insert_solute_in_box'
      'section': '../upload/archive/mainfile/insert_solute_in_box.archive.yaml#/data'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'convert_box_to_gro'
    'inputs':
    - 'name': 'data from insert_solute_in_box'
      'section': '../upload/archive/mainfile/insert_solute_in_box.archive.yaml#/data'
    'outputs':
    - 'name': 'data from convert_box_to_gro'
      'section': '../upload/archive/mainfile/convert_box_to_gro.archive.yaml#/data'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'update_topology_file'
    'inputs':
    - 'name': 'data from workflow parameters'
      'section': '../upload/archive/mainfile/workflow_parameters.archive.yaml#/data'
    'outputs':
    - 'name': 'data from update_topology_file'
      'section': '../upload/archive/mainfile/update_topology_file.archive.yaml#/data'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'minimize'
    'task': '../upload/archive/mainfile/solute_in_bilayer_minimize.log#/workflow2'
    'inputs':
    - 'name': 'data from convert_box_to_gro'
      'section': '../upload/archive/mainfile/convert_box_to_gro.archive.yaml#/data'
    - 'name': 'input system from minimize'
      'section': '../upload/archive/mainfile/solute_in_bilayer_minimize.log#/run/0/system/-1'
    - 'name': 'data from update_topology_file'
      'section': '../upload/archive/mainfile/update_topology_file.archive.yaml#/data'
    'outputs':
    - 'name': 'output system from minimize'
      'section': '../upload/archive/mainfile/solute_in_bilayer_minimize.log#/run/0/system/-1'
    - 'name': 'output calculation from minimize'
      'section': '../upload/archive/mainfile/solute_in_bilayer_minimize.log#/run/0/calculation/-1'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'equilibrate'
    'task': '../upload/archive/mainfile/solute_in_bilayer_equilibrate.log#/workflow2'
    'inputs':
    - 'name': 'input system from minimize'
      'section': '../upload/archive/mainfile/solute_in_bilayer_minimize.log#/run/0/system/-1'
    'outputs':
    - 'name': 'output system from equilibrate'
      'section': '../upload/archive/mainfile/solute_in_bilayer_equilibrate.log#/run/0/system/-1'
    - 'name': 'output calculation from equilibrate'
      'section': '../upload/archive/mainfile/solute_in_bilayer_equilibrate.log#/run/0/calculation/-1'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'production'
    'task': '../upload/archive/mainfile/solute_in_bilayer_production.log#/workflow2'
    'inputs':
    - 'name': 'input system from equilibrate'
      'section': '../upload/archive/mainfile/solute_in_bilayer_equilibrate.log#/run/0/system/-1'
    'outputs':
    - 'name': 'output system from production'
      'section': '../upload/archive/mainfile/solute_in_bilayer_production.log#/run/0/system/-1'
    - 'name': 'output calculation from production'
      'section': '../upload/archive/mainfile/solute_in_bilayer_production.log#/run/0/calculation/-1'
  - 'm_def': 'nomad.datamodel.metainfo.workflow.TaskReference'
    'name': 'compute_wham'
    'inputs':
    - 'name': 'input system from production'
      'section': '../upload/archive/mainfile/solute_in_bilayer_production.log#/run/0/system/-1'
    'outputs':
    - 'name': 'data from compute_wham'
      'section': '../upload/archive/mainfile/compute_wham.archive.yaml#/data'
```

When uploaded together with the corresponding simulation files and the ELN `archive.yaml` files, this workflow yaml produces the workflow visualization shown at the top of this page.
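
Comparing the `path_info` entries of the node attributes with the `section` values in the yaml above suggests a simple pattern: each reference is the upload-relative mainfile path followed by `#/` and the archive path. The function below only illustrates that pattern as read off this example. `section_reference` is a hypothetical name, and this is not necessarily how `build_nomad_workflow()` constructs the references internally:

```python
def section_reference(mainfile_path: str, archive_path: str) -> str:
    """Compose a reference string in the form seen in the generated yaml above."""
    return f'../upload/archive/mainfile/{mainfile_path}#/{archive_path}'


workflow_input_section = section_reference('workflow_parameters.archive.yaml', 'data')
minimize_output_system = section_reference(
    'solute_in_bilayer_minimize.log', 'run/0/system/-1'
)
print(workflow_input_section)
print(minimize_output_system)
```

This can help when checking a generated workflow yaml by hand, for example to confirm that a reference points to the mainfile and archive section you intended.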
4 changes: 0 additions & 4 deletions docs/how_to/contribute_to_the_documentation.md

This file was deleted.

5 changes: 0 additions & 5 deletions docs/how_to/contribute_to_this_plugin.md

This file was deleted.
