From 5c46fb2e6d02bbf355b51c303514654dd118ffa7 Mon Sep 17 00:00:00 2001 From: fraser-combe Date: Tue, 19 Nov 2024 10:37:49 -0600 Subject: [PATCH 1/3] Update code contribution guidelines --- docs/contributing/code_contribution.md | 212 ++++++++++++++++++------- 1 file changed, 151 insertions(+), 61 deletions(-) diff --git a/docs/contributing/code_contribution.md b/docs/contributing/code_contribution.md index cb7ba5727..3483bf1c2 100644 --- a/docs/contributing/code_contribution.md +++ b/docs/contributing/code_contribution.md @@ -8,8 +8,10 @@ Style guide inspired by Scott Frazer’s [WDL Best Practices Style Guide](http ## General Guidelines -- Put tasks and workflows in separate files in the appropriate folders. -- Always add a description as metadata +***Modularity and Metadata*** + +- **Best Practice:** Place tasks and workflows in separate files to maintain modularity and clarity. +- **Add a `meta` block** to every task and workflow to provide a brief description of its purpose. ```bash meta { @@ -17,113 +19,145 @@ Style guide inspired by Scott Frazer’s [WDL Best Practices Style Guide](http } ``` -- Ensure that the docker container is locked to a given version, not `latest` +***Docker Containers*** + +- Use a specific Docker container version instead of 'latest' to ensure reproducibility and prevent unexpected changes in container behavior. ```bash String docker = "quay.io/docker_image:version" ``` - Preferentially use containers [`Google's Artifact Registry`](https://console.cloud.google.com/artifacts/docker/general-theiagen/us) rather than those from [`quay.io`](http://quay.io) or [`dockerhub`](https://hub.docker.com/) -- Use 2-space indents (no tabs) + +***Indentation and Whitespace*** + +- Use 2-space indentation for all blocks. Avoid using tabs to ensure uniform formatting across editors: ```bash # perform action - if [ this ]; then - action1(variable) + if [ condition ]; then + perform_action(variable) fi ``` -- Do not use line break for opening braces +- Use a single space when defining variables (`this = that` *not* `this= that` (unless a bash variable where `this=that` is required)) + +***Bracket and Spacing Conventions*** + +- Avoid line breaks for opening braces. Keep them on the same line as the declaration. i.e `input {` instead of `input\n{` + + ```bash + # Correct + input { + String input_variable + } + + # Incorrect + input + { + String input_variable + } + ``` + - Use single space when defining input/output variables & runtime attributes (`output {` instead of `output{`) -- Use single-line breaks between non-intended constructs -- Enclose task commands with triple angle brackets (`<<< ... >>>`) -- Consistently use white space with variables (`this = that` *not* `this= that` (unless a bash variable where `this=that` is required)) +- Separate non-indented constructs (like input and output sections) with a single-line break for readability. + +***Command Block Syntax*** + +- Enclose command blocks in triple angle brackets (<<< ... >>>) for consistency and easier handling of multi-line scripts. It also avoids issues with unescaped special characters in the command block: + + ```bash + command <<< + tool --input ~{input} --output ~{output} + >>> + ``` ## Task Blocks -The task should contain the following sections. Include _single_ spaces between input, command, output, and runtime closing and opening curly brackets. +A WDL task block defines a discrete, reusable step in a workflow. To ensure readability and consistency, follow these conventions when writing task blocks. Include single spaces between the input, command, output, and runtime sections and their enclosing curly brackets. ```bash -input { +task example_task { + input { -} -command <<< - ->>> -output { + } + command <<< + + >>> + output { -} -runtime { + } + runtime { + } } ``` ??? toggle "`input` block" - The following conventions are used to expose docker, CPU, memory, and disk size - ```bash - input { - String docker = "..." - Int cpu = x - Int memory = y - Int disk_size = z - } - ``` - - - If additional arguments should be allowed to be passed to the task, this input should follow the convention below: - + ```bash + input { + String docker = "quay.io/example:1.0.0" # Docker container for the task + Int cpu = 4 # Number of CPUs + Int memory = 16 # Memory in GB + Int disk_size = 100 # Disk space in GB + } + ``` + + - If the task accepts additional optional parameters, include an args field. ```bash input { String args = "" } ``` - - Input and output lists should not be formatted to have the equal sign aligned, but instead use a single space before and after the `=` ```bash - output1_x = string1 - output2_that_does_y = string2 + correct_output = "output_file" + long_variable_name = "long_file_name" ``` - - Ensure the docker container is exposed as an input and as an output string + - Expose Docker as an input and runtime variable: ```bash input { - String docker = "" + String docker = "quay.io/example:1.0.0" } ... output { - String XX_docker = docker + String used_docker = docker } runtime { - docker: docker + docker: docker } ``` ??? toggle "`command` block" - Ensure use of line breaks between different sections of code to improve readability - + ```bash - # if this, perform action 1 - if [ this ]; then + # Perform task step 1 + if [ condition ]; then action1(variable) fi - # if that, perform action 2 - if [ that ]; then + # Perform task step 2 + if [ another_condition ]; then action2(variable) fi ``` - Split command calls into multiple lines if they have user input variables and/or if the length of the command is very long to avoid text wrapping and/or side-scrolling, e.g. - - Use indentation as appropriate + - Use backslashes for continuation and indentation to clarify structure: ```bash tool \ + --input ~{input_file} \ + --output ~{output_file} \ --option1 ~{option1} \ - --option2 ~{option2} \ ... - --option999 ~{option999} + --optionN ~{optionN} ``` - Add comments that @@ -137,41 +171,97 @@ runtime { ``` ??? toggle "`output` block" - - File types should be clearly stated in the output name variables + - The output block specifies the files or variables produced by the task. Follow these conventions: - ```bash - output1_csv = file1.csv - output2_tsv = file2.tsv - ``` + ```bash + output { + File result_csv = "output.csv" # CSV file generated + File result_log = "log.txt" # Log file + String + } + ``` - Ensure the docker container is exposed as an output string, e.g. ```bash input { - String docker + String docker = "us-docker.pkg.dev/general-theiagen/tool:version" } ... output { - String XX_docker = docker + String XX_docker = docker } runtime { - docker: docker + docker: docker } ``` ??? toggle "`runtime` block" - - Always use a docker container + - The runtime block defines the compute resources and environment for the task. + - Always specify a Docker: + + ```bash + runtime { + docker: docker + cpu: cpu + memory: memory + disk: disk_size + } + ``` ## Workflow Blocks -The workflow/sub-workflow file should contain: +A WDL workflow block orchestrates the execution of tasks and subworkflows. It defines the inputs, calls tasks or subworkflows, and specifies the final outputs. To ensure readability and consistency, follow these conventions when writing workflow blocks: + +### General Guidelines + +- Include a block of `import` statements (sorted in alphabetical order). + - When a workflow imports a task, ensure it is imported under a unique name to avoid conflicts. + +```bash +import "../tasks/task_task1.wdl" as task1_task +import "../tasks/task_task2.wdl" as task2_task +``` + +- A `workflow` block with: + +- An `input` section: + + ```bash + input { + String input + String task1_docker = "us-docker.pkg.dev/general-theiagen/tool:version" + String? task1_optional_argument + } + ``` + +- `call` sections for specified tasks: + + ```bash + call task1_task.task1 { + input: + input = input, + docker = task1_docker + } + ``` + +- An `output` section: + + - Define all workflow outputs in this section. + - Use descriptive names for each output variable. + + ```bash + output { + # Task 1 outputs + File task1_out_csv = task1.output_csv + String task1_version = task1.version + + # Subworkflow outputs + File subworkflow_out_tsv = subworkflow.task3_out_tsv + String subworkflow_version = subworkflow.task3_version + } + ``` -- a block of `import` statements (alphabetical order), - - When a workflow imports a task, make sure that it is imported under a different name than the task it is calling -- a `workflow` block with - - an `input` section - - `call` sections for specified tasks - - an `output` section Example formatting is shown below. From 69f87290c79bfdb7e72f719de1a6980be444ced1 Mon Sep 17 00:00:00 2001 From: fraser-combe Date: Tue, 19 Nov 2024 10:50:42 -0600 Subject: [PATCH 2/3] remove extra string word --- docs/contributing/code_contribution.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/contributing/code_contribution.md b/docs/contributing/code_contribution.md index 3483bf1c2..7de6a769d 100644 --- a/docs/contributing/code_contribution.md +++ b/docs/contributing/code_contribution.md @@ -177,7 +177,6 @@ task example_task { output { File result_csv = "output.csv" # CSV file generated File result_log = "log.txt" # Log file - String } ``` From 5e1e234383752248da2356b24ec42951dadede9b Mon Sep 17 00:00:00 2001 From: Sage Wright Date: Mon, 25 Nov 2024 19:58:52 +0000 Subject: [PATCH 3/3] edit & add --- docs/contributing/code_contribution.md | 347 ++++++++++++------------- 1 file changed, 172 insertions(+), 175 deletions(-) diff --git a/docs/contributing/code_contribution.md b/docs/contributing/code_contribution.md index 7de6a769d..d5819b9d3 100644 --- a/docs/contributing/code_contribution.md +++ b/docs/contributing/code_contribution.md @@ -24,7 +24,7 @@ Style guide inspired by Scott Frazer’s [WDL Best Practices Style Guide](http - Use a specific Docker container version instead of 'latest' to ensure reproducibility and prevent unexpected changes in container behavior. ```bash - String docker = "quay.io/docker_image:version" + String docker = "us-docker.pkg.dev/docker_image:version" ``` - Preferentially use containers [`Google's Artifact Registry`](https://console.cloud.google.com/artifacts/docker/general-theiagen/us) rather than those from [`quay.io`](http://quay.io) or [`dockerhub`](https://hub.docker.com/) @@ -93,176 +93,188 @@ task example_task { } ``` -??? toggle "`input` block" - - The following conventions are used to expose docker, CPU, memory, and disk size +### The `input` block - ```bash - input { - String docker = "quay.io/example:1.0.0" # Docker container for the task - Int cpu = 4 # Number of CPUs - Int memory = 16 # Memory in GB - Int disk_size = 100 # Disk space in GB - } - ``` +- The following conventions are used to expose docker, CPU, memory, and disk size: - - If the task accepts additional optional parameters, include an args field. - ```bash - input { - String args = "" - } - ``` - - Input and output lists should not be formatted to have the equal sign aligned, but instead use a single space before and after the `=` - - ```bash - correct_output = "output_file" - long_variable_name = "long_file_name" - ``` - - - Expose Docker as an input and runtime variable: - - ```bash - input { - String docker = "quay.io/example:1.0.0" - } - ... - output { - String used_docker = docker - } - runtime { - docker: docker - } - ``` + ```bash + input { + Int cpu = 4 # Number of CPUs + Int disk_size = 100 # Disk space in GB + String docker = "us-docker.pkg.dev/example:1.0.0" # Docker container for the task + Int memory = 16 # Memory in GB + } + ``` -??? toggle "`command` block" - - Ensure use of line breaks between different sections of code to improve readability +- Include optional tool parameters as inputs to the task - ```bash - # Perform task step 1 - if [ condition ]; then - action1(variable) - fi - - # Perform task step 2 - if [ another_condition ]; then - action2(variable) - fi - ``` - - - Split command calls into multiple lines if they have user input variables and/or if the length of the command is very long to avoid text wrapping and/or side-scrolling, e.g. - - Use backslashes for continuation and indentation to clarify structure: - - ```bash - tool \ - --input ~{input_file} \ - --output ~{output_file} \ - --option1 ~{option1} \ - ... - --optionN ~{optionN} - ``` - - - Add comments that - - Explain what the optional parameters are - - Provide links to the tool documentation so future readers of the code know where to find that information - - Explain what non-intuitive bash/python text wrangling actions do, e.g. - - ```bash - ## awk for gene column ($6) to grab subtype ($15) - cat ~{file} | awk -F '\t' '{if ($6=="M1") print $15}' > FLU_TYPE - ``` - -??? toggle "`output` block" - - The output block specifies the files or variables produced by the task. Follow these conventions: - - ```bash - output { - File result_csv = "output.csv" # CSV file generated - File result_log = "log.txt" # Log file - } - ``` - - - Ensure the docker container is exposed as an output string, e.g. - - ```bash - input { - String docker = "us-docker.pkg.dev/general-theiagen/tool:version" - } - ... - output { - String XX_docker = docker - } - runtime { - docker: docker - } - ``` + ```bash + input { + Int? optional_tool_parameter1 + String optional_tool_parameter2_with_default = "default_value" + } + ``` -??? toggle "`runtime` block" - - The runtime block defines the compute resources and environment for the task. - - Always specify a Docker: +- Input and output lists should **not** be formatted to have the equal sign aligned, but instead **use a single space** before and after the `=` - ```bash - runtime { - docker: docker - cpu: cpu - memory: memory - disk: disk_size - } - ``` + ```bash + correct_output = "output_file" + long_variable_name = "long_file_name" + ``` -## Workflow Blocks +- Expose Docker as an input, an output (if versioning information not available), and runtime variable: -A WDL workflow block orchestrates the execution of tasks and subworkflows. It defines the inputs, calls tasks or subworkflows, and specifies the final outputs. To ensure readability and consistency, follow these conventions when writing workflow blocks: + ```bash + input { + String docker = "us-docker.pkg.dev/example:1.0.0" + } + ... + output { + String used_docker = docker + } + runtime { + docker: docker + } + ``` -### General Guidelines +### The `command` block -- Include a block of `import` statements (sorted in alphabetical order). - - When a workflow imports a task, ensure it is imported under a unique name to avoid conflicts. +- Ensure use of line breaks between different sections of code to improve readability -```bash -import "../tasks/task_task1.wdl" as task1_task -import "../tasks/task_task2.wdl" as task2_task -``` + ```bash + # Perform task step 1 + if [ condition ]; then + action1(variable) + fi + + # Perform task step 2 + if [ another_condition ]; then + action2(variable) + fi + ``` -- A `workflow` block with: +- Split command calls into multiple lines if they have user input variables and/or if the length of the command is very long to avoid text wrapping and/or side-scrolling, e.g. + - Use backslashes for continuation and indentation to clarify structure: -- An `input` section: + ```bash + tool \ + --input ~{input_file} \ + --output ~{output_file} \ + --option1 ~{option1} \ + ... + --optionN ~{optionN} + ``` - ```bash - input { - String input - String task1_docker = "us-docker.pkg.dev/general-theiagen/tool:version" - String? task1_optional_argument - } - ``` +- Add comments that + - Explain what the optional parameters are + - Provide links to the tool documentation so future readers of the code know where to find that information + - Explain what non-intuitive bash/python text wrangling actions do, e.g. -- `call` sections for specified tasks: + ```bash + ## awk for gene column ($6) to grab subtype ($15) + cat ~{file} | awk -F '\t' '{if ($6=="M1") print $15}' > FLU_TYPE + ``` + +### The `output` block + +- The output block specifies the files or variables produced by the task. Follow these conventions: ```bash - call task1_task.task1 { - input: - input = input, - docker = task1_docker + output { + File result_csv = "output.csv" # CSV file generated + File result_log = "log.txt" # Log file } ``` -- An `output` section: - - - Define all workflow outputs in this section. - - Use descriptive names for each output variable. +- Ensure the docker container is exposed as an output string, e.g. ```bash + input { + String docker = "us-docker.pkg.dev/general-theiagen/tool:version" + } + ... output { - # Task 1 outputs - File task1_out_csv = task1.output_csv - String task1_version = task1.version + String XX_docker = docker + } + runtime { + docker: docker + } + ``` + +### The `runtime` block + +- The runtime block defines the compute resources and environment for the task. +- Always specify a Docker: - # Subworkflow outputs - File subworkflow_out_tsv = subworkflow.task3_out_tsv - String subworkflow_version = subworkflow.task3_version + ```bash + runtime { + docker: docker + cpu: cpu + memory: memory + disk: disk_size } ``` +## Workflow Blocks + +A WDL workflow block orchestrates the execution of tasks and subworkflows. It defines the inputs, calls tasks or subworkflows, and specifies the final outputs. To ensure readability and consistency, follow these conventions when writing workflow blocks: + +### The `import` section + +- Include a block of `import` statements (sorted in alphabetical order). + - When a workflow imports a task, ensure it is imported under a unique name to avoid conflicts. + + ```bash + import "../tasks/task_task1.wdl" as task1_task + import "../tasks/task_task2.wdl" as task2_task + ``` + +- Order import statements alphabetically by the path of the imported file. + +### The `input` block + +- Optional inputs that should be able to be edited by the user, such as docker containers should be exposed on the workflow level as in the example +- In the case of subworkflows, all optional inputs should be exposed on the workflow level so that they can be modified by users on Terra + +```bash +input { + String input + String task1_docker = "us-docker.pkg.dev/general-theiagen/tool:version" + String? task1_optional_argument +} +``` + +### The `call` sections + +- Import task files as something other than the included task nam in order to avoid namespace conflicts + +```bash +call task1_task.task1 { + input: + input = input, + docker = task1_docker +} +``` + +### The `output` block + +- Define all workflow outputs in this section. +- Use descriptive names for each output variable. +- Order outputs alphabetically by the name of the output variable + +```bash +output { + # Task 1 outputs + File task1_out_csv = task1.output_csv + String task1_version = task1.version + + # Subworkflow outputs + File subworkflow_out_tsv = subworkflow.task3_out_tsv + String subworkflow_version = subworkflow.task3_version +} +``` -Example formatting is shown below. +## Example Workflow formats ??? toggle "wf_example_wf.wdl" @@ -279,7 +291,6 @@ Example formatting is shown below. String task2_docker = "us-docker.pkg.dev/general-theiagen//task_2:version" String? hidden_task3_argument String? hidden_task3_docker - String? hidden_task4_argument String? hidden_task4_docker } call task1_task.task1 { @@ -294,7 +305,10 @@ Example formatting is shown below. } call subworkflow.subworkflow { input: - input = input + input = input, + task3_argument = hidden_task3_argument, + task3_docker = hidden_task3_docker + task4_docker = hidden_task4_docker } output { # Task 1 outputs @@ -305,16 +319,19 @@ Example formatting is shown below. File task2_out_tsv = task2.output_tsv String task2_version = task2.version String task2_docker = task2.docker - # Subworkflow outputs + # Subworkflow outputs for task 3 File task3_out_tsv = subworkflow.task3_out_tsv String task3_version = subworkflow.task3_version String task3_docker = subworkflow.task3_docker + # Subworkflow outputs for task 4 + String task4_output = subworkflow.task4_output + String task4_version = subworkflow.task4_version } } ``` - ??? toggle "wf_subworkflow.wdl" + ```bash import "../tasks/task_task3.wdl" as task3_task import "../tasks/task_task4.wdl" as task4_task @@ -328,6 +345,7 @@ Example formatting is shown below. # level so they can be modified by a Terra user String? task3_argument String? task3_docker + String? task4_docker } call task3_task.task3 { input: @@ -335,38 +353,17 @@ Example formatting is shown below. args = task3_argument, docker = task3_docker } + call task4_task.task4 { + input: + input = task3.output_tsv, + docker = task4_docker + } output { File task3_out_tsv = task3.output_tsv String task3_version = task3.version String task3_docker = task3.docker + String task4_output = task4.output + String task4_version = task4.version } } ``` - ---- - -??? toggle "`input` section" - - Optional inputs that should be able to be edited by the user, such as docker containers should be exposed on the workflow level as in the example - - In the case of subworkflows, all optional inputs should be exposed on the workflow level so that they can be modified by users on Terra - -??? toggle "`call` task sections" - - There should be no blank lines between tasks in workflows - - ```bash - task A { - } - task B { - } - ``` - - - Label a group of outputs by the source/species for organizational purposes when a workflow has many different outputs - - ```ebnf - output { - ... - # task99 outputs - String task99_ouput - String task99_file - ... - } - ```