diff --git a/docs/contributing/code_contribution.md b/docs/contributing/code_contribution.md index cb7ba5727..d5819b9d3 100644 --- a/docs/contributing/code_contribution.md +++ b/docs/contributing/code_contribution.md @@ -8,8 +8,10 @@ Style guide inspired by Scott Frazer’s [WDL Best Practices Style Guide](http ## General Guidelines -- Put tasks and workflows in separate files in the appropriate folders. -- Always add a description as metadata +***Modularity and Metadata*** + +- **Best Practice:** Place tasks and workflows in separate files to maintain modularity and clarity. +- **Add a `meta` block** to every task and workflow to provide a brief description of its purpose. ```bash meta { @@ -17,163 +19,262 @@ Style guide inspired by Scott Frazer’s [WDL Best Practices Style Guide](http } ``` -- Ensure that the docker container is locked to a given version, not `latest` +***Docker Containers*** + +- Use a specific Docker container version instead of 'latest' to ensure reproducibility and prevent unexpected changes in container behavior. ```bash - String docker = "quay.io/docker_image:version" + String docker = "us-docker.pkg.dev/docker_image:version" ``` - Preferentially use containers [`Google's Artifact Registry`](https://console.cloud.google.com/artifacts/docker/general-theiagen/us) rather than those from [`quay.io`](http://quay.io) or [`dockerhub`](https://hub.docker.com/) -- Use 2-space indents (no tabs) + +***Indentation and Whitespace*** + +- Use 2-space indentation for all blocks. Avoid using tabs to ensure uniform formatting across editors: ```bash # perform action - if [ this ]; then - action1(variable) + if [ condition ]; then + perform_action(variable) fi ``` -- Do not use line break for opening braces +- Use a single space when defining variables (`this = that` *not* `this= that` (unless a bash variable where `this=that` is required)) + +***Bracket and Spacing Conventions*** + +- Avoid line breaks for opening braces. 
Keep them on the same line as the declaration. i.e `input {` instead of `input\n{` + + ```bash + # Correct + input { + String input_variable + } + + # Incorrect + input + { + String input_variable + } + ``` + - Use single space when defining input/output variables & runtime attributes (`output {` instead of `output{`) -- Use single-line breaks between non-intended constructs -- Enclose task commands with triple angle brackets (`<<< ... >>>`) -- Consistently use white space with variables (`this = that` *not* `this= that` (unless a bash variable where `this=that` is required)) +- Separate non-indented constructs (like input and output sections) with a single-line break for readability. + +***Command Block Syntax*** + +- Enclose command blocks in triple angle brackets (<<< ... >>>) for consistency and easier handling of multi-line scripts. It also avoids issues with unescaped special characters in the command block: + + ```bash + command <<< + tool --input ~{input} --output ~{output} + >>> + ``` ## Task Blocks -The task should contain the following sections. Include _single_ spaces between input, command, output, and runtime closing and opening curly brackets. +A WDL task block defines a discrete, reusable step in a workflow. To ensure readability and consistency, follow these conventions when writing task blocks. Include single spaces between the input, command, output, and runtime sections and their enclosing curly brackets. ```bash -input { +task example_task { + input { -} -command <<< + } + command <<< + + >>> + output { ->>> -output { - -} -runtime { + } + runtime { + } } ``` -??? toggle "`input` block" - - The following conventions are used to expose docker, CPU, memory, and disk size +### The `input` block - ```bash - input { - String docker = "..." 
- Int cpu = x - Int memory = y - Int disk_size = z - } - ``` - - - If additional arguments should be allowed to be passed to the task, this input should follow the convention below: - - ```bash - input { - String args = "" - } - ``` - - - Input and output lists should not be formatted to have the equal sign aligned, but instead use a single space before and after the `=` - - ```bash - output1_x = string1 - output2_that_does_y = string2 - ``` - - - Ensure the docker container is exposed as an input and as an output string - - ```bash - input { - String docker = "" - } - ... - output { - String XX_docker = docker - } - runtime { - docker: docker - } - ``` +- The following conventions are used to expose docker, CPU, memory, and disk size: -??? toggle "`command` block" - - Ensure use of line breaks between different sections of code to improve readability - - ```bash - # if this, perform action 1 - if [ this ]; then - action1(variable) - fi - - # if that, perform action 2 - if [ that ]; then - action2(variable) - fi - ``` - - - Split command calls into multiple lines if they have user input variables and/or if the length of the command is very long to avoid text wrapping and/or side-scrolling, e.g. - - Use indentation as appropriate - - ```bash - tool \ - --option1 ~{option1} \ - --option2 ~{option2} \ - ... - --option999 ~{option999} - ``` - - - Add comments that - - Explain what the optional parameters are - - Provide links to the tool documentation so future readers of the code know where to find that information - - Explain what non-intuitive bash/python text wrangling actions do, e.g. - - ```bash - ## awk for gene column ($6) to grab subtype ($15) - cat ~{file} | awk -F '\t' '{if ($6=="M1") print $15}' > FLU_TYPE - ``` - -??? 
toggle "`output` block" - - File types should be clearly stated in the output name variables + ```bash + input { + Int cpu = 4 # Number of CPUs + Int disk_size = 100 # Disk space in GB + String docker = "us-docker.pkg.dev/example:1.0.0" # Docker container for the task + Int memory = 16 # Memory in GB + } + ``` + +- Include optional tool parameters as inputs to the task + + ```bash + input { + Int? optional_tool_parameter1 + String optional_tool_parameter2_with_default = "default_value" + } + ``` + +- Input and output lists should **not** be formatted to have the equal sign aligned, but instead **use a single space** before and after the `=` + + ```bash + correct_output = "output_file" + long_variable_name = "long_file_name" + ``` + +- Expose Docker as an input, an output (if versioning information not available), and runtime variable: + + ```bash + input { + String docker = "us-docker.pkg.dev/example:1.0.0" + } + ... + output { + String used_docker = docker + } + runtime { + docker: docker + } + ``` + +### The `command` block + +- Ensure use of line breaks between different sections of code to improve readability + + ```bash + # Perform task step 1 + if [ condition ]; then + action1(variable) + fi + + # Perform task step 2 + if [ another_condition ]; then + action2(variable) + fi + ``` + +- Split command calls into multiple lines if they have user input variables and/or if the length of the command is very long to avoid text wrapping and/or side-scrolling, e.g. + - Use backslashes for continuation and indentation to clarify structure: + + ```bash + tool \ + --input ~{input_file} \ + --output ~{output_file} \ + --option1 ~{option1} \ + ... + --optionN ~{optionN} + ``` + +- Add comments that + - Explain what the optional parameters are + - Provide links to the tool documentation so future readers of the code know where to find that information + - Explain what non-intuitive bash/python text wrangling actions do, e.g. 
```bash - output1_csv = file1.csv - output2_tsv = file2.tsv - ``` - - - Ensure the docker container is exposed as an output string, e.g. - - ```bash - input { - String docker - } - ... - output { - String XX_docker = docker - } - runtime { - docker: docker - } + ## awk for gene column ($6) to grab subtype ($15) + cat ~{file} | awk -F '\t' '{if ($6=="M1") print $15}' > FLU_TYPE ``` -??? toggle "`runtime` block" - - Always use a docker container +### The `output` block + +- The output block specifies the files or variables produced by the task. Follow these conventions: + + ```bash + output { + File result_csv = "output.csv" # CSV file generated + File result_log = "log.txt" # Log file + } + ``` + +- Ensure the docker container is exposed as an output string, e.g. + + ```bash + input { + String docker = "us-docker.pkg.dev/general-theiagen/tool:version" + } + ... + output { + String XX_docker = docker + } + runtime { + docker: docker + } + ``` + +### The `runtime` block + +- The runtime block defines the compute resources and environment for the task. +- Always specify a Docker container: + + ```bash + runtime { + docker: docker + cpu: cpu + memory: memory + disk: disk_size + } + ``` ## Workflow Blocks -The workflow/sub-workflow file should contain: +A WDL workflow block orchestrates the execution of tasks and subworkflows. It defines the inputs, calls tasks or subworkflows, and specifies the final outputs. To ensure readability and consistency, follow these conventions when writing workflow blocks: + +### The `import` section + +- Include a block of `import` statements (sorted in alphabetical order). + - When a workflow imports a task, ensure it is imported under a unique name to avoid conflicts. 
-- a block of `import` statements (alphabetical order), - - When a workflow imports a task, make sure that it is imported under a different name than the task it is calling -- a `workflow` block with - - an `input` section - - `call` sections for specified tasks - - an `output` section + ```bash + import "../tasks/task_task1.wdl" as task1_task + import "../tasks/task_task2.wdl" as task2_task + ``` -Example formatting is shown below. +- Order import statements alphabetically by the path of the imported file. + +### The `input` block + +- Optional inputs that should be able to be edited by the user, such as docker containers, should be exposed at the workflow level, as in the example +- In the case of subworkflows, all optional inputs should be exposed at the workflow level so that they can be modified by users on Terra + +```bash +input { + String input + String task1_docker = "us-docker.pkg.dev/general-theiagen/tool:version" + String? task1_optional_argument +} +``` + +### The `call` sections + +- Import task files as something other than the included task name in order to avoid namespace conflicts + +```bash +call task1_task.task1 { + input: + input = input, + docker = task1_docker +} +``` + +### The `output` block + +- Define all workflow outputs in this section. +- Use descriptive names for each output variable. +- Order outputs alphabetically by the name of the output variable + +```bash +output { + # Task 1 outputs + File task1_out_csv = task1.output_csv + String task1_version = task1.version + + # Subworkflow outputs + File subworkflow_out_tsv = subworkflow.task3_out_tsv + String subworkflow_version = subworkflow.task3_version +} +``` + +## Example Workflow formats ??? toggle "wf_example_wf.wdl" @@ -190,7 +291,6 @@ Example formatting is shown below. String task2_docker = "us-docker.pkg.dev/general-theiagen//task_2:version" String? hidden_task3_argument String? hidden_task3_docker - String? hidden_task4_argument String? 
hidden_task4_docker } call task1_task.task1 { @@ -205,7 +305,10 @@ Example formatting is shown below. } call subworkflow.subworkflow { input: - input = input + input = input, + task3_argument = hidden_task3_argument, + task3_docker = hidden_task3_docker, + task4_docker = hidden_task4_docker } output { # Task 1 outputs @@ -216,16 +319,19 @@ Example formatting is shown below. File task2_out_tsv = task2.output_tsv String task2_version = task2.version String task2_docker = task2.docker - # Subworkflow outputs + # Subworkflow outputs for task 3 File task3_out_tsv = subworkflow.task3_out_tsv String task3_version = subworkflow.task3_version String task3_docker = subworkflow.task3_docker + # Subworkflow outputs for task 4 + String task4_output = subworkflow.task4_output + String task4_version = subworkflow.task4_version } } ``` - ??? toggle "wf_subworkflow.wdl" + ```bash import "../tasks/task_task3.wdl" as task3_task import "../tasks/task_task4.wdl" as task4_task @@ -239,6 +345,7 @@ Example formatting is shown below. # level so they can be modified by a Terra user String? task3_argument String? task3_docker + String? task4_docker } call task3_task.task3 { input: @@ -246,38 +353,17 @@ Example formatting is shown below. args = task3_argument, docker = task3_docker } + call task4_task.task4 { + input: + input = task3.output_tsv, + docker = task4_docker + } output { File task3_out_tsv = task3.output_tsv String task3_version = task3.version String task3_docker = task3.docker + String task4_output = task4.output + String task4_version = task4.version } } ``` - ---- - -??? toggle "`input` section" - - Optional inputs that should be able to be edited by the user, such as docker containers should be exposed on the workflow level as in the example - - In the case of subworkflows, all optional inputs should be exposed on the workflow level so that they can be modified by users on Terra - -??? 
toggle "`call` task sections" - - There should be no blank lines between tasks in workflows - - ```bash - task A { - } - task B { - } - ``` - - - Label a group of outputs by the source/species for organizational purposes when a workflow has many different outputs - - ```ebnf - output { - ... - # task99 outputs - String task99_ouput - String task99_file - ... - } - ```