From c739018d0fd7e1e65cd8c2f2377fea3f4fb95082 Mon Sep 17 00:00:00 2001
From: Michael Franklin
Date: Mon, 16 Mar 2020 23:32:56 +1100
Subject: [PATCH] Update tutorial 2 using data from tut1 + improved instructions from W2

---
 docs/tutorials/tutorial2.md | 201 +++++++++++++++++------------------
 1 file changed, 94 insertions(+), 107 deletions(-)

diff --git a/docs/tutorials/tutorial2.md b/docs/tutorials/tutorial2.md
index ab4e63d9..cc4648c5 100644
--- a/docs/tutorials/tutorial2.md
+++ b/docs/tutorials/tutorial2.md
@@ -1,62 +1,27 @@
 # Tutorial 2 - Wrapping a new tool
 
+> This tutorial builds on the content and output from [Tutorial 1](https://janis.readthedocs.io/en/latest/tutorials/tutorial1.html).
+
 ## Introduction
 
 A CommandTool is the interface between Janis and a program to be executed. Simply put, a CommandTool has a name, a command, inputs, outputs and a container to run in. Inputs and arguments can have a prefix and / or position, and this is used to construct the command line.
 
 The Janis documentation for the [CommandTool](https://janis.readthedocs.io/en/latest/references/commandtool.html) gives an introduction to the tool structure and a template for constructing your own tool.
 
 A tool wrapper must provide all of the information to configure and run the specified tool, this includes the `base_command`, [janis.ToolInput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input), [janis.ToolOutput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-output), a `container` and its version.
 
-## Requirements
-
-You must have Python 3.6 and Janis installed:
-
-```bash
-pip3 install janis-pipelines
-```
-
-You can check you have the correct version of Janis installed by running:
-
-```bash
-$ janis -v
--------------------- ------
-janis-core v0.7.1
-janis-assistant v0.7.8
-janis-unix v0.7.0
-janis-bioinformatics v0.7.1
--------------------- ------
-```
-
-### Setup
-
-This tutorial is on checked in on GitHub with sample data. You can download this sample data and template files with the following:
-
-```bash
-git clone https://github.com/PMCC-BioinformaticsCore/janis-workshops.git
-cd janis-workshops/workshop3
-ls -lGh * # ls with extra options
-```
-
-You'll see a list of files within this repository:
-
-- `README.md` - *This file*
-- `samtoolsflagstat.py` - The template for this tutorial
-- `samtoolsflagstat-final.py` - The final command tool (also at the bottom of this file)
-- `data/brca1.bam` - A Bam file that this tool can be run with
-- `data/README.md` - Information about the data file
-
-
 ### Container
 
 > _Further information_: [Containerising your tools](https://janis.readthedocs.io/en/latest/tutorials/container.html)
-> Guide on using containers
+For portability, Janis requires that you specify an OCI compliant `container` (eg: Docker) for your tool. Often there will already be a container with some searching, however here's a guide on [preparing your tools in containers](https://janis.readthedocs.io/en/latest/tutorials/container.html) to ensure it works across all environments.
 
-For portability, we require that you specify an OCI compliant `container` (eg: Docker) for your tool. Often there will already be a container with some searching, however here's a guide on [preparing your tools in containers](https://janis.readthedocs.io/en/latest/tutorials/container.html) to ensure it works across all environments.
+## Preparation
 
+The sample data to test this tool is computed in [Tutorial 1](https://janis.readthedocs.io/en/latest/tutorials/tutorial1.html). You can follow this tutorial, but running the example will require you to have completed and obtained the bam from the first tutorial.
 
 ## Samtools flagstat
 
-In this workshop we're going to wrap the `samtools flagstat` tool.
+In this tutorial we're going to wrap the `samtools flagstat` tool - flagstat counts the number of alignments for each FLAG type within a bam file.
+
 
 ### Samtools project links
@@ -84,21 +49,20 @@ Hence, we can isolate the following information:
 
 ### Command tool template
 
-The following template is the minimum amount of information required to wrap a tool. For more information, see the [CommandToolBuilder documentation](https://janis.readthedocs.io/en/latest/references/commandtool.html).
+The following template is the minimum amount of information required to wrap a tool. For more information, see the [CommandToolBuilder documentation](https://janis.readthedocs.io/en/latest/references/commandtool.html#janis.CommandToolBuilder).
 
-> We've removed the optional fields: tool_module, tool_provider, metadata, cpu, memory from the following template.
+> We've removed the optional fields: `tool_module`, `tool_provider`, `metadata`, `cpu`, `memory` from the following [template](https://janis.readthedocs.io/en/latest/references/commandtool.html#template).
 
-```python
-from typing import List, Optional, Union
-import janis as j
+We're going to use `Bam` and `TextFile` data types, so let's import them as well.
 
-import janis_core as j
+```python
+from janis_core import CommandToolBuilder, ToolInput, ToolOutput, Int, Stdout
 
-ToolName = j.CommandToolBuilder(
+ToolName = CommandToolBuilder(
     tool: str="toolname",
     base_command=["base", "command"],
-    inputs: List[j.ToolInput]=[],
-    outputs: List[j.ToolOutput]=[],
+    inputs=[], # List[ToolInput]
+    outputs=[], # List[ToolOutput]
     container="container/name:version",
     version="version"
 )
@@ -106,10 +70,11 @@ ToolName = j.CommandToolBuilder(
 
 ### Tool information
 
-Let's start by creating a file with this template:
+Let's start by creating a file with this template inside a second output directory:
 
 ```bash
-vim samtoolsflagstat.py
+mkdir -p tools
+vim tools/samtoolsflagstat.py
 ```
 
 We can start by filling in the basic information:
@@ -122,53 +87,70 @@ We can start by filling in the basic information:
 - `version` to be `"v1.9.0"`
 
 You'll have a class definition like the following
+
 ```python
-SamtoolsFlagstat = j.CommandToolBuilder(
+SamtoolsFlagstat = CommandToolBuilder(
     tool: str="samtoolsflagstat",
     base_command=["samtools", "flagstat"],
     container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
     version="1.9.0",
-    inputs: List[j.ToolInput]=[],
-    outputs: List[j.ToolOutput]=[],
+    inputs=[], # List[ToolInput]
+    outputs=[], # List[ToolOutput]
 )
 ```
 
 
 ### Inputs
 
+> Further reading: [`ToolInput`](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input)
 
-We'll use the [ToolInput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input) class to represent these inputs. A `ToolInput` provides a mechanism for binding this input onto the command line (eg: prefix, position, transformations). See the documentation for more ways to configure a ToolInput.
-
-Our positional input is a Bam, so we'll import the Bam type from `janis` with the following line:
+We'll use the [ToolInput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input) class to represent these inputs. A `ToolInput` provides a mechanism for binding this input onto the command line (eg: prefix, position, transformations). See the documentation for more ways to [configure a ToolInput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input).
 
 ```python
-from janis.data_types import Bam
+janis.ToolInput(
+    tag: str,
+    input_type: DataType,
+    position: Optional[int] = None,
+    prefix: Optional[str] = None,
+    # more configuration options
+    separate_value_from_prefix: bool = None,
+    prefix_applies_to_all_elements: bool = None,
+    presents_as: str = None,
+    secondaries_present_as: Dict[str, str] = None,
+    separator: str = None,
+    shell_quote: bool = None,
+    localise_file: bool = None,
+    default: Any = None,
+    doc: Optional[str] = None
+)
 ```
 
-Then we can declare our two inputs:
+> Nb: A ToolInput must have a `position` OR `prefix` in order to be bound onto the command line. If the prefix is specified with no position, a `position=0` is automatically applied.
 
-1. Positional bam input
+Now we can declare our two inputs:
+
+1. Positional bam input
 2. Threads configuration input with the prefix `--threads`
 
 We're going to give our inputs a name through which we can reference them by. This allows us to specify a value from the command line, or connect the result of a previous step [within a workflow](https://janis.readthedocs.io/en/latest/tutorials/tutorial1.html#bwa-mem).
 
 ```python
-SamtoolsFlagstat = j.CommandToolBuilder(
-    # tool information
+SamtoolsFlagstat = CommandToolBuilder(
+    # ... tool information
     inputs=[
         # 1. Positional bam input
-        j.ToolInput(
+        ToolInput(
             "bam", # name of our input
             Bam,
             position=1,
             doc="Input bam to generate statistics for"
         ),
         # 2. `threads` inputs
-        j.ToolInput(
+        ToolInput(
             "threads", # name of our input
-            j.Int(optional=True),
+            Int(optional=True),
             prefix="--threads",
-            doc="(-@) Number of additional threads to use [0] "
+            doc="(-@) Number of additional threads to use [0]"
         )
     ],
     # outputs
@@ -177,41 +159,55 @@ SamtoolsFlagstat = j.CommandToolBuilder(
 
 ### Outputs
 
+> Further reading: [`ToolOutput`](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-output)
+
 We'll use the [ToolOutput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-output) class to collect and represent these outputs. A `ToolOutput` has a type, and if not using `stdout` we can provide a `glob` parameter.
 
-The only output of `samtools flagstat` is the statistics that are written to `stdout`. We give this the name `"stats"`, and collect this with the `j.Stdout` data type:
+```python
+janis.ToolOutput(
+    tag: str,
+    output_type: DataType,
+    glob: Union[janis_core.types.selectors.Selector, str, None] = None,
+    # more configuration options
+    presents_as: str = None,
+    secondaries_present_as: Dict[str, str] = None,
+    doc: Optional[str] = None
+)
+```
+
+The only output of `samtools flagstat` is the statistics that are written to `stdout`. We give this the name `"stats"`, and collect this with the `Stdout` data type. We can additionally tell Janis that the Stdout has type [`TextFile`](https://janis.readthedocs.io/en/latest/datatypes/textfile.html).
 
 ```python
-SamtoolsFlagstat = j.CommandToolBuilder(
-    # tool information + inputs
+SamtoolsFlagstat = CommandToolBuilder(
+    # ... tool information + inputs
     outputs=[
-        j.ToolOutput("stats", j.Stdout)
+        ToolOutput("stats", Stdout(TextFile))
     ]
 )
 ```
 
-
 ### Tool definition
 
 Putting this all together, you should have the following tool definition:
 
 ```python
-from typing import List, Optional, Union
-import janis as j
-from janis.data_types import Bam
+from janis_core import CommandToolBuilder, ToolInput, ToolOutput, Int, Stdout
+
+from janis_unix.data_types import TextFile
+from janis_bioinformatics.data_types import Bam
 
-SamToolsFlagstat_1_9 = j.CommandToolBuilder(
+SamtoolsFlagstat = CommandToolBuilder(
     tool="samtoolsflagstat",
     base_command=["samtools", "flagstat"],
     container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
     version="v1.9.0",
     inputs=[
         # 1. Positional bam input
-        j.ToolInput("bam", Bam, position=1),
+        ToolInput("bam", Bam, position=1),
         # 2. `threads` inputs
-        j.ToolInput("threads", j.Int(optional=True), prefix="--threads"),
+        ToolInput("threads", Int(optional=True), prefix="--threads"),
     ],
-    outputs=[j.ToolOutput("stats", j.Stdout)],
+    outputs=[ToolOutput("stats", Stdout(TextFile))],
 )
 ```
 
@@ -222,11 +218,12 @@ We can test the translation of this from the CLI:
 > If you have multiple command tools or workflows declared in the same file, you will need to provide the `--name` parameter with the name of your tool.
 
 ```bash
-janis translate samtoolsflagstat.py wdl # or cwl
+janis translate tools/samtoolsflagstat.py wdl # or cwl
 ```
 
 In the following translation, we can see the WDL representation of our tool. In particular, the `command` block gives us an indication of how the command line might look:
-```
+
+```wdl
 task samtoolsflagstat {
   input {
     Int? runtime_cpu
@@ -234,19 +231,19 @@ task samtoolsflagstat {
     File bam
     Int? threads
   }
-  command {
+  command <<<
     samtools flagstat \
-      ${"--threads " + threads} \
-      ${bam}
-  }
+      ~{"--threads " + threads} \
+      ~{bam}
+  >>>
   runtime {
     docker: "quay.io/biocontainers/samtools:1.9--h8571acd_11"
     cpu: if defined(runtime_cpu) then runtime_cpu else 1
-    memory: if defined(runtime_memory) then "${runtime_memory}G" else "4G"
+    memory: if defined(runtime_memory) then "~{runtime_memory}G" else "4G"
     preemptible: 2
   }
   output {
-    File out = stdout()
+    File stats = stdout()
   }
 }
 ```
@@ -255,10 +252,12 @@ task samtoolsflagstat {
 
 ### Running the workflow
 
-We can call the `janis run` functionality (default CWLTool), and provide the data file to the input called `bam` with the following line:
+> A reminder that the sample data for this section requires you to have completed Tutorial 1.
+
+We can call the `janis run` functionality, and use the output from tutorial1:
 
 ```bash
-janis run samtoolsflagstat.py --bam data/brca1.bam
+janis run -o tutorial2 tools/samtoolsflagstat.py --bam tutorial1/out.bam
 ```
 
 OUTPUT:
@@ -268,8 +267,8 @@ EngId: f9e89f
 Name: samtoolsflagstatWf
 Engine: cwltool
 
-Task Dir: $HOME/janis/execution/samtoolsflagstatWf/20191114_155159_f9e89f/
-Exec Dir: None
+Task Dir: $HOME/janis-tutorials/tutorial2/
+Exec Dir: $HOME/janis-tutorials/tutorial2/janis/execution/
 
 Status: Completed
 Duration: 4s
@@ -281,15 +280,13 @@ Jobs:
     [✓] samtoolsflagstat (N/A)
 
 Outputs:
-    - out: $HOME/janis/execution/samtoolsflagstatWf/20191114_155159_f9e89f/output/out
-2019-11-14T15:52:05 [INFO]: Exiting
-
+    - stats: $HOME/janis-tutorials/tutorial2/stats.txt
 ```
 
 Janis (and CWLTool) said the tool executed correctly, let's check the output file:
 
 ```bash
-cat $HOME/janis/execution/samtoolsflagstatWf/20191114_155159_f9e89f/output/out
+cat tutorial2/stats.txt
 ```
 
 ```
@@ -306,14 +303,4 @@ cat $HOME/janis/execution/samtoolsflagstatWf/20191114_155159_f9e89f/output/out
 90 + 0 singletons (0.46% : N/A)
 860 + 0 with mate mapped to a different chr
 691 + 0 with mate mapped to a different chr (mapQ>=5)
-```
-
-## Summary
-
-- Learn about the structure of a CommandTool,
-- Use an existing docker container,
-- Wrapped the inputs, outputs and tool information in a Janis CommandTool wrapper,
-
-### Next steps
-
-- [Containerising a tool](https://janis.readthedocs.io/en/latest/tutorials/container.html)
+```
\ No newline at end of file
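
Reviewer note: the new `Nb` line in the Inputs section says a `ToolInput` is bound onto the command line by `position` or `prefix`, and that a prefix with no position gets `position=0`. The sketch below illustrates that ordering rule without importing Janis. `SketchInput` and `build_command` are hypothetical names made up for this note and are not part of the Janis API; the real binding logic may differ in details such as quoting and separators. The thread count `4` is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchInput:
    """Hypothetical stand-in for a tool input; NOT the Janis ToolInput class."""
    tag: str
    position: Optional[int] = None
    prefix: Optional[str] = None


def build_command(base_command: List[str], inputs: List[SketchInput],
                  values: Dict[str, object]) -> List[str]:
    """Assumed rule: bind inputs in position order, prefix before value."""
    bound = []
    for inp in inputs:
        if inp.position is None and inp.prefix is None:
            continue  # neither position nor prefix: not bound to the command line
        if values.get(inp.tag) is None:
            continue  # optional input with no value supplied
        # A prefix without a position is treated as position 0.
        position = inp.position if inp.position is not None else 0
        bound.append((position, inp))
    bound.sort(key=lambda pair: pair[0])  # stable sort keeps declaration order on ties

    args = list(base_command)
    for _, inp in bound:
        if inp.prefix is not None:
            args.append(inp.prefix)
        args.append(str(values[inp.tag]))
    return args


sketch_inputs = [
    SketchInput("bam", position=1),
    SketchInput("threads", prefix="--threads"),
]
command = build_command(
    ["samtools", "flagstat"],
    sketch_inputs,
    {"bam": "tutorial1/out.bam", "threads": 4},
)
print(" ".join(command))  # samtools flagstat --threads 4 tutorial1/out.bam
```

Under these assumptions `--threads` lands before the bam, which matches the argument order in the translated WDL `command` block in the patch.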