Commit

Update tutorial 2 using data from tut1 + improved instructions from W2
illusional committed Mar 16, 2020
1 parent 0ff8c86 commit c739018
Showing 1 changed file with 94 additions and 107 deletions.
201 changes: 94 additions & 107 deletions docs/tutorials/tutorial2.md
@@ -1,62 +1,27 @@
# Tutorial 2 - Wrapping a new tool

> This tutorial builds on the content and output from [Tutorial 1](https://janis.readthedocs.io/en/latest/tutorials/tutorial1.html).
## Introduction

A CommandTool is the interface between Janis and a program to be executed. Simply put, a CommandTool has a name, a command, inputs, outputs and a container to run in. Inputs and arguments can have a prefix and / or position, and this is used to construct the command line.

The Janis documentation for the [CommandTool](https://janis.readthedocs.io/en/latest/references/commandtool.html) gives an introduction to the tool structure and a template for constructing your own tool. A tool wrapper must provide all of the information needed to configure and run the specified tool. This includes the `base_command`, [janis.ToolInput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input), [janis.ToolOutput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-output), a `container` and its version.

## Requirements

You must have Python 3.6 and Janis installed:

```bash
pip3 install janis-pipelines
```

You can check you have the correct version of Janis installed by running:

```bash
$ janis -v
-------------------- ------
janis-core v0.7.1
janis-assistant v0.7.8
janis-unix v0.7.0
janis-bioinformatics v0.7.1
-------------------- ------
```

### Setup

This tutorial is checked in on GitHub with sample data. You can download the sample data and template files with the following:

```bash
git clone https://github.com/PMCC-BioinformaticsCore/janis-workshops.git
cd janis-workshops/workshop3
ls -lGh * # ls with extra options
```

You'll see a list of files within this repository:

- `README.md` - *This file*
- `samtoolsflagstat.py` - The template for this tutorial
- `samtoolsflagstat-final.py` - The final command tool (also at the bottom of this file)
- `data/brca1.bam` - A Bam file that this tool can be run with
- `data/README.md` - Information about the data file


### Container

> _Further information_: [Containerising your tools](https://janis.readthedocs.io/en/latest/tutorials/container.html)
> Guide on using containers
For portability, Janis requires that you specify an OCI-compliant `container` (eg: Docker) for your tool. A suitable container can often be found with some searching; if not, here's a guide on [preparing your tools in containers](https://janis.readthedocs.io/en/latest/tutorials/container.html) so that your tool works across all environments.

For portability, we require that you specify an OCI-compliant `container` (eg: Docker) for your tool. A suitable container can often be found with some searching; if not, here's a guide on [preparing your tools in containers](https://janis.readthedocs.io/en/latest/tutorials/container.html) so that your tool works across all environments.
## Preparation

The sample data used to test this tool is produced in [Tutorial 1](https://janis.readthedocs.io/en/latest/tutorials/tutorial1.html). You can follow this tutorial without it, but running the example requires the bam output from the first tutorial.

## Samtools flagstat

In this workshop we're going to wrap the `samtools flagstat` tool.
In this tutorial we're going to wrap the `samtools flagstat` tool - flagstat counts the number of alignments for each FLAG type within a bam file.
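
To make "counting alignments per FLAG type" concrete, here is a minimal Python sketch. It is not how samtools is implemented: it relies only on the bit values defined for the SAM FLAG field, the `count_flags` helper and the example flag values are made up for illustration, and the real `flagstat` report contains more categories than the few shown here.

```python
from collections import Counter
from typing import Iterable

# A subset of the bit values defined for the SAM FLAG field.
FLAG_BITS = {
    0x1: "paired",
    0x4: "unmapped",
    0x100: "secondary",
    0x400: "duplicate",
    0x800: "supplementary",
}


def count_flags(flags: Iterable[int]) -> Counter:
    """Count how many alignments have each FLAG bit set (hypothetical helper)."""
    counts = Counter()
    for flag in flags:
        for bit, name in FLAG_BITS.items():
            if flag & bit:
                counts[name] += 1
    return counts


# Illustrative FLAG values, one per alignment record (not taken from a real bam).
example_flags = [99, 147, 4, 1024 + 99, 256]
counts = count_flags(example_flags)
print(dict(counts))
```

An alignment can contribute to several categories at once, because each FLAG value is a combination of bits.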


### Samtools project links

@@ -84,32 +49,32 @@ Hence, we can isolate the following information:

### Command tool template

The following template is the minimum amount of information required to wrap a tool. For more information, see the [CommandToolBuilder documentation](https://janis.readthedocs.io/en/latest/references/commandtool.html).
The following template is the minimum amount of information required to wrap a tool. For more information, see the [CommandToolBuilder documentation](https://janis.readthedocs.io/en/latest/references/commandtool.html#janis.CommandToolBuilder).

> We've removed the optional fields: tool_module, tool_provider, metadata, cpu, memory from the following template.
> We've removed the optional fields: `tool_module`, `tool_provider`, `metadata`, `cpu`, `memory` from the following [template](https://janis.readthedocs.io/en/latest/references/commandtool.html#template).
```python
from typing import List, Optional, Union
import janis as j
We're going to use `Bam` and `TextFile` data types, so let's import them as well.

import janis_core as j
```python
from janis_core import CommandToolBuilder, ToolInput, ToolOutput, Int, Stdout

ToolName = j.CommandToolBuilder(
ToolName = CommandToolBuilder(
tool="toolname",
base_command=["base", "command"],
inputs: List[j.ToolInput]=[],
outputs: List[j.ToolOutput]=[],
inputs=[], # List[ToolInput]
outputs=[], # List[ToolOutput]
container="container/name:version",
version="version"
)
```

### Tool information

Let's start by creating a file with this template:
Let's start by creating a file with this template inside a second output directory:

```bash
vim samtoolsflagstat.py
mkdir -p tools
vim tools/samtoolsflagstat.py
```

We can start by filling in the basic information:
@@ -122,53 +87,70 @@ We can start by filling in the basic information:
- `version` to be `"v1.9.0"`

You'll have a tool definition like the following:

```python
SamtoolsFlagstat = j.CommandToolBuilder(
SamtoolsFlagstat = CommandToolBuilder(
tool="samtoolsflagstat",
base_command=["samtools", "flagstat"],
container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
version="v1.9.0",

inputs: List[j.ToolInput]=[],
outputs: List[j.ToolOutput]=[],
inputs=[], # List[ToolInput]
outputs=[], # List[ToolOutput]
)
```

### Inputs

> Further reading: [`ToolInput`](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input)
We'll use the [ToolInput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input) class to represent these inputs. A `ToolInput` provides a mechanism for binding this input onto the command line (eg: prefix, position, transformations). See the documentation for more ways to configure a ToolInput.

Our positional input is a Bam, so we'll import the Bam type from `janis` with the following line:
We'll use the [ToolInput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input) class to represent these inputs. A `ToolInput` provides a mechanism for binding this input onto the command line (eg: prefix, position, transformations). See the documentation for more ways to [configure a ToolInput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-input).

```python
from janis.data_types import Bam
janis.ToolInput(
tag: str,
input_type: DataType,
position: Optional[int] = None,
prefix: Optional[str] = None,
# more configuration options
separate_value_from_prefix: bool = None,
prefix_applies_to_all_elements: bool = None,
presents_as: str = None,
secondaries_present_as: Dict[str, str] = None,
separator: str = None,
shell_quote: bool = None,
localise_file: bool = None,
default: Any = None,
doc: Optional[str] = None
)
```

Then we can declare our two inputs:
> Nb: A ToolInput must have a `position` OR `prefix` in order to be bound onto the command line. If the prefix is specified with no position, a `position=0` is automatically applied.
1. Positional bam input
Now we can declare our two inputs:

1. Positional bam input
2. Threads configuration input with the prefix `--threads`
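
The binding rule noted above and the two inputs just listed can be sketched in plain Python. This is an illustrative approximation, not Janis's implementation: `InputSketch` and `build_command` are hypothetical names, and the sketch assumes that inputs are ordered by `position` (defaulting to 0 when only a prefix is given) and that unset optional inputs are left off the command line.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class InputSketch:
    """Hypothetical stand-in for a tool input; NOT the Janis ToolInput class."""
    tag: str
    value: Any
    position: Optional[int] = None
    prefix: Optional[str] = None


def build_command(base_command: List[str], inputs: List[InputSketch]) -> List[str]:
    """Assemble a command line from inputs (assumed ordering rules, see lead-in)."""
    bound = []
    for inp in inputs:
        if inp.value is None:
            continue  # optional input that was not supplied
        if inp.position is None and inp.prefix is None:
            continue  # neither position nor prefix: not bound to the command line
        # Assumption: a prefix without a position is treated as position 0.
        position = inp.position if inp.position is not None else 0
        bound.append((position, inp))

    bound.sort(key=lambda pair: pair[0])  # stable sort keeps declaration order on ties

    args = list(base_command)
    for _, inp in bound:
        if inp.prefix is not None:
            args.append(inp.prefix)
        args.append(str(inp.value))
    return args


cmd = build_command(
    ["samtools", "flagstat"],
    [
        InputSketch("bam", "tutorial1/out.bam", position=1),
        InputSketch("threads", 4, prefix="--threads"),
    ],
)
print(" ".join(cmd))
```

Under these assumptions the `--threads` argument is placed before the bam, and it disappears entirely if `threads` is left as `None`.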

We're going to give each input a name that we can use to reference it. This allows us to specify a value from the command line, or connect the result of a previous step [within a workflow](https://janis.readthedocs.io/en/latest/tutorials/tutorial1.html#bwa-mem).

```python
SamtoolsFlagstat = j.CommandToolBuilder(
# tool information
SamtoolsFlagstat = CommandToolBuilder(
# ... tool information
inputs=[
# 1. Positional bam input
j.ToolInput(
ToolInput(
"bam", # name of our input
Bam,
position=1,
doc="Input bam to generate statistics for"
),
# 2. `threads` inputs
j.ToolInput(
ToolInput(
"threads", # name of our input
j.Int(optional=True),
Int(optional=True),
prefix="--threads",
doc="(-@) Number of additional threads to use [0] "
doc="(-@) Number of additional threads to use [0]"
)
],
# outputs
@@ -177,41 +159,55 @@ SamtoolsFlagstat = j.CommandToolBuilder(

### Outputs

> Further reading: [`ToolOutput`](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-output)

We'll use the [ToolOutput](https://janis.readthedocs.io/en/latest/references/commandtool.html#tool-output) class to collect and represent these outputs. A `ToolOutput` has a type, and if not using `stdout` we can provide a `glob` parameter.

The only output of `samtools flagstat` is the statistics that are written to `stdout`. We give this the name `"stats"`, and collect this with the `j.Stdout` data type:
```python
janis.ToolOutput(
tag: str,
output_type: DataType,
glob: Union[janis_core.types.selectors.Selector, str, None] = None,
# more configuration options
presents_as: str = None,
secondaries_present_as: Dict[str, str] = None,
doc: Optional[str] = None
)
```

The only output of `samtools flagstat` is the statistics that are written to `stdout`. We give this the name `"stats"`, and collect this with the `Stdout` data type. We can additionally tell Janis that the Stdout has type [`TextFile`](https://janis.readthedocs.io/en/latest/datatypes/textfile.html).

```python
SamtoolsFlagstat = j.CommandToolBuilder(
# tool information + inputs
SamtoolsFlagstat = CommandToolBuilder(
# ... tool information + inputs
outputs=[
j.ToolOutput("stats", j.Stdout)
ToolOutput("stats", Stdout(TextFile))
]
)
```


### Tool definition

Putting this all together, you should have the following tool definition:

```python
from typing import List, Optional, Union
import janis as j
from janis.data_types import Bam
from janis_core import CommandToolBuilder, ToolInput, ToolOutput, Int, Stdout

from janis_unix.data_types import TextFile
from janis_bioinformatics.data_types import Bam

SamToolsFlagstat_1_9 = j.CommandToolBuilder(
SamtoolsFlagstat = CommandToolBuilder(
tool="samtoolsflagstat",
base_command=["samtools", "flagstat"],
container="quay.io/biocontainers/samtools:1.9--h8571acd_11",
version="v1.9.0",
inputs=[
# 1. Positional bam input
j.ToolInput("bam", Bam, position=1),
ToolInput("bam", Bam, position=1),
# 2. `threads` inputs
j.ToolInput("threads", j.Int(optional=True), prefix="--threads"),
ToolInput("threads", Int(optional=True), prefix="--threads"),
],
outputs=[j.ToolOutput("stats", j.Stdout)],
outputs=[ToolOutput("stats", Stdout(TextFile))],
)
```

@@ -222,31 +218,32 @@ We can test the translation of this from the CLI:
> If you have multiple command tools or workflows declared in the same file, you will need to provide the `--name` parameter with the name of your tool.

```bash
janis translate samtoolsflagstat.py wdl # or cwl
janis translate tools/samtoolsflagstat.py wdl # or cwl
```

In the following translation, we can see the WDL representation of our tool. In particular, the `command` block gives us an indication of how the command line might look:
```

```wdl
task samtoolsflagstat {
input {
Int? runtime_cpu
Int? runtime_memory
File bam
Int? threads
}
command {
command <<<
samtools flagstat \
${"--threads " + threads} \
${bam}
}
~{"--threads " + threads} \
~{bam}
>>>
runtime {
docker: "quay.io/biocontainers/samtools:1.9--h8571acd_11"
cpu: if defined(runtime_cpu) then runtime_cpu else 1
memory: if defined(runtime_memory) then "${runtime_memory}G" else "4G"
memory: if defined(runtime_memory) then "~{runtime_memory}G" else "4G"
preemptible: 2
}
output {
File out = stdout()
File stats = stdout()
}
}
```
@@ -255,10 +252,12 @@ task samtoolsflagstat {

### Running the workflow

We can call the `janis run` functionality (default CWLTool), and provide the data file to the input called `bam` with the following line:
> A reminder that the sample data for this section requires you to have completed Tutorial 1.

We can call the `janis run` functionality, and use the output from tutorial1:

```bash
janis run samtoolsflagstat.py --bam data/brca1.bam
janis run -o tutorial2 tools/samtoolsflagstat.py --bam tutorial1/out.bam
```

OUTPUT:
@@ -268,8 +267,8 @@ EngId: f9e89f
Name: samtoolsflagstatWf
Engine: cwltool

Task Dir: $HOME/janis/execution/samtoolsflagstatWf/20191114_155159_f9e89f/
Exec Dir: None
Task Dir: $HOME/janis-tutorials/tutorial2/
Exec Dir: $HOME/janis-tutorials/tutorial2/janis/execution/

Status: Completed
Duration: 4s
@@ -281,15 +280,13 @@ Jobs:
[✓] samtoolsflagstat (N/A)

Outputs:
- out: $HOME/janis/execution/samtoolsflagstatWf/20191114_155159_f9e89f/output/out
2019-11-14T15:52:05 [INFO]: Exiting

- stats: $HOME/janis-tutorials/tutorial2/stats.txt
```

Janis (and CWLTool) reported that the tool executed correctly. Let's check the output file:

```bash
cat $HOME/janis/execution/samtoolsflagstatWf/20191114_155159_f9e89f/output/out
cat tutorial2/stats.txt
```

```
@@ -306,14 +303,4 @@ cat $HOME/janis/execution/samtoolsflagstatWf/20191114_155159_f9e89f/output/out
90 + 0 singletons (0.46% : N/A)
860 + 0 with mate mapped to a different chr
691 + 0 with mate mapped to a different chr (mapQ>=5)
```
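
Each line of the flagstat report has the form `<QC-passed> + <QC-failed> <description>`, so the text output is easy to post-process. The following is a minimal sketch of a parser; `FlagstatLine` and `parse_flagstat` are hypothetical names, and the sample text is limited to the three lines shown above.

```python
import re
from typing import List, NamedTuple


class FlagstatLine(NamedTuple):
    qc_passed: int
    qc_failed: int
    description: str


# Matches lines such as "90 + 0 singletons (0.46% : N/A)".
LINE_PATTERN = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text: str) -> List[FlagstatLine]:
    """Parse flagstat text into records, skipping lines that do not match."""
    parsed = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue
        passed, failed, description = match.groups()
        parsed.append(FlagstatLine(int(passed), int(failed), description))
    return parsed


sample = """\
90 + 0 singletons (0.46% : N/A)
860 + 0 with mate mapped to a different chr
691 + 0 with mate mapped to a different chr (mapQ>=5)
"""

records = parse_flagstat(sample)
for record in records:
    print(record.qc_passed, record.qc_failed, record.description)
```

To parse a real report, the contents of the stats file written by the run could be passed to `parse_flagstat` in place of `sample`.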

## Summary

- Learned about the structure of a CommandTool,
- Used an existing docker container,
- Wrapped the inputs, outputs and tool information in a Janis CommandTool wrapper.

### Next steps

- [Containerising a tool](https://janis.readthedocs.io/en/latest/tutorials/container.html)
```
