Feedback after reading tutorials #51

Open: wants to merge 6 commits into base: master
21 changes: 14 additions & 7 deletions docs/tutorials/tutorial0.md
@@ -6,7 +6,7 @@ Janis was designed with a few points in mind:

- Workflows should be easy to build,
- Workflows and tools must be easily shared (portable),
- Workflows should be able to execute on HPCs and cloud environments.
- Workflows should be able to execute on HPCs and cloud environments,
- Workflows should be reproducible and re-runnable.

Janis uses an *abstracted execution environment*, which removes the shared file system in favour of you specifiying all the files you need up front and passing them around as a File object. This allows the same workflow to be executable on your local machine, HPCs and cloud, and we let the `execution engine` handle moving our files. This also means that we can use file systems like ``S3``, ``GCS``, ``FTP`` and more without any changes to our workflow.
@@ -63,7 +63,7 @@ We'll install Janis in a virtual environment as it preserves versioning of Janis
pip install cwltool
```

Test that CWLTool has installed correctly with:
Test that CWLTool has been installed correctly with:

```bash
cwltool --version
@@ -80,6 +80,13 @@ mkdir ~/janis
cd ~/janis
```

Let's also create a directory to store the files for the tutorials.

```bash
mkdir janis-tutorials
cd janis-tutorials
Review comment (Contributor Author):

Creating janis-tutorials in tutorial 0. Assuming users will follow the tutorials in order (added a note to each about it, if not already there).

```

You can test run an example workflow with Janis and CWLTool with the following command:

```bash
@@ -112,7 +119,7 @@ janis watch d909df
# Name: hello
# Engine: cwltool
#
# Task Dir: $HOME/janis/tutorial0
# Task Dir: $HOME/janis/janis-tutorials/tutorial0
# Exec Dir: None
#
# Status: Completed
@@ -125,13 +132,13 @@ janis watch d909df
# [✓] hello (1s)
#
# Outputs:
# - out: $HOME/janis/tutorial0/out
# - out: $HOME/janis/janis-tutorials/tutorial0/out
```

There is a single output `out` from the workflow, cat-ing this result we get:

```bash
cat $HOME/janis/tutorial0/out
cat $HOME/janis/janis-tutorials/tutorial0/out
# Hello, World
```

@@ -152,7 +159,7 @@ janis run --engine cwltool -o tutorial0-override hello --inp "Hello, $(whoami)"

### Running Janis in the background

You may want to run Janis in the background as it's own process. You could do this with `nohup [command] &`, however we can also run Janis with the `--background` flag and capture the workflow ID to watch, eg:
You may want to run Janis in the background as its own process. You could do this with `nohup [command] &`, however we can also run Janis with the `--background` flag and capture the workflow ID to watch, eg:

```bash
wid=$(janis run \
@@ -165,7 +172,7 @@ janis watch $wid

## Summary

- Setup a virtualenv
- Set up a virtualenv
- Installed Janis and CWLTool
- Ran a small workflow with custom inputs

27 changes: 13 additions & 14 deletions docs/tutorials/tutorial1.md
@@ -1,5 +1,7 @@
# Tutorial 1 - Building a Workflow

> This tutorial uses directories created in [Tutorial 0](https://janis.readthedocs.io/en/latest/tutorials/tutorial0.html).

In this stage, we're going to build a simple workflow to align short reads of DNA.

1. Start with a pair of compressed `FASTQ` files,
@@ -15,16 +17,16 @@ These tools already exist within the Janis Tool Registry, you can see their docu

## Preparation

To prepare for this tutorial, we're going to create a folder and download some data:
To prepare for this tutorial, we're going to need to download some data first:

```bash
mkdir janis-tutorials && cd janis-tutorials
cd ~/janis/janis-tutorials

# If WGET is installed
wget -q -O- "https://github.com/PMCC-BioinformaticsCore/janis-workshops/raw/master/janis-data.tar" | tar -xz
wget -q -O- "https://github.com/PMCC-BioinformaticsCore/janis-workshops/raw/master/janis-data.tar" | tar -x
Review comment (Contributor Author):

I think the `z` flag is needed only if the file is gzipped. After removing it, the command succeeded, although with warnings (will create an issue about the warnings).


# If CURL is installed
curl -Ls "https://github.com/PMCC-BioinformaticsCore/janis-workshops/raw/master/janis-data.tar" | tar -xz
curl -Ls "https://github.com/PMCC-BioinformaticsCore/janis-workshops/raw/master/janis-data.tar" | tar -x
```
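The review comment above turns on whether `janis-data.tar` is actually gzip-compressed. One way to check this is to look at the first two bytes of the archive, since gzip streams start with the magic bytes `1f 8b`. The following is a minimal sketch under that assumption; `is_gzipped` and `make_tar` are hypothetical helper names used only for illustration, and the archives are built in memory rather than downloaded.

```python
import io
import tarfile

# Every gzip stream begins with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(first_bytes: bytes) -> bool:
    """Return True if the data starts with the gzip magic bytes."""
    return first_bytes[:2] == GZIP_MAGIC


def make_tar(compress: bool) -> bytes:
    """Build a tiny in-memory tar archive, optionally gzip-compressed."""
    buf = io.BytesIO()
    mode = "w:gz" if compress else "w"
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        data = b"hello"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


print(is_gzipped(make_tar(compress=True)))   # True
print(is_gzipped(make_tar(compress=False)))  # False
```

Applied to the first bytes of the downloaded file, a `False` result would be consistent with the reviewer's observation that the plain `tar -x` form succeeds.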


@@ -60,7 +62,7 @@ from janis_bioinformatics.data_types import FastqGzPairedEnd, FastaWithDict

### Tools

We've discussed the tools we're going to use. The documentation for each tool has a row in the tbale caled "Python" that gives you the import statement. This is how we'll import how tools:
We've discussed the tools we're going to use. The documentation for each tool has a row in the table caled "Python" that gives you the import statement. This is how we'll import these tools:


```python
@@ -129,7 +131,7 @@ Workflow.step(
)
```

We provide a identifier for the step (unique amongst the other nodes in the workflow), and intialise our tool, passing our inputs of the step as parameters.
We provide an identifier for the step (unique amongst the other nodes in the workflow), and intialise our tool, passing our inputs of the step as parameters.

We can refer to an input (or previous result) using the dot notation. For example, to refer to the `fastq` input, we can use `w.fastq`.

@@ -212,7 +214,7 @@ w.output("out", source=w.sortsam.out)

## Workflow + Translation

Hopefully you have a workflow that looks like the following!
Hopefully now you have a workflow that looks like the following!

```python
from janis_core import WorkflowBuilder, String
@@ -272,33 +274,30 @@ janis translate tools/alignment.py wdl
We'll run the workflow against the current directory.

```bash
janis run -o . --engine cwltool \
janis run -o tutorial1 --engine cwltool \
Review comment (Contributor Author):

I think the intention is to keep tutorial 1 files within the tutorial1 directory. The alternative command in this tutorial creates tutorial1-run-with-cromwell, so after this change we will have both tutorial1-run-with-cromwell and tutorial1.

tools/alignment.py \
--fastq data/BRCA1_R*.fastq.gz \
--reference reference/hg38-brca1.fasta \
--sample_name NA12878 \
--read_group "@RG\tID:NA12878\tSM:NA12878\tLB:NA12878\tPL:ILLUMINA"
```

After the workflow has run, you'll see the outputs in the current directory:
After the workflow has run, you'll see the outputs in the tutorial1 directory:

```bash
ls
ls ~/janis/janis-tutorials/tutorial1

# drwxr-xr-x mfranklin 1677682026 160B data
# drwxr-xr-x mfranklin 1677682026 256B janis
# -rw-r--r-- mfranklin wheel 2.7M out.bam
# -rw-r--r-- mfranklin wheel 296B out.bam.bai
# drwxr-xr-x mfranklin 1677682026 320B reference
# drwxr-xr-x mfranklin 1677682026 128B tools
```

### OPTIONAL: Run with Cromwell

If you have `java` installed, Janis can run the workflow in the Crowmell execution engine by using the `--engine cromwell` parameter:

```bash
janis run -o run-with-cromwell --engine cromwell \
janis run -o tutorial1-run-with-cromwell --engine cromwell \
tools/alignment.py \
--fastq data/BRCA1_R*.fastq.gz \
--reference reference/hg38-brca1.fasta \
6 changes: 3 additions & 3 deletions docs/tutorials/tutorial2.md
@@ -73,7 +73,7 @@ ToolName = CommandToolBuilder(
Let's start by creating a file with this template inside a second output directory:

```bash
mkdir -p tools
cd ~/janis/janis-tutorials
vim tools/samtoolsflagstat.py
```

@@ -280,13 +280,13 @@ Jobs:
[✓] samtoolsflagstat (N/A)

Outputs:
- stats: $HOME/janis-tutorials/tutorial2/stats.txt
- stats: $HOME/janis-tutorials/tutorial2/stats
Review comment (Contributor Author):

There was no stats.txt for me, only stats? Not sure if it was supposed to be .txt, maybe a different version of the tool used, or maybe I missed a step somewhere?

```

Janis (and CWLTool) said the tool executed correctly, let's check the output file:

```bash
cat tutorial2/stats.txt
cat tutorial2/stats
```

```
8 changes: 4 additions & 4 deletions docs/tutorials/tutorial3.md
@@ -16,7 +16,7 @@ This tutorial uses the workflow build in [Tutorial 1](https://janis.readthedocs.

## Output name

Simply put, `output_name` is the dervied filename of the output without the extension. By default, this is the `tag` of the output.
Simply put, `output_name` is the derived filename of the output without the extension. By default, this is the `tag` of the output.

You can specify a new output name in 2 ways:

@@ -30,17 +30,17 @@ You should make the following considerations:
- The input you select should be a string, or

- If the output you're naming is an array, the input you select should either be:
- singular
- singular or
- have the same number of elements in it.

Janis will either fall back to the first element if it's a list, or default to the output tag. This may cause outputs to override each other.
Janis will either fall back to the first element if it's a list, or default to the output tag. This may cause outputs to override each other.


## Output folder

Similar to the output name, the `output_folder` is folder, or group of nested folders into which your output will be written. By default, this field has no value and outputs are linked directly into the output directory.

If the output_folder field is an array, a nested folder is created for each element in ascending order (eg: `["parent", "child", "child_of_child"]`).
If the `output_folder` field is an array, a nested folder is created for each element in ascending order (eg: `["parent", "child", "child_of_child"]`).
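The nesting rule described here can be illustrated with a small sketch. This is not Janis's implementation: `nested_output_path` is a hypothetical helper that only shows how a list such as `["parent", "child", "child_of_child"]` would map to a path under the output directory, assuming each element becomes one folder level in list order.

```python
from pathlib import PurePosixPath


def nested_output_path(output_dir, output_folder, filename):
    """Join the output directory, each folder element in order, and the filename.

    output_folder may be None (no nesting), a single string, or a list of strings.
    """
    if output_folder is None:
        folders = []
    elif isinstance(output_folder, str):
        folders = [output_folder]
    else:
        folders = list(output_folder)
    return PurePosixPath(output_dir).joinpath(*folders, filename)


print(nested_output_path("tutorial3", ["parent", "child", "child_of_child"], "out.bam"))
# tutorial3/parent/child/child_of_child/out.bam

print(nested_output_path("tutorial3", None, "out.bam"))
# tutorial3/out.bam
```

The second call corresponds to the default case in the text, where the field has no value and the output lands directly in the output directory.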

There are multiple ways to specify output directories:

1 change: 0 additions & 1 deletion setup.py
@@ -33,7 +33,6 @@
modules = ["janis_assistant." + p for p in sorted(find_packages("./janis_assistant"))]


fixed_unix_version = f"janis-pipelines.unix==" + JANIS_UNIX_VERSION
Review comment (Contributor Author):

Already defined up in the same code block with the same value ☝️

setup(
name="janis pipelines",
version=__version__,