Commit

Merge pull request #51 from DKFZ-ODCF/lsf-env-none-fix
Lsf env none fix
vinjana authored May 10, 2023
2 parents 5e18522 + 482a573 commit eb97a4f
Showing 11 changed files with 96 additions and 68 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -6,7 +6,7 @@ version: 2.1
# See: https://circleci.com/docs/2.0/configuration-reference/#jobs
jobs:
integration-test:
resource_class: small
resource_class: medium
# Specify the execution environment. You can specify an image from Dockerhub or use one of our Convenience Images from CircleCI's Developer Hub.
# See: https://circleci.com/docs/2.0/configuration-reference/#docker-machine-macos-windows-executor
docker:
1 change: 0 additions & 1 deletion .gitignore
@@ -13,5 +13,4 @@ cache/singularity/*
*.html.*
*~
*.sif
.circleci
.git
15 changes: 4 additions & 11 deletions Dockerfile
@@ -4,9 +4,9 @@ LABEL maintainer="Philip R. Kensche <[email protected]>"

# Capitalized versions for many tools. Minuscle version at least for apt.
ARG HTTP_PROXY=""
ARG http_proxy=""
ARG http_proxy="$HTTP_PROXY"
ARG HTTPS_PROXY=""
ARG https_proxy=""
ARG https_proxy="$HTTPS_PROXY"
ARG NO_PROXY=""
ARG no_proxy="$NO_PROXY"

@@ -32,7 +32,7 @@ RUN apt update && \
# For login Bash /etc/profile and ~/.profile is sourced. /etc/profile sources /etc/bash.bashrc.
# For non-login, interactive Bash /etc/bash.bashrc is sourced directly.
# For non-login, non-interactive Bash. We set BASH_ENV/ENV to /etc/bash.bashrc
# NOTE: ~/.bashrc could not be used, because when using because ~/ is /root/.
# NOTE: ~/.bashrc could not be used, because when using it, ~/ is /root/.
# Therefore /etc/bash.bashrc is used to use conda for all user IDs.
# NOTE: Conda should be fully available in non-login, interactive shell. Conda itself creates
# /etc/profile.d/conda.sh. The code that `conda init bash` writes to ~/.bashrc is moved
@@ -45,14 +45,7 @@ RUN grep "managed by 'conda init'" -A 100 ~/.bashrc >> /etc/container.bashrc &&
echo -e '\
set +u\n\
source activate nf-bam2fastq\n\
set -u\n\
export SAMTOOLS_BINARY=samtools\n\
export PICARD_BINARY=picard\n\
export JAVA_BINARY=java\n\
export MBUFFER_BINARY=mbuffer\n\
export CHECKSUM_BINARY=md5sum\n\
export PERL_BINARY=perl\n\
export BIOBAMBAM_BAM2FASTQ_BINARY=bamtofastq\n' >> /etc/container.bashrc && \
set -u\n\' >> /etc/container.bashrc && \
echo "source /etc/profile" > ~/.profile && \
cp ~/.profile /.profile && \
echo "source /etc/container.bashrc" >> /etc/bash.bashrc
27 changes: 16 additions & 11 deletions README.md
@@ -6,6 +6,8 @@ Convert BAM files back to FASTQ.

## Quickstart with Conda

We do not recommend Conda for running the workflow. It may happen that packages are not available in any channels anymore and that the environment is broken. For reproducible research, please use containers.

Provided you have a working [Conda](https://docs.conda.io/en/latest/) installation, you can run the workflow with

```bash
@@ -124,14 +126,14 @@ Please refer to the [Nextflow documentation](https://www.nextflow.io/docs/latest

### Location of Environments

By default the Conda environments of the jobs as well as the Singularity containers are stored in subdirectories of the `cache/` subdirectory of the workflows installation directory (a.k.a `projectDir` by Nextflow). E.g. to use the Singularity container you can install the container as follows
By default, the Conda environments of the jobs as well as the Singularity containers are stored in subdirectories of the `cache/` subdirectory of the workflows installation directory (a.k.a `projectDir` by Nextflow). E.g. to use the Singularity container you can install the container as follows

```bash
cd $workflowRepoDir
# Refer to the nextflow.config for the name of the Singularity image.
singularity build \
cache/singularity/nf-bam2fastq_1.0.0.sif \
docker-daemon://ghcr.io/dkfz-odcf/nf-bam2fastq:latest
docker://ghcr.io/dkfz-odcf/nf-bam2fastq:1.0.0

# Test your container
test/test1.sh test-results/ singularity nextflowEnv/
@@ -151,6 +153,10 @@ test/test1.sh test-results/ $profile

This will create a test Conda environment in `test-results/nextflowEnv` and then run the tests. For the tests themselves you can use a local Conda environment or a Docker container, dependent on whether you set `$profile` to "conda" or "docker", respectively. These integration tests are also run in Travis CI.

### Continuous Delivery

For all commits with a tag that follows the pattern `\d+\.\d+\.\d+` the job containers are automatically pushed to [Github Container Registry](https://github.com/orgs/DKFZ-ODCF/packages) of the "ODCF" organization. Version tags should only be added to commits on the `master` branch, although currently no automatic rule enforces this.
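
As a rough illustration of the tag pattern, a tag could be checked locally before it is pushed. This is only a sketch: the function name `is_version_tag` is hypothetical, and it assumes the pattern is meant to match the whole tag, which the actual CI filter may or may not do.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the repository): succeeds if the tag
# consists solely of three dot-separated numbers, e.g. 1.2.0.
# Assumption: the CI pattern is intended to match the complete tag.
is_version_tag() {
    local tag="${1:-}"
    # Bash's =~ operator uses POSIX ERE, which has no \d, so [0-9] is used.
    local re='^[0-9]+\.[0-9]+\.[0-9]+$'
    [[ "$tag" =~ $re ]]
}

for tag in 1.2.0 v1.2.0 1.2 1.2.0-rc1; do
    if is_version_tag "$tag"; then
        echo "$tag: looks like a release tag"
    else
        echo "$tag: does not match the version pattern"
    fi
done
```

Under the whole-tag assumption, only `1.2.0` from the list above is accepted.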

### Manual container release

The container includes a Conda installation and is pretty big. It should only be released if its content is actually changed. For instance, it would be perfectly fine to have a workflow version 1.6.5 but still refer to an old container for 1.2.7.
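
The decoupling of workflow and container version can be sketched as follows. The function and variable names are hypothetical; only the image naming scheme is taken from the examples in this README, and the version numbers are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; only the naming scheme follows this README.

# Docker image reference for a given container version.
docker_image() {
    local containerVersion="${1:?No container version given}"
    echo "ghcr.io/dkfz-odcf/nf-bam2fastq:${containerVersion}"
}

# File name of the Singularity image for a given container version.
singularity_image() {
    local containerVersion="${1:?No container version given}"
    echo "nf-bam2fastq_${containerVersion}.sif"
}

# The workflow version may move ahead while the container stays at an
# older release, as long as the container content did not change.
workflowVersion=1.2.0
containerVersion=1.0.0
echo "workflow $workflowVersion uses $(docker_image "$containerVersion")"
echo "singularity image file: $(singularity_image "$containerVersion")"
```

Keeping the container version in a single place means that both image names change together when a new container is actually released.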
@@ -159,7 +165,7 @@ This is an outline of the procedure to release the container to [Github Containe

1. Set the version that you want to release as variable. For the later commands you can set the Bash variable
```bash
versionTag=1.0.0
versionTag=1.2.0
```
2. Build the container.
```bash
@@ -170,32 +176,31 @@ This is an outline of the procedure to release the container to [Github Containe
--build-arg HTTPS_PROXY=$HTTPS_PROXY \
./
```
3. Edit the version-tag for the docker container in the "docker"-profile in the nextflow.config to match `$versionTag`.
3. Edit the version-tag for the docker container in the "docker"-profile in the `nextflow.config` to match `$versionTag`.
4. Run the integration test with the new container
```bash
test/test1.sh docker-test docker-test/test-environment docker
test/test1.sh docker-test docker
```
5. If the test succeeds, push the container to Github container registry. Set the CR_PAT variable to your personal access token (PAT):
```bash
echo $CR_PAT | docker login ghcr.io -u vinjana --password-stdin
docker image push ghcr.io/dkfz-odcf/nf-bam2fastq:$versionTag
```

### Continuous Delivery

For all commits with a tag that follows the pattern `\d+\.\d+\.\d+` the job containers are automatically pushed to [Github Container Registry](https://github.com/orgs/DKFZ-ODCF/packages) of the "ODCF" organization. Version tags should only be added to commits on the `master` branch, although currently no automatic rule enforces this.

## Release Notes

* upcoming
* 1.2.0
* Minor: Updated to miniconda3:4.10.3 base container, because the previous version (4.9.2) didn't build anymore.
* Minor: Use `-env none` for "lsf" cluster profile. Local environment should not be copied. This probably caused problems with the old "dkfzModules" environment profile.
* Patch: Require Nextflow >= 22.07.1, which fixes an LSF memory request bug. Added options for per-job memory requests to "lsf" profile in `nextflow.config`.
* Patch: Remove unnecessary `*_BINARY` variables in scripts. Binaries are fixed by Conda/containers.
* Patch: Needed to explicitly set `conda.enabled = True` with newer Nextflow

* 1.1.0 (February, 2022)
* Minor: Added `--publishMode` option to allow user to select the [Nextflow publish mode](https://www.nextflow.io/docs/latest/process.html#publishdir). Default: `rellink`. Note that the former default was `symlink`, but as this change is considered negligible we classified the change as "minor".
* Minor: Removed `dkfzModules` profile. Didn't work well and was originally only for development. Please use 'conda', 'singularity' or 'docker'. The container-based environments provide the best reproducibility.
* Patch: Switched from Travis to CircleCI for continuous integration.


* 1.0.1 (October 14., 2021)
* Patch: Fix memory calculation as exponential backoff
* Patch: Job names now contain workflow name and job/task hash. Run name seems currently not possible to include there (due to a possible bug in Nextflow).
5 changes: 3 additions & 2 deletions bin/bam2Fastq.sh
@@ -21,6 +21,7 @@ printInfo
set -o pipefail
set -uvex


getFastqSuffix() {
if [[ "$compressFastqs" == true ]]; then
echo "fastq.gz"
@@ -66,7 +67,7 @@ processPairedEndWithReadGroups() {
##
mkdir -p "$outputDir"
local tempFile="$outputDir/$(basename "$bamFile").bamtofastq_tmp"
$BIOBAMBAM_BAM2FASTQ_BINARY \
bamtofastq \
filename="$bamFile" \
T="$tempFile" \
outputperreadgroup=1 \
@@ -94,7 +95,7 @@ ensureAllFiles() {


main() {
"$SAMTOOLS_BINARY" quickcheck "$bamFile"
samtools quickcheck "$bamFile"

outputDir=${outputDir:-$(basename "$bamFile")"_fastqs"}

6 changes: 3 additions & 3 deletions bin/workflowLib.sh
@@ -39,7 +39,7 @@ mbuf() {
local bufferSize="$1"
shift
assertNonEmpty "$bufferSize" "No buffer size defined for mbuf()" || return $?
"$MBUFFER_BINARY" -m "$bufferSize" -q -l /dev/null $@
mbuffer -m "$bufferSize" -q -l /dev/null $@
}

lockFileName() {
@@ -84,7 +84,7 @@ assertNoDefaultReadGroup() {

getReadGroups() {
local bamFile="${1:?No BAM file given}"
declare -a groups=( $($SAMTOOLS_BINARY view -H "$bamFile" \
declare -a groups=( $(samtools view -H "$bamFile" \
| grep -P '^@RG\s' \
| perl -ne 's/^\@RG\s+ID:(\S+).*?$/$1/; print' \
2>/dev/null) )
@@ -261,7 +261,7 @@ fastqLinearize() {
}

fastqDelinearize() {
"$PERL_BINARY" -aF\\t -lne '$F[0] =~ s/^(\S+?)(?:\/\d)?(?:\s+.*)?$/$1/o; print join("\n", @F)'
perl -aF\\t -lne '$F[0] =~ s/^(\S+?)(?:\/\d)?(?:\s+.*)?$/$1/o; print join("\n", @F)'
}

sortLinearizedFastqStream() {
Empty file removed cache/conda/.keep
Empty file.
47 changes: 17 additions & 30 deletions nextflow.config
@@ -1,5 +1,5 @@
/**
* Copyright (c) 2021 DKFZ.
* Copyright (c) 2022 DKFZ.
*
* Distributed under the MIT License (license terms are at https://github.com/DKFZ-ODCF/nf-bam2fastq/blob/master/LICENSE.txt).
*
@@ -8,16 +8,17 @@
* Configuration for the DKFZ-ODCF/nf-bam2fastq Nextflow workflow.
*/

nextflowVersion = '>= 20.04.1.5335'

manifest {
homePage = 'https://github.com/DKFZ-ODCF/nf-bam2fastq'
description = 'BAM-to-FASTQ conversion and FASTQ-sorting workflow'
mainScript = 'main.nf'
version = '1.1.0'
version = '1.2.0'
author = 'Philip Reiner Kensche'
nextflowVersion = '>= 22.07.1'
}

// The workflow may refer to an older container version, e.g. if the container was not updated.
ext.containerVersion = '1.0.0'

profiles {

@@ -35,52 +36,32 @@ profiles {
}

conda {

conda.enabled = true
conda.cacheDir = "${projectDir}/cache/conda"
process {
conda = "${projectDir}/task-environment.yml"

beforeScript = """
export SAMTOOLS_BINARY=samtools
export PICARD_BINARY=picard
export JAVA_BINARY=java
export MBUFFER_BINARY=mbuffer
export CHECKSUM_BINARY=md5sum
export PERL_BINARY=perl
export BIOBAMBAM_BAM2FASTQ_BINARY=bamtofastq
"""
}
conda.cacheDir = "${projectDir}/cache/conda"
}

mamba {
conda.enabled = true
useMamba = true

conda.cacheDir = "${projectDir}/cache/conda"
process {
conda = "${projectDir}/task-environment.yml"

beforeScript = """
export SAMTOOLS_BINARY=samtools
export PICARD_BINARY=picard
export JAVA_BINARY=java
export MBUFFER_BINARY=mbuffer
export CHECKSUM_BINARY=md5sum
export PERL_BINARY=perl
export BIOBAMBAM_BAM2FASTQ_BINARY=bamtofastq
"""
}
conda.cacheDir = "${projectDir}/cache/conda"
}

docker {
docker.enabled = true
docker.runOptions='-u $(id -u):$(id -g)'
process {
container = 'ghcr.io/dkfz-odcf/nf-bam2fastq:1.0.0'
container = "ghcr.io/dkfz-odcf/nf-bam2fastq:${ext.containerVersion}"
}
}

singularity {
process.container = 'nf-bam2fastq_1.0.0.sif'
process.container = "nf-bam2fastq_${ext.containerVersion}.sif"
singularity.enabled = true
singularity.cacheDir = "${projectDir}/cache/singularity"
// The singularity containers are stored in the workflow-directory
@@ -90,6 +71,12 @@ profiles {
lsf {
process {
executor = 'lsf'
clusterOptions = '-env none'
}
executor {
// scratch = '$SCRATCHDIR/$LSB_JOBID'
perTaskReserve = false
perJobMemLimit = true
}
}

2 changes: 2 additions & 0 deletions nf-bam2fastq.iml
@@ -12,6 +12,8 @@
<excludeFolder url="file://$MODULE_DIR$/cache/conda/nf-bam2fastq-3e98300235b5aed9f3835e00669fb59f" />
<excludeFolder url="file://$MODULE_DIR$/test/output-all" />
<excludeFolder url="file://$MODULE_DIR$/.nextflow" />
<excludeFolder url="file://$MODULE_DIR$/cache" />
<excludeFolder url="file://$MODULE_DIR$/nextflowEnv" />
</content>
<orderEntry type="inheritedJdk" />
<orderEntry type="sourceFolder" forTests="false" />
54 changes: 46 additions & 8 deletions task-environment.yml
@@ -1,13 +1,51 @@
name: nf-bam2fastq
name: nf-bam2fastq
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- bash=5.0.018
- biobambam=2.0.179
- mbuffer=20160228
- perl=5.22.2.1
- pigz=2.3.4
- python=3.6
- samtools=1.11
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=1_gnu
- bash=5.0.018=h0a1914f_0
- biobambam=2.0.179=h7d875b9_1
- boost-cpp=1.74.0=h312852a_4
- bzip2=1.0.8=h7f98852_4
- c-ares=1.17.1=h7f98852_1
- ca-certificates=2021.5.30=ha878542_0
- certifi=2021.5.30=py36h5fab9bb_0
- curl=7.77.0=hea6ffbf_0
- htslib=1.11=hd3b49d5_2
- icu=68.1=h58526e2_0
- krb5=1.19.1=hcc1bbae_0
- ld_impl_linux-64=2.35.1=hea4e1c9_2
- libcurl=7.77.0=h2574ce0_0
- libdeflate=1.7=h7f98852_5
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=h516909a_1
- libffi=3.3=h58526e2_2
- libgcc-ng=9.3.0=h2828fa1_19
- libgomp=9.3.0=h2828fa1_19
- libmaus2=2.0.777=h6eb57d2_0
- libnghttp2=1.43.0=h812cca2_0
- libssh2=1.9.0=ha56f1ee_6
- libstdcxx-ng=9.3.0=h6de172a_19
- lz4-c=1.9.3=h9c3ff4c_0
- mbuffer=20160228=h779adbc_3
- ncurses=6.2=h58526e2_4
- openssl=1.1.1k=h7f98852_0
- perl=5.22.2.1=0
- pigz=2.3.4=hed695b0_1
- pip=21.1.2=pyhd8ed1ab_0
- python=3.6.13=hffdb5ce_0_cpython
- python_abi=3.6=1_cp36m
- readline=8.1=h46c0cb4_0
- samtools=1.11=h6270b1f_0
- setuptools=49.6.0=py36h5fab9bb_3
- snappy=1.1.8=he1b5a44_3
- sqlite=3.35.5=h74cdb3f_0
- staden_io_lib=1.14.14=h7c09d56_1
- tk=8.6.10=h21135ba_1
- wheel=0.36.2=pyhd3deb0d_0
- xz=5.2.5=h516909a_1
- zlib=1.2.11=h516909a_1010
- zstd=1.5.0=ha95c52a_0
5 changes: 4 additions & 1 deletion test-environment.yml
@@ -4,6 +4,9 @@ channels:
- bioconda
- defaults
dependencies:
- mamba
- bash=5.0.018
- samtools=1.11
- nextflow=20.10.0
- nextflow=22.10.1
- gradle=7.4.2
- openjdk=11.0.15
