diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 4c2ba75..f9a00e4 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -1,4 +1,4 @@ -# nf-core/fastqrepair: Contributing Guidelines +# `nf-core/fastqrepair`: Contributing Guidelines Hi there! Many thanks for taking an interest in improving nf-core/fastqrepair. @@ -19,7 +19,7 @@ If you'd like to write some code for nf-core/fastqrepair, the standard workflow 1. Check that there isn't already an issue about your idea in the [nf-core/fastqrepair issues](https://github.com/nf-core/fastqrepair/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/fastqrepair repository](https://github.com/nf-core/fastqrepair) to your GitHub account 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) -4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +4. Use `nf-core pipelines schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -40,7 +40,7 @@ There are typically two types of tests that run: ### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. -To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command. +To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core pipelines lint ` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. @@ -55,9 +55,9 @@ These tests are run both with the latest available version of `Nextflow` and als :warning: Only in the unlikely and regretful event of a release happening with a bug. -- On your own fork, make a new branch `patch` based on `upstream/master`. +- On your own fork, make a new branch `patch` based on `upstream/main` or `upstream/master`. - Fix the bug, and bump version (X.Y.Z+1). -- A PR should be made on `master` from patch to directly this particular bug. +- Open a pull-request from `patch` to `main`/`master` with the changes. ## Getting help @@ -65,17 +65,17 @@ For further information/help, please consult the [nf-core/fastqrepair documentat ## Pipeline contribution conventions -To make the nf-core/fastqrepair code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written. 
+To make the `nf-core/fastqrepair` code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written. ### Adding a new step If you wish to contribute a new step, please use the following coding standards: -1. Define the corresponding input channel into your new process from the expected previous process channel +1. Define the corresponding input channel into your new process from the expected previous process channel. 2. Write the process block (see below). 3. Define the output channel if needed (see below). 4. Add any new parameters to `nextflow.config` with a default (see below). -5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool). +5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool). 6. Add sanity checks and validation for all relevant parameters. 7. Perform local tests to validate that the new code works as expected. 8. If applicable, add a new test command in `.github/workflow/ci.yml`. @@ -84,13 +84,13 @@ If you wish to contribute a new step, please use the following coding standards: ### Default values -Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. +Parameters should be initialised / defined with default values within the `params` scope in `nextflow.config`. -Once there, use `nf-core schema build` to add to `nextflow_schema.json`. +Once there, use `nf-core pipelines schema build` to add to `nextflow_schema.json`. ### Default processes resource requirements -Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block. @@ -103,7 +103,7 @@ Please use the following naming schemes, to make it easy to understand what is g ### Nextflow version bumping -If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . 
[min-nf-version]` +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core pipelines bump-version --nextflow . [min-nf-version]` ### Images and figures diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml index fd434ef..f4a807b 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.yml +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -9,7 +9,6 @@ body: - [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting) - [nf-core/fastqrepair pipeline documentation](https://nf-co.re/fastqrepair/usage) - - type: textarea id: description attributes: diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 2353e89..353abae 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -17,7 +17,7 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/fast - [ ] If you've fixed a bug or added code that should be tested, add tests! - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/fastqrepair/tree/master/.github/CONTRIBUTING.md) - [ ] If necessary, also make a PR on the nf-core/fastqrepair _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. -- [ ] Make sure your code lints (`nf-core lint`). +- [ ] Make sure your code lints (`nf-core pipelines lint`). - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir `). - [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir `). - [ ] Usage Documentation in `docs/usage.md` is updated. diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index a1562b5..537fcb2 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,18 +1,48 @@ name: nf-core AWS full size tests -# This workflow is triggered on published releases. +# This workflow is triggered on PRs opened against the main/master branch. # It can be additionally triggered manually with GitHub actions workflow dispatch button. # It runs the -profile 'test_full' on AWS batch on: - release: - types: [published] + pull_request: + branches: + - main + - master workflow_dispatch: + pull_request_review: + types: [submitted] + jobs: run-platform: name: Run AWS full tests - if: github.repository == 'nf-core/fastqrepair' + # run only if the PR is approved by at least 2 reviewers and against the master branch or manually triggered + if: github.repository == 'nf-core/fastqrepair' && github.event.review.state == 'approved' && github.event.pull_request.base.ref == 'master' || github.event_name == 'workflow_dispatch' runs-on: ubuntu-latest steps: + - name: Get PR reviews + uses: octokit/request-action@v2.x + if: github.event_name != 'workflow_dispatch' + id: check_approvals + continue-on-error: true + with: + route: GET /repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/reviews?per_page=100 + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + + - name: Check for approvals + if: ${{ failure() && github.event_name != 'workflow_dispatch' }} + run: | + echo "No review approvals found. At least 2 approvals are required to run this action automatically." 
+          exit 1
+
+      - name: Check for enough approvals (>=2)
+        id: test_variables
+        if: github.event_name != 'workflow_dispatch'
+        run: |
+          JSON_RESPONSE='${{ steps.check_approvals.outputs.data }}'
+          CURRENT_APPROVALS_COUNT=$(echo $JSON_RESPONSE | jq -c '[.[] | select(.state | contains("APPROVED")) ] | length')
+          test $CURRENT_APPROVALS_COUNT -ge 2 || exit 1 # At least 2 approvals are required
+
       - name: Launch workflow via Seqera Platform
         uses: seqeralabs/action-tower-launch@v2
         # TODO nf-core: You can customise AWS full pipeline tests as required
diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml
index ca43fd3..7cf9e8d 100644
--- a/.github/workflows/branch.yml
+++ b/.github/workflows/branch.yml
@@ -1,15 +1,17 @@
 name: nf-core branch protection
-# This workflow is triggered on PRs to master branch on the repository
-# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
+# This workflow is triggered on PRs to `main`/`master` branch on the repository
+# It fails when someone tries to make a PR against the nf-core `main`/`master` branch instead of `dev`
 on:
   pull_request_target:
-    branches: [master]
+    branches:
+      - main
+      - master

 jobs:
   test:
     runs-on: ubuntu-latest
     steps:
-      # PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
+      # PRs to the nf-core repo main/master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
       - name: Check PRs
         if: github.repository == 'nf-core/fastqrepair'
         run: |
@@ -22,7 +24,7 @@ jobs:
         uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2
         with:
           message: |
-            ## This PR is against the `master` branch :x:
+            ## This PR is against the `${{github.event.pull_request.base.ref}}` branch :x:

             * Do not close this PR
             * Click _Edit_ and change the `base` to `dev`
@@ -32,9 +34,9 @@ jobs:

             Hi @${{ github.event.pull_request.user.login }},

-            It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
-            The `master` branch on nf-core repositories should always contain code from the latest release.
-            Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
+            It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) ${{github.event.pull_request.base.ref}} branch.
+            The ${{github.event.pull_request.base.ref}} branch on nf-core repositories should always contain code from the latest release.
+            Because of this, PRs to ${{github.event.pull_request.base.ref}} are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.

             You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
             Note that even after this, the test will continue to show as failing until you push a new commit.
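A note for contributors on the approval gate added to `awsfulltest.yml` above: the full-size AWS test now launches from a PR only once enough approving reviews exist, and the check itself is a small `jq` filter over the review list returned by the GitHub API. A minimal, locally runnable sketch of that logic (the `JSON_RESPONSE` value below is a made-up stand-in for the real API payload):

```bash
# Stand-in for the response of GET /repos/<owner>/<repo>/pulls/<number>/reviews;
# only the "state" field of each review matters to the gate.
JSON_RESPONSE='[{"state":"APPROVED"},{"state":"CHANGES_REQUESTED"},{"state":"APPROVED"}]'

# Count reviews whose state contains "APPROVED", exactly as the workflow step does.
CURRENT_APPROVALS_COUNT=$(echo "$JSON_RESPONSE" | jq -c '[.[] | select(.state | contains("APPROVED"))] | length')

# The full-size test only proceeds with at least 2 approvals (or a manual workflow_dispatch).
test "$CURRENT_APPROVALS_COUNT" -ge 2 || exit 1
```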
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 226735c..990a691 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -7,9 +7,12 @@ on: pull_request: release: types: [published] + workflow_dispatch: env: NXF_ANSI_LOG: false + NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity + NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity concurrency: group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}" @@ -17,30 +20,68 @@ concurrency: jobs: test: - name: Run pipeline with test data + name: "Run pipeline with test data (${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }})" # Only run on push if this is the nf-core dev branch (merged PRs) if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/fastqrepair') }}" runs-on: ubuntu-latest strategy: matrix: NXF_VER: - - "23.04.0" + - "24.04.2" - "latest-everything" + profile: + - "conda" + - "docker" + - "singularity" + test_name: + - "test" + isMaster: + - ${{ github.base_ref == 'master' }} + # Exclude conda and singularity on dev + exclude: + - isMaster: false + profile: "conda" + - isMaster: false + profile: "singularity" steps: - name: Check out pipeline code - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4 + with: + fetch-depth: 0 - - name: Install Nextflow + - name: Set up Nextflow uses: nf-core/setup-nextflow@v2 with: version: "${{ matrix.NXF_VER }}" - - name: Disk space cleanup + - name: Set up Apptainer + if: matrix.profile == 'singularity' + uses: eWaterCycle/setup-apptainer@main + + - name: Set up Singularity + if: matrix.profile == 'singularity' + run: | + mkdir -p $NXF_SINGULARITY_CACHEDIR + mkdir -p $NXF_SINGULARITY_LIBRARYDIR + + - name: Set up Miniconda + if: matrix.profile == 'conda' + uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3 + with: + miniconda-version: "latest" + auto-update-conda: true + conda-solver: libmamba + channels: conda-forge,bioconda + + - name: Set up Conda + if: matrix.profile == 'conda' + run: | + echo $(realpath $CONDA)/condabin >> $GITHUB_PATH + echo $(realpath python) >> $GITHUB_PATH + + - name: Clean up Disk space uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 - - name: Run pipeline with test data - # TODO nf-core: You can customise CI pipeline run tests as required - # For example: adding multiple test runs with different parameters - # Remember that you can parallelise this by using strategy.matrix + - name: "Run pipeline with test data ${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }}" run: | - nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_name }},${{ matrix.profile }} --outdir ./results diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml index 2d20d64..ab06316 100644 --- a/.github/workflows/download_pipeline.yml +++ b/.github/workflows/download_pipeline.yml @@ -1,14 +1,14 @@ -name: Test successful pipeline download with 'nf-core download' +name: Test successful pipeline download with 'nf-core pipelines download' # Run the workflow when: # - dispatched manually -# - when a PR is opened or reopened to master branch +# - when a PR is opened or reopened to main/master branch # - the head branch of the pull request is updated, i.e. 
if fixes for a release are pushed last minute to dev. on: workflow_dispatch: inputs: testbranch: - description: "The specific branch you wish to utilize for the test execution of nf-core download." + description: "The specific branch you wish to utilize for the test execution of nf-core pipelines download." required: true default: "dev" pull_request: @@ -17,17 +17,34 @@ on: - edited - synchronize branches: + - main - master pull_request_target: branches: + - main - master env: NXF_ANSI_LOG: false jobs: + configure: + runs-on: ubuntu-latest + outputs: + REPO_LOWERCASE: ${{ steps.get_repo_properties.outputs.REPO_LOWERCASE }} + REPOTITLE_LOWERCASE: ${{ steps.get_repo_properties.outputs.REPOTITLE_LOWERCASE }} + REPO_BRANCH: ${{ steps.get_repo_properties.outputs.REPO_BRANCH }} + steps: + - name: Get the repository name and current branch + id: get_repo_properties + run: | + echo "REPO_LOWERCASE=${GITHUB_REPOSITORY,,}" >> "$GITHUB_OUTPUT" + echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> "$GITHUB_OUTPUT" + echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> "$GITHUB_OUTPUT" + download: runs-on: ubuntu-latest + needs: configure steps: - name: Install Nextflow uses: nf-core/setup-nextflow@v2 @@ -35,52 +52,83 @@ jobs: - name: Disk space cleanup uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 - - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 + - uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5 with: python-version: "3.12" architecture: "x64" - - uses: eWaterCycle/setup-singularity@931d4e31109e875b13309ae1d07c70ca8fbc8537 # v7 + + - name: Setup Apptainer + uses: eWaterCycle/setup-apptainer@4bb22c52d4f63406c49e94c804632975787312b3 # v2.0.0 with: - singularity-version: 3.8.3 + apptainer-version: 1.3.4 - name: Install dependencies run: | python -m pip install --upgrade pip pip install git+https://github.com/nf-core/tools.git@dev - - name: Get the repository name and current branch set as environment variable + - name: Make a cache directory for the container images run: | - echo "REPO_LOWERCASE=${GITHUB_REPOSITORY,,}" >> ${GITHUB_ENV} - echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> ${GITHUB_ENV} - echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> ${GITHUB_ENV} + mkdir -p ./singularity_container_images - name: Download the pipeline env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images run: | - nf-core download ${{ env.REPO_LOWERCASE }} \ - --revision ${{ env.REPO_BRANCH }} \ - --outdir ./${{ env.REPOTITLE_LOWERCASE }} \ + nf-core pipelines download ${{ needs.configure.outputs.REPO_LOWERCASE }} \ + --revision ${{ needs.configure.outputs.REPO_BRANCH }} \ + --outdir ./${{ needs.configure.outputs.REPOTITLE_LOWERCASE }} \ --compress "none" \ --container-system 'singularity' \ - --container-library "quay.io" -l "docker.io" -l "ghcr.io" \ + --container-library "quay.io" -l "docker.io" -l "community.wave.seqera.io/library/" \ --container-cache-utilisation 'amend' \ - --download-configuration + --download-configuration 'yes' - name: Inspect download - run: tree ./${{ env.REPOTITLE_LOWERCASE }} + run: tree ./${{ needs.configure.outputs.REPOTITLE_LOWERCASE }} + + - name: Inspect container images + run: tree ./singularity_container_images | tee ./container_initial + + - name: Count the downloaded number of container images + id: count_initial + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + 
echo "Initial container image count: $image_count" + echo "IMAGE_COUNT_INITIAL=$image_count" >> "$GITHUB_OUTPUT" - name: Run the downloaded pipeline (stub) id: stub_run_pipeline continue-on-error: true env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true - run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results + run: nextflow run ./${{needs.configure.outputs.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ needs.configure.outputs.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results - name: Run the downloaded pipeline (stub run not supported) id: run_pipeline - if: ${{ job.steps.stub_run_pipeline.status == failure() }} + if: ${{ steps.stub_run_pipeline.outcome == 'failure' }} env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true - run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -profile test,singularity --outdir ./results + run: nextflow run ./${{ needs.configure.outputs.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ needs.configure.outputs.REPO_BRANCH }}) -profile test,singularity --outdir ./results + + - name: Count the downloaded number of container images + id: count_afterwards + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Post-pipeline run container image count: $image_count" + echo "IMAGE_COUNT_AFTER=$image_count" >> "$GITHUB_OUTPUT" + + - name: Compare container image counts + run: | + if [ "${{ steps.count_initial.outputs.IMAGE_COUNT_INITIAL }}" -ne "${{ steps.count_afterwards.outputs.IMAGE_COUNT_AFTER }}" ]; then + initial_count=${{ steps.count_initial.outputs.IMAGE_COUNT_INITIAL }} + final_count=${{ steps.count_afterwards.outputs.IMAGE_COUNT_AFTER }} + difference=$((final_count - initial_count)) + echo "$difference additional container images were \n downloaded at runtime . The pipeline has no support for offline runs!" + tree ./singularity_container_images > ./container_afterwards + diff ./container_initial ./container_afterwards + exit 1 + else + echo "The pipeline can be downloaded successfully!" + fi diff --git a/.github/workflows/fix-linting.yml b/.github/workflows/fix-linting.yml index a1a6664..a1b4e7b 100644 --- a/.github/workflows/fix-linting.yml +++ b/.github/workflows/fix-linting.yml @@ -13,7 +13,7 @@ jobs: runs-on: ubuntu-latest steps: # Use the @nf-core-bot token to check out so we can push later - - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4 with: token: ${{ secrets.nf_core_bot_auth_token }} @@ -32,7 +32,7 @@ jobs: GITHUB_TOKEN: ${{ secrets.nf_core_bot_auth_token }} # Install and run pre-commit - - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 + - uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5 with: python-version: "3.12" diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 1fcafe8..dbd52d5 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,6 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. -# It runs the `nf-core lint` and markdown lint tests to ensure +# It runs the `nf-core pipelines lint` and markdown lint tests to ensure # that the code meets the nf-core guidelines. 
on: push: @@ -14,10 +14,10 @@ jobs: pre-commit: runs-on: ubuntu-latest steps: - - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4 - name: Set up Python 3.12 - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 + uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5 with: python-version: "3.12" @@ -31,27 +31,42 @@ jobs: runs-on: ubuntu-latest steps: - name: Check out pipeline code - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4 - name: Install Nextflow uses: nf-core/setup-nextflow@v2 - - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 + - uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5 with: python-version: "3.12" architecture: "x64" + - name: read .nf-core.yml + uses: pietrobolcato/action-read-yaml@1.1.0 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + - name: Install dependencies run: | python -m pip install --upgrade pip - pip install nf-core + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Run nf-core pipelines lint + if: ${{ github.base_ref != 'master' }} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - - name: Run nf-core lint + - name: Run nf-core pipelines lint --release + if: ${{ github.base_ref == 'master' }} env: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - name: Save PR number if: ${{ always() }} @@ -59,7 +74,7 @@ jobs: - name: Upload linting log file artifact if: ${{ always() }} - uses: actions/upload-artifact@65462800fd760344b1a7b4382951275a0abb4808 # v4 + uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4 with: name: linting-logs path: | diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml index 40acc23..95b6b6a 100644 --- a/.github/workflows/linting_comment.yml +++ b/.github/workflows/linting_comment.yml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Download lint results - uses: dawidd6/action-download-artifact@09f2f74827fd3a8607589e5ad7f9398816f540fe # v3 + uses: dawidd6/action-download-artifact@20319c5641d495c8a52e688b7dc5fada6c3a9fbc # v8 with: workflow: linting.yml workflow_conclusion: completed diff --git a/.github/workflows/release-announcements.yml b/.github/workflows/release-announcements.yml index 03ecfcf..76a9e67 100644 --- a/.github/workflows/release-announcements.yml +++ b/.github/workflows/release-announcements.yml @@ -12,7 +12,7 @@ jobs: - name: get topics and convert to hashtags id: get_topics run: | - echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" >> $GITHUB_OUTPUT + echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | 
select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" | sed 's/-//g' >> $GITHUB_OUTPUT - uses: rzr/fediverse-action@master with: @@ -27,39 +27,6 @@ jobs: ${{ steps.get_topics.outputs.topics }} #nfcore #openscience #nextflow #bioinformatics - send-tweet: - runs-on: ubuntu-latest - - steps: - - uses: actions/setup-python@82c7e631bb3cdc910f68e0081d67478d79c6982d # v5 - with: - python-version: "3.10" - - name: Install dependencies - run: pip install tweepy==4.14.0 - - name: Send tweet - shell: python - run: | - import os - import tweepy - - client = tweepy.Client( - access_token=os.getenv("TWITTER_ACCESS_TOKEN"), - access_token_secret=os.getenv("TWITTER_ACCESS_TOKEN_SECRET"), - consumer_key=os.getenv("TWITTER_CONSUMER_KEY"), - consumer_secret=os.getenv("TWITTER_CONSUMER_SECRET"), - ) - tweet = os.getenv("TWEET") - client.create_tweet(text=tweet) - env: - TWEET: | - Pipeline release! ${{ github.repository }} v${{ github.event.release.tag_name }} - ${{ github.event.release.name }}! - - Please see the changelog: ${{ github.event.release.html_url }} - TWITTER_CONSUMER_KEY: ${{ secrets.TWITTER_CONSUMER_KEY }} - TWITTER_CONSUMER_SECRET: ${{ secrets.TWITTER_CONSUMER_SECRET }} - TWITTER_ACCESS_TOKEN: ${{ secrets.TWITTER_ACCESS_TOKEN }} - TWITTER_ACCESS_TOKEN_SECRET: ${{ secrets.TWITTER_ACCESS_TOKEN_SECRET }} - bsky-post: runs-on: ubuntu-latest steps: diff --git a/.github/workflows/template_version_comment.yml b/.github/workflows/template_version_comment.yml new file mode 100644 index 0000000..537529b --- /dev/null +++ b/.github/workflows/template_version_comment.yml @@ -0,0 +1,46 @@ +name: nf-core template version comment +# This workflow is triggered on PRs to check if the pipeline template version matches the latest nf-core version. +# It posts a comment to the PR, even if it comes from a fork. + +on: pull_request_target + +jobs: + template_version: + runs-on: ubuntu-latest + steps: + - name: Check out pipeline code + uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4 + with: + ref: ${{ github.event.pull_request.head.sha }} + + - name: Read template version from .nf-core.yml + uses: nichmor/minimal-read-yaml@v0.0.2 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + + - name: Install nf-core + run: | + python -m pip install --upgrade pip + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Check nf-core outdated + id: nf_core_outdated + run: echo "OUTPUT=$(pip list --outdated | grep nf-core)" >> ${GITHUB_ENV} + + - name: Post nf-core template version comment + uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2 + if: | + contains(env.OUTPUT, 'nf-core') + with: + repo-token: ${{ secrets.NF_CORE_BOT_AUTH_TOKEN }} + allow-repeats: false + message: | + > [!WARNING] + > Newer version of the nf-core template is available. + > + > Your pipeline is using an old version of the nf-core template: ${{ steps.read_yml.outputs['nf_core_version'] }}. + > Please update your pipeline to the latest version. + > + > For more documentation on how to update your pipeline, please see the [nf-core documentation](https://github.com/nf-core/tools?tab=readme-ov-file#sync-a-pipeline-with-the-template) and [Synchronisation documentation](https://nf-co.re/docs/contributing/sync). 
+ # diff --git a/.gitignore b/.gitignore index 5124c9a..5f6e3fd 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,9 @@ results/ testing/ testing* *.pyc +null/ +dataset/ +.devcontainer/dev/ +.vscode/ +.nf-test/ +.nf-test.log diff --git a/.gitpod.yml b/.gitpod.yml index 105a182..83599f6 100644 --- a/.gitpod.yml +++ b/.gitpod.yml @@ -4,17 +4,7 @@ tasks: command: | pre-commit install --install-hooks nextflow self-update - - name: unset JAVA_TOOL_OPTIONS - command: | - unset JAVA_TOOL_OPTIONS vscode: - extensions: # based on nf-core.nf-core-extensionpack - - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code - - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files - - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar - - mechatroner.rainbow-csv # Highlight columns in csv files in different colors - # - nextflow.nextflow # Nextflow syntax highlighting - - oderwat.indent-rainbow # Highlight indentation level - - streetsidesoftware.code-spell-checker # Spelling checker for source code - - charliermarsh.ruff # Code linter Ruff + extensions: + - nf-core.nf-core-extensionpack # https://github.com/nf-core/vscode-extensionpack diff --git a/.nf-core.yml b/.nf-core.yml index 58e3b5d..210f9a2 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,6 +1,14 @@ -nf_core_version: 2.14.1 +nf_core_version: 3.2.0 repository_type: pipeline template: - prefix: nf-core - skip: - - igenomes + author: Tommaso Mazza + description: A pipeline to recover corrupted FASTQ files, drop or fix pesky lines, + remove unpaired reads, and settle reads interleaving + force: false + is_nfcore: true + name: fastqrepair + org: nf-core + outdir: . + skip_features: + - igenomes + version: 1.0.0 diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4dc0f1d..1dec865 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -7,7 +7,7 @@ repos: - prettier@3.2.5 - repo: https://github.com/editorconfig-checker/editorconfig-checker.python - rev: "2.7.3" + rev: "3.1.2" hooks: - id: editorconfig-checker alias: ec diff --git a/.prettierignore b/.prettierignore index 437d763..edd29f0 100644 --- a/.prettierignore +++ b/.prettierignore @@ -10,3 +10,4 @@ testing/ testing* *.pyc bin/ +ro-crate-metadata.json diff --git a/CHANGELOG.md b/CHANGELOG.md index f54acff..a23c9e9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,14 +3,43 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## v1.0dev - [date] +## v[1.0.0](https://github.com/nf-core/fastqrepair/releases/tag/1.0.0) - Catanzaro YellowRed [04/02/2025] Initial release of nf-core/fastqrepair, created with the [nf-core](https://nf-co.re/) template. -### `Added` + ### `Fixed` +- [PR #2](https://github.com/nf-core/fastqrepair/pull/2) - First release + ### `Dependencies` -### `Deprecated` +| Dependency | Old version | New version | +| ------------ | ----------- | ----------- | +| `BBmap` | | 39.13 | +| `FastQC` | | 0.12.1 | +| `gzrt` | | 0.9.1 | +| `Wipertools` | | 1.1.5 | +| `MultiQC` | | 1.26 | + +> **NB:** Dependency has been **updated** if both old and new version information is present. +> +> **NB:** Dependency has been **added** if just the new version information is present. +> +> **NB:** Dependency has been **removed** if new version information isn't present. 
+ +### Credits + +Special thanks to the following for their contributions to the release: + +- [Sateesh Peri](https://github.com/sateeshperi) +- [Louis Le Nézet](https://github.com/LouisLeNezet) - reviewer +- [Anabella Trigila](https://github.com/atrigila) - reviewer +- [James A. Fellows Yates](https://github.com/jfy133) - reviewer +- [Charles Plessy](https://github.com/charles-plessy) - reviewer diff --git a/CITATIONS.md b/CITATIONS.md index b4f80b0..0e9764a 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -12,30 +12,42 @@ - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) - > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. +> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. + +- [gzrt](https://www.urbanophile.com/arenn/hacking/gzrt/gzrt.html) + +> Renn, A. M. (2013). The gzip Recovery Toolkit [Online]. + +- [Wipertools](https://github.com/mazzalab/fastqwiper) + +> MazzaLab, Tommaso Mazza, & Jose F Oviedo. (2025). mazzalab/fastqwiper: v1.1.5 (v1.1.5). Zenodo. [https://doi.org/10.5281/zenodo.14774039](https://doi.org/10.5281/zenodo.14774039) + +- [BBMap](http://sourceforge.net/projects/bbmap/) + +> Bushnell B. (2024). BBMap [Online] - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) - > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. +> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. ## Software packaging/containerisation tools - [Anaconda](https://anaconda.com) - > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. +> Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. - [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) - > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. +> Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. - [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) - > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. +> da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. 
BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

 - [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

-  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.
+> Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

 - [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

-  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
+> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
diff --git a/LICENSE b/LICENSE
index 49b5de7..c75ab34 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) Tommaso Mazza
+Copyright (c) The nf-core/fastqrepair team

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
diff --git a/README.md b/README.md
index 3dce1ba..77e0f39 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@
 [![GitHub Actions Linting Status](https://github.com/nf-core/fastqrepair/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/fastqrepair/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/fastqrepair/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
 [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
-[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
+[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/)
 [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
 [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
 [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
@@ -19,72 +19,65 @@

 ## Introduction

-**nf-core/fastqrepair** is a bioinformatics pipeline that ...
+**nf-core/fastqrepair** is a bioinformatics pipeline that can be used to recover corrupted `FASTQ.gz` files, drop or fix non-compliant reads, remove unpaired reads, and settle reads that became disordered. It takes a `samplesheet` with FASTQ/FASTQ.gz files as input (both single-end and paired-end) and produces clean FASTQ files and QC reports.

-
+![pipeline_diagram](docs/images/fastqrepair-flow-diagram-v1.0.png)

-
-
-1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+1. Recover reads from corrupted fastq.gz files ([`gzrt`](https://github.com/arenn/gzrt))
+2. Make recovered reads well-formed ([`wipertools`](https://github.com/mazzalab/fastqwiper))
+3. Re-pair reads ([`bbmap/repair.sh`](https://sourceforge.net/projects/bbmap/))
+4. Check QC of recovered reads ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+5. Aggregate and report QC ([`MultiQC`](https://github.com/MultiQC/MultiQC))

 ## Usage

 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

-
+> [!WARNING]
+> Rows with different file extensions (e.g., `mysampleA,sample_R1.fastq.gz,sample_R2.fastq`) are not allowed.

 Now, you can run the pipeline using:

-
-
 ```bash
 nextflow run nf-core/fastqrepair \
-   -profile <docker/singularity/.../institute> \
+   -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
    --outdir <OUTDIR>
 ```

 > [!WARNING]
-> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
-> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
+> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).

 For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/fastqrepair/usage) and the [parameter documentation](https://nf-co.re/fastqrepair/parameters).

 ## Pipeline output

+This pipeline produces clean and well-formed FASTQ files together with short textual reports of the cleaning actions.
+
 To see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/fastqrepair/results) tab on the nf-core website pipeline page.
 For more details about the output files and reports, please refer to the [output documentation](https://nf-co.re/fastqrepair/output).

 ## Credits

-nf-core/fastqrepair was originally written by Tommaso Mazza.
-
-We thank the following people for their extensive assistance in the development of this pipeline:
-
+`nf-core/fastqrepair` was designed and written by [Tommaso Mazza](https://github.com/mazzalab).
+
+

 ## Contributions and Support

@@ -97,8 +90,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

-
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
You can cite the `nf-core` publication as follows: diff --git a/assets/email_template.html b/assets/email_template.html index 36acfff..130bc94 100644 --- a/assets/email_template.html +++ b/assets/email_template.html @@ -4,7 +4,7 @@ - + nf-core/fastqrepair Pipeline Report diff --git a/assets/methods_description_template.yml b/assets/methods_description_template.yml index b517d40..7b091e3 100644 --- a/assets/methods_description_template.yml +++ b/assets/methods_description_template.yml @@ -3,7 +3,7 @@ description: "Suggested text and references to use when describing pipeline usag section_name: "nf-core/fastqrepair Methods Description" section_href: "https://github.com/nf-core/fastqrepair" plot_type: "html" -## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline +## Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline ## You inject any metadata in the Nextflow '${workflow}' object data: |

   <h4>Methods</h4>

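A side note on the methods description template above: you can check how edited HTML renders in the MultiQC report without waiting for CI. A minimal sketch, assuming the standard nf-core template parameter `--multiqc_methods_description` is wired into `nextflow.config` as usual:

```bash
# Run the small bundled test and point MultiQC at the edited methods template;
# the rendered methods section ends up inside the MultiQC HTML report.
nextflow run . \
    -profile test,docker \
    --outdir ./results \
    --multiqc_methods_description assets/methods_description_template.yml
```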
diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml index 29f377c..ed42af5 100644 --- a/assets/multiqc_config.yml +++ b/assets/multiqc_config.yml @@ -1,7 +1,7 @@ report_comment: > - This report has been generated by the nf-core/fastqrepair + This report has been generated by the nf-core/fastqrepair analysis pipeline. For information about how to interpret these results, please see the - documentation. + documentation. report_section_order: "nf-core-fastqrepair-methods-description": order: -1000 diff --git a/assets/nf-core-fastqrepair_logo_light.png b/assets/nf-core-fastqrepair_logo_light.png index f0d28a1..ecb43c5 100644 Binary files a/assets/nf-core-fastqrepair_logo_light.png and b/assets/nf-core-fastqrepair_logo_light.png differ diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv index 5f653ab..0f91d10 100644 --- a/assets/samplesheet.csv +++ b/assets/samplesheet.csv @@ -1,3 +1,4 @@ sample,fastq_1,fastq_2 -SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz -SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz, +test2,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/fastq/test2_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/fastq/test2_1_corrupted_10kb.fastq.gz +test_truncated,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/fastq/test_truncated_clean.fastq, +test1,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/fastq/test_1.fastq.gz, diff --git a/assets/schema_input.json b/assets/schema_input.json index 7247405..ef9cee3 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -1,6 +1,6 @@ { - "$schema": "http://json-schema.org/draft-07/schema", - "$id": "https://raw.githubusercontent.com/nf-core/fastqrepair/master/assets/schema_input.json", + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://raw.githubusercontent.com/nf-core/fastqrepair/main/assets/schema_input.json", "title": "nf-core/fastqrepair pipeline - params.input schema", "description": "Schema for the file provided with params.input", "type": "array", @@ -17,15 +17,15 @@ "type": "string", "format": "file-path", "exists": true, - "pattern": "^\\S+\\.f(ast)?q\\.gz$", - "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" + "pattern": "^\\S+\\.f(ast)?q(\\.gz)?$", + "errorMessage": "FASTQ file for reads 1 must be provided and exist, cannot contain spaces, and must have extension '.fq.gz', '.fastq.gz', '.fq', or '.fastq'" }, "fastq_2": { "type": "string", "format": "file-path", "exists": true, - "pattern": "^\\S+\\.f(ast)?q\\.gz$", - "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" + "pattern": "^\\S+\\.f(ast)?q(\\.gz)?$", + "errorMessage": "If provided, a FASTQ file for reads 2 must exist, cannot contain spaces, and must have extension '.fq.gz', '.fastq.gz', '.fq', or '.fastq'" } }, "required": ["sample", "fastq_1"] diff --git a/conf/base.config b/conf/base.config index 3067dd1..36396e2 100644 --- a/conf/base.config +++ b/conf/base.config @@ -11,46 +11,46 @@ process { // TODO nf-core: Check the defaults for all processes - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } 
- time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 * task.attempt } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' } maxRetries = 1 maxErrors = '-1' // Process-specific resource requirements - // NOTE - Please try and re-use the labels below as much as possible. + // NOTE - Please try and reuse the labels below as much as possible. // These labels are used and recognised by default in DSL2 files hosted on nf-core/modules. // If possible, it would be nice to keep the same label naming convention when // adding in your local modules too. - // TODO nf-core: Customise requirements for specific processes. + // (Eventually) Customise requirements for specific processes. // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors withLabel:process_single { - cpus = { check_max( 1 , 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 12.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 2 * task.attempt } + memory = { 12.GB * task.attempt } + time = { 4.h * task.attempt } } withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 36.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } + cpus = { 6 * task.attempt } + memory = { 36.GB * task.attempt } + time = { 8.h * task.attempt } } withLabel:process_high { - cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 72.GB * task.attempt, 'memory' ) } - time = { check_max( 16.h * task.attempt, 'time' ) } + cpus = { 12 * task.attempt } + memory = { 72.GB * task.attempt } + time = { 16.h * task.attempt } } withLabel:process_long { - time = { check_max( 20.h * task.attempt, 'time' ) } + time = { 20.h * task.attempt } } withLabel:process_high_memory { - memory = { check_max( 200.GB * task.attempt, 'memory' ) } + memory = { 200.GB * task.attempt } } withLabel:error_ignore { errorStrategy = 'ignore' diff --git a/conf/modules.config b/conf/modules.config index d203d2b..e92751d 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -15,20 +15,97 @@ process { publishDir = [ path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }, mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } + saveAs: { filename -> filename }, + enabled: params.publish_all_tools ] - withName: FASTQC { + withName: 'WIPERTOOLS_FASTQSCATTER' { + ext.args = '--os unix' + } + + withName: 'WIPERTOOLS_FASTQWIPER' { + ext.args = "--alphabet ${params.alphabet}" + } + + withName: 'WIPERTOOLS_REPORTGATHER' { + publishDir = [ + path: { "${params.outdir}/repaired" }, + mode: params.publish_dir_mode, + saveAs: { filename -> + if(!filename.equals('versions.yml')) { + if (filename.replace("${meta.sample_id}", "").contains("_1_")){ + "${meta.sample_id}_1.report" + } else if (filename.replace("${meta.sample_id}", "").contains("_2_")){ + "${meta.sample_id}_2.report" + } else { + "${meta.sample_id}.report" + } + } else { + null + } + } + ] + } + + withName: 'WIPERTOOLS_FASTQGATHER' { + publishDir = [ + path: { "${params.outdir}/repaired" }, + mode: params.publish_dir_mode, + saveAs: { filename -> + if(!filename.equals('versions.yml') && (meta.single_end || params.skip_bbmap_repair)) { + if (filename.replace("${meta.sample_id}", "").contains("_1_")){ + "${meta.sample_id}_1.fastq.gz" + } else if (filename.replace("${meta.sample_id}", "").contains("_2_")){ + "${meta.sample_id}_2.fastq.gz" + } else { + "${meta.sample_id}.fastq.gz" + } + } else { + null + } + } + ] + } + + withName: 'BBMAP_REPAIR' { + ext.args = "qin=${params.qin}" + + publishDir = [ + path: { "${params.outdir}/repaired" }, + mode: params.publish_dir_mode, + saveAs: { filename -> + if(!filename.equals('versions.yml')) { + System.out.println("filename: ${filename}") + if (filename.replace("${meta.id}", "").contains("_1_")){ + "${meta.id}_1.fastq.gz" + } else if (filename.replace("${meta.id}", "").contains("_2_")){ + "${meta.id}_2.fastq.gz" + } else { + "${meta.id}_singleton.fastq.gz" + } + } else { + null + } + } + ] + } + + withName: 'FASTQC' { + // memory = 8.GB // to be included during development and commented out before release ext.args = '--quiet' + publishDir = [ + path: { "${params.outdir}/QC/fastqc" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] } withName: 'MULTIQC' { ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' } publishDir = [ - path: { "${params.outdir}/multiqc" }, + path: { "${params.outdir}/QC/multiqc" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } ] } - } diff --git a/conf/test.config b/conf/test.config index a71972c..0eb8449 100644 --- a/conf/test.config +++ b/conf/test.config @@ -10,19 +10,21 @@ ---------------------------------------------------------------------------------------- */ +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' + publish_all_tools = false + num_splits = 2 // Input data - // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets - // TODO nf-core: Give any required params for the test so that command line flags are not needed - input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv' - - + input = params.pipelines_testdata_base_path + 'fastqrepair/testdata/samplesheet_30reads.csv' } diff --git a/conf/test_full.config b/conf/test_full.config index f159356..50d1c5d 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -11,14 +11,12 @@ */ params { - config_profile_name = 'Full test profile' - config_profile_description = 'Full test dataset to check pipeline function' + config_profile_name = 'Test with 30reads dataset' + config_profile_description = 'Full test with 30reads dataset and two empty FASTQ files' - // Input data for full size test - // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA) - // TODO nf-core: Give any required params for the test so that command line flags are not needed - input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv' + publish_all_tools = false + num_splits = 2 - // Fasta references - fasta = params.pipelines_testdata_base_path + 'viralrecon/genome/NC_045512.2/GCF_009858895.2_ASM985889v3_genomic.200409.fna.gz' + // Input data for full size test + input = params.pipelines_testdata_base_path + 'fastqrepair/testdata/samplesheet_30reads.csv' } diff --git a/docs/images/fastqrepair-flow-diagram-v1.0.png b/docs/images/fastqrepair-flow-diagram-v1.0.png new file mode 100644 index 0000000..d3fc836 Binary files /dev/null and b/docs/images/fastqrepair-flow-diagram-v1.0.png differ diff --git a/docs/images/fastqrepair-flow-diagram-v1.0.svg b/docs/images/fastqrepair-flow-diagram-v1.0.svg new file mode 100755 index 0000000..9d1bc34 --- /dev/null +++ b/docs/images/fastqrepair-flow-diagram-v1.0.svg @@ -0,0 +1,1237 @@ + + + +1.0.0gzrtscattergatherfastqwiperbbmap/repairfastqcmultiqcR1 (chunk 1)R1 (chunk 1)R1 (chunk n)R1 (chunk n)R2 (chunk 1)R2 (chunk n)recover readssplit readsjoin cleaned readsjoin reportsclean readsre-pair readscheck QCreport QCLegendpaired-end readsFile inputFile outputOptionalReads partitioningbased on the #coressingle-end readstxtreporttxtreporttxtreporthtmlreportfastqrepairedreadstxtreporttxtsamplesheetfastqcorruptedreads diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png deleted file mode 100755 index 361d0e4..0000000 Binary files a/docs/images/mqc_fastqc_adapter.png and /dev/null differ diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png deleted file mode 100755 index cb39ebb..0000000 Binary files a/docs/images/mqc_fastqc_counts.png and /dev/null differ diff --git a/docs/images/mqc_fastqc_quality.png 
b/docs/images/mqc_fastqc_quality.png deleted file mode 100755 index a4b89bf..0000000 Binary files a/docs/images/mqc_fastqc_quality.png and /dev/null differ diff --git a/docs/images/nf-core-fastqrepair_logo_dark.png b/docs/images/nf-core-fastqrepair_logo_dark.png index 893a206..ebb0aa4 100644 Binary files a/docs/images/nf-core-fastqrepair_logo_dark.png and b/docs/images/nf-core-fastqrepair_logo_dark.png differ diff --git a/docs/images/nf-core-fastqrepair_logo_light.png b/docs/images/nf-core-fastqrepair_logo_light.png index 3523d7e..8253ec1 100644 Binary files a/docs/images/nf-core-fastqrepair_logo_light.png and b/docs/images/nf-core-fastqrepair_logo_light.png differ diff --git a/docs/output.md b/docs/output.md index b54a9f7..d64bb1d 100644 --- a/docs/output.md +++ b/docs/output.md @@ -2,22 +2,36 @@ ## Introduction -This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. +This document describes the output produced by the pipeline. -The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. +## Pipeline overview - +The nf-core/fastqrepair pipeline is built using [Nextflow](https://www.nextflow.io/). Output files are gathered into the `repaired` subfolder of the output folder set up using the `--outdir` command line parameter. -## Pipeline overview +``` + + - pipeline_info/ + - QC/ + - fastqc/ + - multiqc/ + - repaired/ +``` -The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: +
+Folders content

-- [FastQC](#fastqc) - Raw read QC
-- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
-- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
+- `pipeline_info` contains report metrics generated during the [workflow execution](#pipeline-information)
+- `QC` contains [FastQC](#fastqc) QC analysis results and [MultiQC](#multiqc) reports
+- `repaired` contains the repaired FASTQ files and their corresponding quality reports
+  - `mysampleA_R1.fastq.gz` (and `mysampleA_R2.fastq.gz`) are the repaired FASTQ files
+  - `mysampleA_R1.report` (and `mysampleA_R2.report`) are the summaries of the cleaning step
+
+
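Whether each tool's intermediate output is also published under `--outdir` is governed by the new `publish_all_tools` parameter. As a hedged sketch of the gating pattern added to `conf/modules.config` earlier in this diff (the per-tool path below is illustrative, not the pipeline's actual layout):

```groovy
// Sketch: intermediate per-tool outputs are only published when --publish_all_tools is set.
process {
    withName: 'WIPERTOOLS_FASTQWIPER' {               // one of the pipeline's intermediate tools
        publishDir = [
            path: { "${params.outdir}/fastqwiper" },  // illustrative path (assumption)
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename },
            enabled: params.publish_all_tools         // the switch shown at the top of this diff
        ]
    }
}
```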
### FastQC +[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). +
Output files @@ -27,20 +41,10 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
-[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). - -![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png) - -![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png) - -![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png) - -:::note -The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality. -::: - ### MultiQC +[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory. +
Output files @@ -51,8 +55,6 @@ The FastQC plots displayed in the MultiQC report shows _untrimmed_ reads. They m
-[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
-
 Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.

 ### Pipeline information

@@ -61,11 +63,9 @@ Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQ

 Output files

- `pipeline_info/`
-  - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
+  - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.html`.
   - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameters are used when running the pipeline.
   - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
   - Parameters used by the pipeline run: `params.json`.
-
-[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
diff --git a/docs/usage.md b/docs/usage.md
index df2f5d5..36107ee 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -6,58 +6,46 @@

 ## Introduction

-
-
 ## Samplesheet input

-You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
+You will need to create a samplesheet and provide it to `fastqrepair` via the `--input` parameter:

```bash
--input '[path to samplesheet file]'
```

-### Multiple runs of the same sample
-
-The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:
-
-```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
-```
+The samplesheet contains information about the samples you would like to analyse. It has to be a comma-separated file with 3 columns and a header row, as shown in the examples below.

### Full samplesheet

The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below.
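The detection logic itself is not spelled out in this document; a minimal sketch of the presumed behaviour (the helper name is hypothetical), assuming a row counts as single-end when its `fastq_2` column is empty:

```groovy
// Hypothetical helper: build the meta map and read list from one samplesheet row.
def rowToSample(Map row) {
    def meta = [
        id        : row.sample.replaceAll(/\s/, '_'), // spaces become underscores, per the table below
        single_end: !row.fastq_2                      // empty fastq_2 => single-end sample
    ]
    def reads = meta.single_end
        ? [ file(row.fastq_1) ]
        : [ file(row.fastq_1), file(row.fastq_2) ]
    return [ meta, reads ]
}
```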
-A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
+A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 3 samples, one of which (`mysampleC`) is single-end.

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
-CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
-TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
-TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,
+mysampleA,sample_1_R1.fastq.gz,sample_1_R2.fastq.gz
+mysampleB,sample_2_R1.fastq,sample_2_R2.fastq
+mysampleC,sample_3_R1.fq.gz,
```

-| Column    | Description                                                                                                                                                                            |
-| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `sample`  | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
-| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                             |
-| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz".                                                             |
+| Column    | Description                                                                                                                                           |
+| --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `sample`  | Custom sample name. Spaces in sample names are automatically converted to underscores (`_`).                                                          |
+| `fastq_1` | Full path to FASTQ file for short reads R1. Files can be gzipped or not, with the extensions: `.fastq.gz`, `.fastq`, `.fq.gz`, or `.fq`.              |
+| `fastq_2` | Full path to FASTQ file for short reads R2 (optional). Files can be gzipped or not, with the extensions: `.fastq.gz`, `.fastq`, `.fq.gz`, or `.fq`.   |

An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.

+> [!WARNING]
+> Different file extensions are not allowed _in the same row_, i.e., in the same paired-end sample (e.g., `mysampleA,sample_R1.fastq.gz,sample_R2.fastq`).

## Running the pipeline

The typical command for running the pipeline is as follows:

```bash
-nextflow run nf-core/fastqrepair --input ./samplesheet.csv --outdir ./results --genome GRCh37 -profile docker
+nextflow run nf-core/fastqrepair --input ./samplesheet.csv --outdir ./results -profile docker
```

This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.

@@ -75,9 +63,8 @@ If you wish to repeatedly use the same parameters for multiple runs, rather than

Pipeline settings can be provided in a `yaml` or `json` file via `-params-file <file>`.

-:::warning
-Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args).
-:::
+> [!WARNING]
+> Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args).

The above pipeline run specified with a params file in yaml format:

```bash
nextflow run nf-core/fastqrepair -profile docker -params-file params.yaml
```

-with `params.yaml` containing:
+with:

-```yaml
-input: './samplesheet.csv'
-outdir: './results/'
-genome: 'GRCh37'
-<...>
+```yaml title="params.yaml"
+input: "./samplesheet.csv"
+outdir: "./results/"
```

+Optional parameters are described in the [Parameters](https://nf-co.re/fastqrepair/parameters/) tab.
+
You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).

### Updating the pipeline

@@ -106,23 +93,21 @@ nextflow pull nf-core/fastqrepair

### Reproducibility

-It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.
+It is a good idea to specify the pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.

-First, go to the [nf-core/fastqrepair releases page](https://github.com/nf-core/fastqrepair/releases) and find the latest pipeline version - numeric only (eg. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag.
+First, go to the [nf-core/fastqrepair releases page](https://github.com/nf-core/fastqrepair/releases) and find the latest pipeline version - numeric only (eg. `1.0.0`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.0.0`. Of course, you can switch to another version by changing the number after the `-r` flag.

This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. For example, at the bottom of the MultiQC reports.

-To further assist in reproducbility, you can use share and re-use [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter.
+To further assist in reproducibility, you can share and reuse [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter.

-:::tip
-If you wish to share such profile (such as upload as supplementary material for academic publications), make sure to NOT include cluster specific paths to files, nor institutional specific profiles.
-:::
+> [!TIP]
+> If you wish to share such a profile (e.g. to upload as supplementary material for an academic publication), make sure NOT to include cluster-specific paths to files, nor institution-specific profiles.
## Core Nextflow arguments -:::note -These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen). -::: +> [!NOTE] +> These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen) ### `-profile` @@ -130,16 +115,15 @@ Use this parameter to choose a configuration profile. Profiles can give configur Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below. -:::info -We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. -::: +> [!IMPORTANT] +> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. -The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). +The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to check if your system is supported, please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation). Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important! They are loaded in sequence, so later profiles can overwrite earlier profiles. -If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines dependent on the computer enviroment. +If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines dependent on the computer environment. - `test` - A profile with a complete configuration for automated testing @@ -175,13 +159,13 @@ Specify the path to a specific config file (this is a core Nextflow command). Se ### Resource requests -Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped. 
+Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the pipeline steps, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher resource requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.

To change the resource requests, please see the [max resources](https://nf-co.re/docs/usage/configuration#max-resources) and [tuning workflow resources](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources) sections of the nf-core website.

### Custom Containers

-In some cases you may wish to change which container or conda environment a step of the pipeline uses for a particular tool. By default nf-core pipelines use containers and software from the [biocontainers](https://biocontainers.pro/) or [bioconda](https://bioconda.github.io/) projects. However in some cases the pipeline specified version maybe out of date.
+In some cases, you may wish to change the container or conda environment used by a pipeline step for a particular tool. By default, nf-core pipelines use containers and software from the [biocontainers](https://biocontainers.pro/) or [bioconda](https://bioconda.github.io/) projects. However, in some cases the version specified in the pipeline may be out of date.

To use a different container from the default container or conda environment specified in a pipeline, please see the [updating tool versions](https://nf-co.re/docs/usage/configuration#updating-tool-versions) section of the nf-core website.

@@ -199,14 +183,6 @@ See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config

If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs).

-## Azure Resource Requests
-
-To be used with the `azurebatch` profile by specifying the `-profile azurebatch`.
-We recommend providing a compute `params.vm_type` of `Standard_D16_v3` VMs by default but these options can be changed if required.
-
-Note that the choice of VM size depends on your quota and the overall workload during the analysis.
-For a thorough list, please refer the [Azure Sizes for virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes).
-
## Running in the background

Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished.
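Tying the resource-request and container sections above together, a hedged sketch of a custom config (passed with `-c`, never used for pipeline parameters) that raises resources and pins the container for one process — the process name and container tag come from this diff, the resource values are illustrative:

```groovy
// custom.config — pass with: nextflow run nf-core/fastqrepair ... -c custom.config
process {
    withName: 'BBMAP_REPAIR' {
        cpus   = 8        // illustrative values, tune to your data
        memory = 16.GB
        time   = 4.h
        // e.g. pin the same container the module declares (see modules/nf-core/bbmap/repair)
        container = 'biocontainers/bbmap:39.13--he5f24ec_1'
    }
}
```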
diff --git a/main.nf b/main.nf index b719991..ea98c13 100644 --- a/main.nf +++ b/main.nf @@ -9,18 +9,15 @@ ---------------------------------------------------------------------------------------- */ -nextflow.enable.dsl = 2 - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { FASTQREPAIR } from './workflows/fastqrepair' +include { FASTQREPAIR } from './workflows/fastqrepair' include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_fastqrepair_pipeline' include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_fastqrepair_pipeline' - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NAMED WORKFLOWS FOR PIPELINE @@ -43,10 +40,8 @@ workflow NFCORE_FASTQREPAIR { FASTQREPAIR ( samplesheet ) - emit: multiqc_report = FASTQREPAIR.out.multiqc_report // channel: /path/to/multiqc_report.html - } /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -57,13 +52,11 @@ workflow NFCORE_FASTQREPAIR { workflow { main: - // // SUBWORKFLOW: Run initialisation tasks // PIPELINE_INITIALISATION ( params.version, - params.help, params.validate_params, params.monochrome_logs, args, @@ -77,7 +70,6 @@ workflow { NFCORE_FASTQREPAIR ( PIPELINE_INITIALISATION.out.samplesheet ) - // // SUBWORKFLOW: Run completion tasks // diff --git a/modules.json b/modules.json index c1007f8..e921d1c 100644 --- a/modules.json +++ b/modules.json @@ -5,14 +5,44 @@ "https://github.com/nf-core/modules.git": { "modules": { "nf-core": { + "bbmap/repair": { + "branch": "master", + "git_sha": "4c81a35b9f817661af684a50dec2a5a3eb297df2", + "installed_by": ["modules"] + }, "fastqc": { "branch": "master", - "git_sha": "285a50500f9e02578d90b3ce6382ea3c30216acd", + "git_sha": "08108058ea36a63f141c25c4e75f9f872a5b2296", + "installed_by": ["modules"] + }, + "gzrt": { + "branch": "master", + "git_sha": "56f433b35062e2ca97f12d490fcc818107c33521", "installed_by": ["modules"] }, "multiqc": { "branch": "master", - "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a", + "git_sha": "f0719ae309075ae4a291533883847c3f7c441dad", + "installed_by": ["modules"] + }, + "wipertools/fastqgather": { + "branch": "master", + "git_sha": "256a6dd005be691eb34487e4d4c56368b2995d5a", + "installed_by": ["modules"] + }, + "wipertools/fastqscatter": { + "branch": "master", + "git_sha": "256a6dd005be691eb34487e4d4c56368b2995d5a", + "installed_by": ["modules"] + }, + "wipertools/fastqwiper": { + "branch": "master", + "git_sha": "256a6dd005be691eb34487e4d4c56368b2995d5a", + "installed_by": ["modules"] + }, + "wipertools/reportgather": { + "branch": "master", + "git_sha": "256a6dd005be691eb34487e4d4c56368b2995d5a", "installed_by": ["modules"] } } @@ -21,17 +51,17 @@ "nf-core": { "utils_nextflow_pipeline": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "c2b22d85f30a706a3073387f30380704fcae013b", "installed_by": ["subworkflows"] }, "utils_nfcore_pipeline": { "branch": "master", - "git_sha": "92de218a329bfc9a9033116eb5f65fd270e72ba3", + "git_sha": "51ae5406a030d4da1e49e4dab49756844fdd6c7a", "installed_by": ["subworkflows"] }, - "utils_nfvalidation_plugin": { + "utils_nfschema_plugin": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "2fd2cd6d0e7b273747f32e465fdc6bcc3ae0814e", 
"installed_by": ["subworkflows"] } } diff --git a/modules/nf-core/bbmap/repair/environment.yml b/modules/nf-core/bbmap/repair/environment.yml new file mode 100644 index 0000000..4e65bfe --- /dev/null +++ b/modules/nf-core/bbmap/repair/environment.yml @@ -0,0 +1,7 @@ +--- +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json +channels: + - conda-forge + - bioconda +dependencies: + - "bioconda::bbmap=39.13" diff --git a/modules/nf-core/bbmap/repair/main.nf b/modules/nf-core/bbmap/repair/main.nf new file mode 100644 index 0000000..ec868fa --- /dev/null +++ b/modules/nf-core/bbmap/repair/main.nf @@ -0,0 +1,57 @@ +process BBMAP_REPAIR { + tag "$meta.id" + label 'process_single' + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bbmap:39.13--he5f24ec_1': + 'biocontainers/bbmap:39.13--he5f24ec_1' }" + + input: + tuple val(meta), path(reads) + val(interleave) + + output: + tuple val(meta), path("*_repaired.fastq.gz") , emit: repaired + tuple val(meta), path("${prefix}_singleton.fastq.gz"), emit: singleton + path "versions.yml" , emit: versions + path "*.log" , emit: log + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" + in_reads = ( interleave ) ? "in=${reads[0]}" : "in=${reads[0]} in2=${reads[1]}" + out_reads = ( interleave ) ? "out=${prefix}_repaired.fastq.gz outs=${prefix}_singleton.fastq.gz" + : "out=${prefix}_1_repaired.fastq.gz out2=${prefix}_2_repaired.fastq.gz outs=${prefix}_singleton.fastq.gz" + """ + maxmem=\$(echo \"$task.memory\"| sed 's/ GB/g/g') + repair.sh \\ + -Xmx\$maxmem \\ + $in_reads \\ + $out_reads \\ + threads=${task.cpus} + ${args} \\ + &> ${prefix}.repair.sh.log + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bbmap: \$(bbversion.sh | grep -v "Duplicate cpuset") + END_VERSIONS + """ + + stub: + prefix = task.ext.prefix ?: "${meta.id}" + """ + echo "" | gzip > ${prefix}_1_repaired.fastq.gz + echo "" | gzip > ${prefix}_2_repaired.fastq.gz + echo "" | gzip > ${prefix}_singleton.fastq.gz + touch ${prefix}.repair.sh.log + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bbmap: \$(bbversion.sh | grep -v "Duplicate cpuset") + END_VERSIONS + """ +} diff --git a/modules/nf-core/bbmap/repair/meta.yml b/modules/nf-core/bbmap/repair/meta.yml new file mode 100644 index 0000000..9190cd9 --- /dev/null +++ b/modules/nf-core/bbmap/repair/meta.yml @@ -0,0 +1,67 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json +name: "bbmap_repair" +description: Re-pairs reads that became disordered or had some mates eliminated. +keywords: + - paired reads re-pairing + - fastq + - preprocessing +tools: + - repair: + description: Repair.sh is a tool that re-pairs reads that became disordered or had some mates eliminated + tools. + homepage: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/ + documentation: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/ + licence: ["UC-LBL license (see package)"] + identifier: biotools:bbmap + +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1', single_end:false ]` + - reads: + type: file + description: | + List of input paired end fastq files + pattern: "*.{fastq,fq}.gz" + - - interleave: + type: boolean + description: | + Indicates whether the input paired reads are interleaved or not +output: + - repaired: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1', single_end:false ]` + - "*_repaired.fastq.gz": + type: file + description: re-paired reads + pattern: "*_repaired.fastq.gz" + - singleton: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1', single_end:false ]` + - "${prefix}_singleton.fastq.gz": + type: file + description: singleton reads + pattern: "*singleton.fastq.gz" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" + - log: + - "*.log": + type: file + description: log file containing stdout and stderr from repair.sh + pattern: "*.log" +authors: + - "@mazzalab" +maintainers: + - "@mazzalab" + - "@tm4zza" diff --git a/modules/nf-core/bbmap/repair/tests/main.nf.test b/modules/nf-core/bbmap/repair/tests/main.nf.test new file mode 100644 index 0000000..ea6ff0f --- /dev/null +++ b/modules/nf-core/bbmap/repair/tests/main.nf.test @@ -0,0 +1,115 @@ +nextflow_process { + + name "Test Process BBMAP_REPAIR" + script "../main.nf" + process "BBMAP_REPAIR" + + tag "modules" + tag "modules_nfcore" + tag "bbmap" + tag "bbmap/repair" + + test("sarscov2_illumina_paired - fastq_gz") { + config "./nextflow.config" + + when { + params { + module_args = 'qin=33' + } + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)] + ] + input[1] = false + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.versions).match() } + ) + } + } + + test("sarscov2_illumina_interleaved - fastq_gz") { + config "./nextflow.config" + + when { + params { + module_args = 'qin=33' + } + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true)] + ] + input[1] = true + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.versions).match() } + ) + } + } + + test("sarscov2_illumina_paired - fastq_gz - stub") { + config "./nextflow.config" + options "-stub" + + when { + params { + module_args = 'qin=33' + } + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)] + ] + input[1] = false + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2_illumina_interleaved - fastq_gz - stub") { + config "./nextflow.config" + options "-stub" + + when { + params { + module_args = 'qin=33' + } + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [file(params.modules_testdata_base_path + 
'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true)] + ] + input[1] = true + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } +} diff --git a/modules/nf-core/bbmap/repair/tests/main.nf.test.snap b/modules/nf-core/bbmap/repair/tests/main.nf.test.snap new file mode 100644 index 0000000..d10edaf --- /dev/null +++ b/modules/nf-core/bbmap/repair/tests/main.nf.test.snap @@ -0,0 +1,156 @@ +{ + "sarscov2_illumina_paired - fastq_gz": { + "content": [ + [ + "versions.yml:md5,71de954cc3b49c64d7bd86ed82d3e892" + ] + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2025-01-18T18:32:09.249624" + }, + "sarscov2_illumina_interleaved - fastq_gz - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1_repaired.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2_repaired.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test_singleton.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "2": [ + "versions.yml:md5,71de954cc3b49c64d7bd86ed82d3e892" + ], + "3": [ + "test.repair.sh.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ], + "log": [ + "test.repair.sh.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ], + "repaired": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1_repaired.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2_repaired.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "singleton": [ + [ + { + "id": "test", + "single_end": false + }, + "test_singleton.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "versions": [ + "versions.yml:md5,71de954cc3b49c64d7bd86ed82d3e892" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2025-01-18T18:32:38.864295" + }, + "sarscov2_illumina_interleaved - fastq_gz": { + "content": [ + [ + "versions.yml:md5,71de954cc3b49c64d7bd86ed82d3e892" + ] + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2025-01-18T18:32:18.562698" + }, + "sarscov2_illumina_paired - fastq_gz - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1_repaired.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2_repaired.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test_singleton.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "2": [ + "versions.yml:md5,71de954cc3b49c64d7bd86ed82d3e892" + ], + "3": [ + "test.repair.sh.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ], + "log": [ + "test.repair.sh.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ], + "repaired": [ + [ + { + "id": "test", + "single_end": false + }, + [ + "test_1_repaired.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2_repaired.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "singleton": [ + [ + { + "id": "test", + "single_end": false + }, + "test_singleton.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "versions": [ + "versions.yml:md5,71de954cc3b49c64d7bd86ed82d3e892" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2025-01-18T18:32:28.275703" + } +} \ No newline at end of file diff --git a/modules/nf-core/bbmap/repair/tests/nextflow.config b/modules/nf-core/bbmap/repair/tests/nextflow.config new file mode 100644 index 0000000..6b9f0bf --- /dev/null +++ 
b/modules/nf-core/bbmap/repair/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: BBMAP_REPAIR { + ext.args = params.module_args + } +} diff --git a/modules/nf-core/bbmap/repair/tests/tags.yml b/modules/nf-core/bbmap/repair/tests/tags.yml new file mode 100644 index 0000000..a6e491d --- /dev/null +++ b/modules/nf-core/bbmap/repair/tests/tags.yml @@ -0,0 +1,2 @@ +bbmap/repair: + - "modules/nf-core/bbmap/repair/**" diff --git a/modules/nf-core/fastqc/environment.yml b/modules/nf-core/fastqc/environment.yml index 1787b38..691d4c7 100644 --- a/modules/nf-core/fastqc/environment.yml +++ b/modules/nf-core/fastqc/environment.yml @@ -1,7 +1,5 @@ -name: fastqc channels: - conda-forge - bioconda - - defaults dependencies: - bioconda::fastqc=0.12.1 diff --git a/modules/nf-core/fastqc/main.nf b/modules/nf-core/fastqc/main.nf index d79f1c8..033f415 100644 --- a/modules/nf-core/fastqc/main.nf +++ b/modules/nf-core/fastqc/main.nf @@ -1,5 +1,5 @@ process FASTQC { - tag "$meta.id" + tag "${meta.id}" label 'process_medium' conda "${moduleDir}/environment.yml" @@ -19,27 +19,30 @@ process FASTQC { task.ext.when == null || task.ext.when script: - def args = task.ext.args ?: '' - def prefix = task.ext.prefix ?: "${meta.id}" + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" // Make list of old name and new name pairs to use for renaming in the bash while loop def old_new_pairs = reads instanceof Path || reads.size() == 1 ? [[ reads, "${prefix}.${reads.extension}" ]] : reads.withIndex().collect { entry, index -> [ entry, "${prefix}_${index + 1}.${entry.extension}" ] } - def rename_to = old_new_pairs*.join(' ').join(' ') - def renamed_files = old_new_pairs.collect{ old_name, new_name -> new_name }.join(' ') + def rename_to = old_new_pairs*.join(' ').join(' ') + def renamed_files = old_new_pairs.collect{ _old_name, new_name -> new_name }.join(' ') - def memory_in_mb = MemoryUnit.of("${task.memory}").toUnit('MB') + // The total amount of allocated RAM by FastQC is equal to the number of threads defined (--threads) time the amount of RAM defined (--memory) + // https://github.com/s-andrews/FastQC/blob/1faeea0412093224d7f6a07f777fad60a5650795/fastqc#L211-L222 + // Dividing the task.memory by task.cpu allows to stick to requested amount of RAM in the label + def memory_in_mb = task.memory ? task.memory.toUnit('MB').toFloat() / task.cpus : null // FastQC memory value allowed range (100 - 10000) def fastqc_memory = memory_in_mb > 10000 ? 10000 : (memory_in_mb < 100 ? 100 : memory_in_mb) """ - printf "%s %s\\n" $rename_to | while read old_name new_name; do + printf "%s %s\\n" ${rename_to} | while read old_name new_name; do [ -f "\${new_name}" ] || ln -s \$old_name \$new_name done fastqc \\ - $args \\ - --threads $task.cpus \\ - --memory $fastqc_memory \\ - $renamed_files + ${args} \\ + --threads ${task.cpus} \\ + --memory ${fastqc_memory} \\ + ${renamed_files} cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/nf-core/fastqc/meta.yml b/modules/nf-core/fastqc/meta.yml index ee5507e..2b2e62b 100644 --- a/modules/nf-core/fastqc/meta.yml +++ b/modules/nf-core/fastqc/meta.yml @@ -11,40 +11,50 @@ tools: FastQC gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T). + You get information about adapter contamination and other overrepresented sequences. 
homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ licence: ["GPL-2.0-only"] + identifier: biotools:fastqc input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - html: - type: file - description: FastQC report - pattern: "*_{fastqc.html}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.html": + type: file + description: FastQC report + pattern: "*_{fastqc.html}" - zip: - type: file - description: FastQC report archive - pattern: "*_{fastqc.zip}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.zip": + type: file + description: FastQC report archive + pattern: "*_{fastqc.zip}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@grst" diff --git a/modules/nf-core/fastqc/tests/main.nf.test b/modules/nf-core/fastqc/tests/main.nf.test index 70edae4..e9d79a0 100644 --- a/modules/nf-core/fastqc/tests/main.nf.test +++ b/modules/nf-core/fastqc/tests/main.nf.test @@ -23,17 +23,14 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it. - // looks like this:
-            // <div id="header_filename">Mon 2 Oct 2023<br/>test.gz</div>
- // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039 - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_single") } + { assert process.success }, + // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it. + // looks like this:
+            // <div id="header_filename">Mon 2 Oct 2023<br/>test.gz</div>
+ // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039 + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -54,16 +51,14 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_paired") } + { assert process.success }, + { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, + { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, + { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, + { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, + { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -83,13 +78,11 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_interleaved") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -109,13 +102,11 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_bam") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -138,22 +129,20 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" }, - { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" }, - { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" }, - { assert 
path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][2]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][3]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_multiple") } + { assert process.success }, + { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, + { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, + { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" }, + { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" }, + { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, + { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, + { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" }, + { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" }, + { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][2]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][3]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -173,21 +162,18 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_custom_prefix") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } test("sarscov2 single-end [fastq] - stub") { - options "-stub" - + options "-stub" when { process { """ @@ -201,12 +187,123 @@ nextflow_process { then { assertAll ( - { assert process.success }, - { assert snapshot(process.out.html.collect { file(it[1]).getName() } + - process.out.zip.collect { file(it[1]).getName() } + - process.out.versions ).match("fastqc_stub") } + { assert process.success }, + { assert snapshot(process.out).match() } ) } } + test("sarscov2 paired-end [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 interleaved [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert 
snapshot(process.out).match() } + ) + } + } + + test("sarscov2 paired-end [bam] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 multiple [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_2.fastq.gz', checkIfExists: true) ] + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 custom_prefix - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [ id:'mysample', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } } diff --git a/modules/nf-core/fastqc/tests/main.nf.test.snap b/modules/nf-core/fastqc/tests/main.nf.test.snap index 86f7c31..d5db309 100644 --- a/modules/nf-core/fastqc/tests/main.nf.test.snap +++ b/modules/nf-core/fastqc/tests/main.nf.test.snap @@ -1,88 +1,392 @@ { - "fastqc_versions_interleaved": { + "sarscov2 custom_prefix": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:07.293713" + "timestamp": "2024-07-22T11:02:16.374038" }, - "fastqc_stub": { + "sarscov2 single-end [fastq] - stub": { "content": [ - [ - "test.html", - "test.zip", - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": true + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": true + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:24.993809" + }, + "sarscov2 custom_prefix - stub": { + "content": [ + { + "0": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { 
+ "id": "mysample", + "single_end": true + }, + "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:31:01.425198" + "timestamp": "2024-07-22T11:03:10.93942" }, - "fastqc_versions_multiple": { + "sarscov2 interleaved [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:55.797907" + "timestamp": "2024-07-22T11:01:42.355718" }, - "fastqc_versions_bam": { + "sarscov2 paired-end [bam]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:26.795862" + "timestamp": "2024-07-22T11:01:53.276274" }, - "fastqc_versions_single": { + "sarscov2 multiple [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:39:27.043675" + "timestamp": "2024-07-22T11:02:05.527626" }, - "fastqc_versions_paired": { + "sarscov2 paired-end [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:01:31.188871" + }, + "sarscov2 paired-end [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:34.273566" + }, + "sarscov2 multiple [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:39:47.584191" + "timestamp": "2024-07-22T11:03:02.304411" }, - "fastqc_versions_custom_prefix": { + "sarscov2 single-end [fastq]": { "content": [ [ 
"versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:01:19.095607" + }, + "sarscov2 interleaved [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:44.640184" + }, + "sarscov2 paired-end [bam] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:41:14.576531" + "timestamp": "2024-07-22T11:02:53.550742" } } \ No newline at end of file diff --git a/modules/nf-core/fastqc/tests/tags.yml b/modules/nf-core/fastqc/tests/tags.yml deleted file mode 100644 index 7834294..0000000 --- a/modules/nf-core/fastqc/tests/tags.yml +++ /dev/null @@ -1,2 +0,0 @@ -fastqc: - - modules/nf-core/fastqc/** diff --git a/modules/nf-core/gzrt/environment.yml b/modules/nf-core/gzrt/environment.yml new file mode 100644 index 0000000..92cec66 --- /dev/null +++ b/modules/nf-core/gzrt/environment.yml @@ -0,0 +1,6 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json +channels: + - conda-forge + - bioconda +dependencies: + - "bioconda::gzrt=0.9.1" diff --git a/modules/nf-core/gzrt/main.nf b/modules/nf-core/gzrt/main.nf new file mode 100644 index 0000000..a8c398e --- /dev/null +++ b/modules/nf-core/gzrt/main.nf @@ -0,0 +1,89 @@ +process GZRT { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/gzrt:0.9.1--h577a1d6_1': + 'biocontainers/gzrt:0.9.1--h577a1d6_1' }" + + input: + tuple val(meta), path(fastqgz) + + output: + tuple val(meta), path("${prefix}*.fastq.gz"), emit: recovered + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}_recovered" + fastqgz.each { file -> + if (file.extension != "gz") { + error "GZRT works with .gz files only. 
Offending file: ${file}" + } + + if ((meta.single_end && "${file}" == "${prefix}.fastq.gz") || + (!meta.single_end && ("${file}" == "${prefix}_1.fastq.gz" || "${file}" == "${prefix}_2.fastq.gz"))) { + error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!" + } + } + + """ + if [ "$meta.single_end" == true ]; then + gzrecover ${args} -p ${fastqgz} | gzip > ${prefix}.fastq.gz + + if [ -e "${prefix}.fastq.gz" ] && [ ! -s "${prefix}.fastq.gz" ]; then + echo "" | gzip > ${prefix}.fastq.gz + fi + else + gzrecover ${args} -p ${fastqgz[0]} | gzip > ${prefix}_1.fastq.gz + gzrecover ${args} -p ${fastqgz[1]} | gzip > ${prefix}_2.fastq.gz + + if [ -e "${prefix}_1.fastq.gz" ] && [ ! -s "${prefix}_1.fastq.gz" ]; then + echo "" | gzip > ${prefix}_1.fastq.gz + fi + + if [ -e "${prefix}_2.fastq.gz" ] && [ ! -s "${prefix}_2.fastq.gz" ]; then + echo "" | gzip > ${prefix}_2.fastq.gz + fi + fi + + soft_line="${task.process}" + ver_line="gzrt: \$(gzrecover -V |& sed '1!d ; s/gzrecover //')" + cat <<-END_VERSIONS > versions.yml + "\${soft_line}": + \${ver_line} + END_VERSIONS + """ + + stub: + prefix = task.ext.prefix ?: "${meta.id}_recovered" + fastqgz.each { file -> + if (file.extension != "gz") { + error "GZRT works with .gz files only. Offending file: ${file}" + } + + if ((meta.single_end && "${file}" == "${prefix}.fastq.gz") || + (!meta.single_end && ("${file}" == "${prefix}_1.fastq.gz" || "${file}" == "${prefix}_2.fastq.gz"))) { + error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!" + } + } + """ + if [ "$meta.single_end" == true ]; then + echo "" | gzip > ${prefix}.fastq.gz + else + echo "" | gzip > ${prefix}_1.fastq.gz + echo "" | gzip > ${prefix}_2.fastq.gz + fi + + soft_line="${task.process}" + ver_line="gzrt: \$(gzrecover -V |& sed '1!d ; s/gzrecover //')" + cat <<-END_VERSIONS > versions.yml + "\${soft_line}": + \${ver_line} + END_VERSIONS + """ +} diff --git a/modules/nf-core/gzrt/meta.yml b/modules/nf-core/gzrt/meta.yml new file mode 100644 index 0000000..5bc42cf --- /dev/null +++ b/modules/nf-core/gzrt/meta.yml @@ -0,0 +1,55 @@ +name: "gzrt" +description: gzrecover is a program that will attempt to extract any readable data out of a gzip file that has been corrupted +keywords: + - corrupted + - fastq + - recovery +tools: + - "gzrt": + description: "Unofficial build of the gzip Recovery Toolkit aka gzrecover" + homepage: "https://www.urbanophile.com/arenn/hacking/gzrt/gzrt.html" + documentation: "https://www.urbanophile.com/arenn/hacking/gzrt/gzrt.html" + tool_dev_url: "https://github.com/arenn/gzrt" + doi: "no DOI available" + licence: ["GPL v2"] + identifier: "" + +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. `[ id:'sample1', single_end:false ]` + + - fastqgz: + type: file + description: FASTQ.gz files + pattern: "*.{gz}" + ontologies: + - edam: "http://edamontology.org/format_3989" + +output: + - recovered: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
`[ id:'sample1', single_end:false ]` + - "${prefix}*.fastq.gz": + type: file + description: Recovered FASTQ.gz files + pattern: "*.{gz}" + ontologies: + - edam: "http://edamontology.org/format_3989" + + - versions: + - "versions.yml": + type: file + description: File containing software versions + pattern: "versions.yml" + +authors: + - "@mazzalab" +maintainers: + - "@mazzalab" + - "@tm4zza" diff --git a/modules/nf-core/gzrt/tests/main.nf.test b/modules/nf-core/gzrt/tests/main.nf.test new file mode 100644 index 0000000..71a2a9a --- /dev/null +++ b/modules/nf-core/gzrt/tests/main.nf.test @@ -0,0 +1,100 @@ +nextflow_process { + + name "Test Process GZRT" + script "../main.nf" + process "GZRT" + + tag "modules" + tag "modules_nfcore" + tag "gzrt" + + test("Run gzrt on test2_1_corrupted single-end - fastq.gz") { + when { + process { + """ + input[0] = + [ + [ id:'test-single', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test2_1_corrupted_10kb.fastq.gz', checkIfExists: true) + ] + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("Run gzrt on test2_1_corrupted and test2_2 paired-end - fastq.gz") { + when { + process { + """ + input[0] = + [ + [ id:'test-paired', single_end:false ], // meta map + [ + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test2_1_corrupted_10kb.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test2_2.fastq.gz', checkIfExists: true) + ] + ] + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("Run gzrt on test2_1_corrupted single-end - fastq.gz - stub") { + options "-stub" + + when { + process { + """ + input[0] = [ + [ id:'test-single-stub', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test2_1_corrupted_10kb.fastq.gz', checkIfExists: true) + ] + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("Run gzrt on test2_1_corrupted and test2_2 paired-end - fastq.gz - stub") { + options "-stub" + + when { + process { + """ + input[0] = + [ + [ id:'test-paired-stub', single_end:false ], // meta map + [ + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test2_1_corrupted_10kb.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test2_2.fastq.gz', checkIfExists: true) + ] + ] + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + +} \ No newline at end of file diff --git a/modules/nf-core/gzrt/tests/main.nf.test.snap b/modules/nf-core/gzrt/tests/main.nf.test.snap new file mode 100644 index 0000000..cd4c1c2 --- /dev/null +++ b/modules/nf-core/gzrt/tests/main.nf.test.snap @@ -0,0 +1,154 @@ +{ + "Run gzrt on test2_1_corrupted single-end - fastq.gz": { + "content": [ + { + "0": [ + [ + { + "id": "test-single", + "single_end": true + }, + "test-single_recovered.fastq.gz:md5,6bc7d0b6304039eadba7321d21e7b2c9" + ] + ], + "1": [ + "versions.yml:md5,6d1e28b8a8043e3cba67606c7acdc676" + ], + "recovered": [ + [ + { + "id": "test-single", + "single_end": true + }, + "test-single_recovered.fastq.gz:md5,6bc7d0b6304039eadba7321d21e7b2c9" + ] + ], + "versions": [ + 
"versions.yml:md5,6d1e28b8a8043e3cba67606c7acdc676" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2024-12-24T11:23:22.53088" + }, + "Run gzrt on test2_1_corrupted single-end - fastq.gz - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test-single-stub", + "single_end": true + }, + "test-single-stub_recovered.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + "versions.yml:md5,6d1e28b8a8043e3cba67606c7acdc676" + ], + "recovered": [ + [ + { + "id": "test-single-stub", + "single_end": true + }, + "test-single-stub_recovered.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "versions": [ + "versions.yml:md5,6d1e28b8a8043e3cba67606c7acdc676" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2024-12-24T11:24:13.88887" + }, + "Run gzrt on test2_1_corrupted and test2_2 paired-end - fastq.gz - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test-paired-stub", + "single_end": false + }, + [ + "test-paired-stub_recovered_1.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test-paired-stub_recovered_2.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "1": [ + "versions.yml:md5,6d1e28b8a8043e3cba67606c7acdc676" + ], + "recovered": [ + [ + { + "id": "test-paired-stub", + "single_end": false + }, + [ + "test-paired-stub_recovered_1.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test-paired-stub_recovered_2.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "versions": [ + "versions.yml:md5,6d1e28b8a8043e3cba67606c7acdc676" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2024-12-24T11:24:58.632197" + }, + "Run gzrt on test2_1_corrupted and test2_2 paired-end - fastq.gz": { + "content": [ + { + "0": [ + [ + { + "id": "test-paired", + "single_end": false + }, + [ + "test-paired_recovered_1.fastq.gz:md5,6bc7d0b6304039eadba7321d21e7b2c9", + "test-paired_recovered_2.fastq.gz:md5,641919946a0e572ca7483f6e54476f3b" + ] + ] + ], + "1": [ + "versions.yml:md5,6d1e28b8a8043e3cba67606c7acdc676" + ], + "recovered": [ + [ + { + "id": "test-paired", + "single_end": false + }, + [ + "test-paired_recovered_1.fastq.gz:md5,6bc7d0b6304039eadba7321d21e7b2c9", + "test-paired_recovered_2.fastq.gz:md5,641919946a0e572ca7483f6e54476f3b" + ] + ] + ], + "versions": [ + "versions.yml:md5,6d1e28b8a8043e3cba67606c7acdc676" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2024-12-24T11:23:40.906933" + } +} \ No newline at end of file diff --git a/modules/nf-core/multiqc/environment.yml b/modules/nf-core/multiqc/environment.yml index ca39fb6..a27122c 100644 --- a/modules/nf-core/multiqc/environment.yml +++ b/modules/nf-core/multiqc/environment.yml @@ -1,7 +1,5 @@ -name: multiqc channels: - conda-forge - bioconda - - defaults dependencies: - - bioconda::multiqc=1.21 + - bioconda::multiqc=1.27 diff --git a/modules/nf-core/multiqc/main.nf b/modules/nf-core/multiqc/main.nf index 47ac352..58d9313 100644 --- a/modules/nf-core/multiqc/main.nf +++ b/modules/nf-core/multiqc/main.nf @@ -3,14 +3,16 @@ process MULTIQC { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/multiqc:1.21--pyhdfd78af_0' : - 'biocontainers/multiqc:1.21--pyhdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/multiqc:1.27--pyhdfd78af_0' : + 'biocontainers/multiqc:1.27--pyhdfd78af_0' }" input: path multiqc_files, stageAs: "?/*" path(multiqc_config) path(extra_multiqc_config) path(multiqc_logo) + path(replace_names) + path(sample_names) output: path "*multiqc_report.html", emit: report @@ -23,16 +25,22 @@ process MULTIQC { script: def args = task.ext.args ?: '' + def prefix = task.ext.prefix ? "--filename ${task.ext.prefix}.html" : '' def config = multiqc_config ? "--config $multiqc_config" : '' def extra_config = extra_multiqc_config ? "--config $extra_multiqc_config" : '' - def logo = multiqc_logo ? /--cl-config 'custom_logo: "${multiqc_logo}"'/ : '' + def logo = multiqc_logo ? "--cl-config 'custom_logo: \"${multiqc_logo}\"'" : '' + def replace = replace_names ? "--replace-names ${replace_names}" : '' + def samples = sample_names ? "--sample-names ${sample_names}" : '' """ multiqc \\ --force \\ $args \\ $config \\ + $prefix \\ $extra_config \\ $logo \\ + $replace \\ + $samples \\ . cat <<-END_VERSIONS > versions.yml @@ -44,7 +52,7 @@ process MULTIQC { stub: """ mkdir multiqc_data - touch multiqc_plots + mkdir multiqc_plots touch multiqc_report.html cat <<-END_VERSIONS > versions.yml diff --git a/modules/nf-core/multiqc/meta.yml b/modules/nf-core/multiqc/meta.yml index 45a9bc3..b16c187 100644 --- a/modules/nf-core/multiqc/meta.yml +++ b/modules/nf-core/multiqc/meta.yml @@ -1,5 +1,6 @@ name: multiqc -description: Aggregate results from bioinformatics analyses across many samples into a single report +description: Aggregate results from bioinformatics analyses across many samples into + a single report keywords: - QC - bioinformatics tools @@ -12,40 +13,59 @@ tools: homepage: https://multiqc.info/ documentation: https://multiqc.info/docs/ licence: ["GPL-3.0-or-later"] + identifier: biotools:multiqc input: - - multiqc_files: - type: file - description: | - List of reports / files recognised by MultiQC, for example the html and zip output of FastQC - - multiqc_config: - type: file - description: Optional config yml for MultiQC - pattern: "*.{yml,yaml}" - - extra_multiqc_config: - type: file - description: Second optional config yml for MultiQC. Will override common sections in multiqc_config. - pattern: "*.{yml,yaml}" - - multiqc_logo: - type: file - description: Optional logo file for MultiQC - pattern: "*.{png}" + - - multiqc_files: + type: file + description: | + List of reports / files recognised by MultiQC, for example the html and zip output of FastQC + - - multiqc_config: + type: file + description: Optional config yml for MultiQC + pattern: "*.{yml,yaml}" + - - extra_multiqc_config: + type: file + description: Second optional config yml for MultiQC. Will override common sections + in multiqc_config. + pattern: "*.{yml,yaml}" + - - multiqc_logo: + type: file + description: Optional logo file for MultiQC + pattern: "*.{png}" + - - replace_names: + type: file + description: | + Optional two-column sample renaming file. First column a set of + patterns, second column a set of corresponding replacements. Passed via + MultiQC's `--replace-names` option. + pattern: "*.{tsv}" + - - sample_names: + type: file + description: | + Optional TSV file with headers, passed to the MultiQC --sample_names + argument. 
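# Illustrative sketch only, not part of the module meta: assuming MultiQC's documented TSV formats, a hypothetical replace_names file lists one pattern/replacement pair per line, e.g.
#   SRR0001    patient_1
#   SRR0002    patient_2
# while a hypothetical sample_names file starts with a header row whose additional columns become sample-renaming buttons in the report. Note that the process code above passes these files via MultiQC's --replace-names and --sample-names flags.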
+ pattern: "*.{tsv}" output: - report: - type: file - description: MultiQC report file - pattern: "multiqc_report.html" + - "*multiqc_report.html": + type: file + description: MultiQC report file + pattern: "multiqc_report.html" - data: - type: directory - description: MultiQC data dir - pattern: "multiqc_data" + - "*_data": + type: directory + description: MultiQC data dir + pattern: "multiqc_data" - plots: - type: file - description: Plots created by MultiQC - pattern: "*_data" + - "*_plots": + type: file + description: Plots created by MultiQC + pattern: "*_data" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@abhi18av" - "@bunop" diff --git a/modules/nf-core/multiqc/tests/main.nf.test b/modules/nf-core/multiqc/tests/main.nf.test index f1c4242..33316a7 100644 --- a/modules/nf-core/multiqc/tests/main.nf.test +++ b/modules/nf-core/multiqc/tests/main.nf.test @@ -8,6 +8,8 @@ nextflow_process { tag "modules_nfcore" tag "multiqc" + config "./nextflow.config" + test("sarscov2 single-end [fastqc]") { when { @@ -17,6 +19,8 @@ nextflow_process { input[1] = [] input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } @@ -41,6 +45,8 @@ nextflow_process { input[1] = Channel.of(file("https://github.com/nf-core/tools/raw/dev/nf_core/pipeline-template/assets/multiqc_config.yml", checkIfExists: true)) input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } @@ -66,6 +72,8 @@ nextflow_process { input[1] = [] input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } diff --git a/modules/nf-core/multiqc/tests/main.nf.test.snap b/modules/nf-core/multiqc/tests/main.nf.test.snap index bfebd80..7b7c132 100644 --- a/modules/nf-core/multiqc/tests/main.nf.test.snap +++ b/modules/nf-core/multiqc/tests/main.nf.test.snap @@ -2,14 +2,14 @@ "multiqc_versions_single": { "content": [ [ - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,8f3b8c1cec5388cf2708be948c9fa42f" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.2", + "nextflow": "24.10.4" }, - "timestamp": "2024-02-29T08:48:55.657331" + "timestamp": "2025-01-27T09:29:57.631982377" }, "multiqc_stub": { "content": [ @@ -17,25 +17,25 @@ "multiqc_report.html", "multiqc_data", "multiqc_plots", - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,8f3b8c1cec5388cf2708be948c9fa42f" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.2", + "nextflow": "24.10.4" }, - "timestamp": "2024-02-29T08:49:49.071937" + "timestamp": "2025-01-27T09:30:34.743726958" }, "multiqc_versions_config": { "content": [ [ - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,8f3b8c1cec5388cf2708be948c9fa42f" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.2", + "nextflow": "24.10.4" }, - "timestamp": "2024-02-29T08:49:25.457567" + "timestamp": "2025-01-27T09:30:21.44383553" } } \ No newline at end of file diff --git a/modules/nf-core/multiqc/tests/nextflow.config b/modules/nf-core/multiqc/tests/nextflow.config new file mode 100644 index 0000000..c537a6a --- /dev/null +++ b/modules/nf-core/multiqc/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: 'MULTIQC' { + ext.prefix = null + } +} diff --git a/modules/nf-core/wipertools/fastqgather/environment.yml b/modules/nf-core/wipertools/fastqgather/environment.yml new file mode 100644 index 
0000000..2c9cfba --- /dev/null +++ b/modules/nf-core/wipertools/fastqgather/environment.yml @@ -0,0 +1,7 @@ +--- +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json +channels: + - conda-forge + - bioconda +dependencies: + - "bioconda::wipertools=1.1.5" diff --git a/modules/nf-core/wipertools/fastqgather/main.nf b/modules/nf-core/wipertools/fastqgather/main.nf new file mode 100644 index 0000000..cab21ff --- /dev/null +++ b/modules/nf-core/wipertools/fastqgather/main.nf @@ -0,0 +1,57 @@ +process WIPERTOOLS_FASTQGATHER { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/wipertools:1.1.5--pyhdfd78af_0': + 'biocontainers/wipertools:1.1.5--pyhdfd78af_0' }" + + input: + tuple val(meta), path(fastq) + + output: + tuple val(meta), path("${prefix}.fastq.gz"), emit: gathered_fastq + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}_gather" + + // Check if the output file name is in the list of input files + if (fastq.any { it.name == "${prefix}.fastq.gz" }) { + error "Output file name \"${prefix}.fastq.gz\" matches one of the input files. Use \"task.ext.prefix\" to disambiguate!" + } + + """ + wipertools \\ + fastqgather \\ + -i $fastq \\ + -o ${prefix}.fastq.gz \\ + ${args} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + wipertools fastqgather: \$(wipertools fastqgather --version) + END_VERSIONS + """ + + stub: + prefix = task.ext.prefix ?: "${meta.id}_gather" + + // Check if the output file name is in the list of input files + if (fastq.any { it.name == "${prefix}.fastq.gz" }) { + error "Output file name \"${prefix}.fastq.gz\" matches one of the input files. Use \"task.ext.prefix\" to disambiguate!" + } + """ + echo "" | gzip > ${prefix}.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + wipertools fastqgather: \$(wipertools fastqgather --version) + END_VERSIONS + """ +} diff --git a/modules/nf-core/wipertools/fastqgather/meta.yml b/modules/nf-core/wipertools/fastqgather/meta.yml new file mode 100644 index 0000000..a3ce38c --- /dev/null +++ b/modules/nf-core/wipertools/fastqgather/meta.yml @@ -0,0 +1,58 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json +name: "wipertools_fastqgather" +description: A tool of the wipertools suite that merges FASTQ chunks produced by wipertools_fastqscatter +keywords: + - fastq + - merge + - union +tools: + - "fastqgather": + description: "A tool of the wipertools suite that merges FASTQ chunks produced\ + \ by wipertools_fastqscatter." + homepage: "https://github.com/mazzalab/fastqwiper" + documentation: "https://github.com/mazzalab/fastqwiper" + tool_dev_url: "https://github.com/mazzalab/fastqwiper" + doi: "no DOI available" + licence: ["GPL v2-or-later"] + identifier: "" + args_id: "$args" + +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g.
[ id:'test' ] + - fastq: + type: file + description: List of FASTQ chunk files to be merged + pattern: "*.{fastq,fastq.gz}" + ontologies: + - edam: "http://edamontology.org/format_1930" + - edam: "http://edamontology.org/format_3989" +output: + - gathered_fastq: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test' ] + - "${prefix}.fastq.gz": + type: file + description: The resulting FASTQ file + pattern: "*.fastq.gz" + ontologies: + - edam: "http://edamontology.org/format_1930" + - edam: "http://edamontology.org/format_3989" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" + ontologies: + - edam: "http://edamontology.org/format_2330" +authors: + - "@tm4zza" +maintainers: + - "@mazzalab" + - "@tm4zza" diff --git a/modules/nf-core/wipertools/fastqgather/tests/main.nf.test b/modules/nf-core/wipertools/fastqgather/tests/main.nf.test new file mode 100644 index 0000000..69288f0 --- /dev/null +++ b/modules/nf-core/wipertools/fastqgather/tests/main.nf.test @@ -0,0 +1,72 @@ +nextflow_process { + name "Test Process WIPERTOOLS_FASTQGATHER" + script "../main.nf" + process "WIPERTOOLS_FASTQGATHER" + + tag "modules" + tag "modules_nfcore" + tag "wipertools" + tag "wipertools/fastqgather" + tag "wipertools/fastqscatter" + + setup { + run("WIPERTOOLS_FASTQSCATTER") { + script "../../fastqscatter/main.nf" + process { + """ + input[0] = [ + [ id:'fastq_chunks' ], // meta map + file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true) + ] + input[1] = 2 // number of splits + """ + } + } + } + + test("merge two gzipped fastq - fastq.gz") { + config "./nextflow.config" + + when { + params { + module_args = '--prefix fastq_chunks' + } + process { + """ + input[0] = WIPERTOOLS_FASTQSCATTER.out.fastq_chunks + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + + } + + test("merge two gzipped fastq - fastq.gz - stub") { + options "-stub" + config "./nextflow.config" + + when { + params { + module_args = '--prefix fastq_chunks' + } + process { + """ + input[0] = WIPERTOOLS_FASTQSCATTER.out.fastq_chunks + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + +} diff --git a/modules/nf-core/wipertools/fastqgather/tests/main.nf.test.snap b/modules/nf-core/wipertools/fastqgather/tests/main.nf.test.snap new file mode 100644 index 0000000..bcc555c --- /dev/null +++ b/modules/nf-core/wipertools/fastqgather/tests/main.nf.test.snap @@ -0,0 +1,68 @@ +{ + "merge two gzipped fastq - fastq.gz - stub": { + "content": [ + { + "0": [ + [ + { + "id": "fastq_chunks" + }, + "fastq_chunks_gather.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + "versions.yml:md5,b4df541ef0f00762a863e9b88d620301" + ], + "gathered_fastq": [ + [ + { + "id": "fastq_chunks" + }, + "fastq_chunks_gather.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "versions": [ + "versions.yml:md5,b4df541ef0f00762a863e9b88d620301" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.2" + }, + "timestamp": "2025-01-31T10:49:02.021919993" + }, + "merge two gzipped fastq - fastq.gz": { + "content": [ + { + "0": [ + [ + { + "id": "fastq_chunks" + }, + "fastq_chunks_gather.fastq.gz:md5,4161df271f9bfcd25d5845a1e220dbec" + ] + ], + "1": [ + "versions.yml:md5,b4df541ef0f00762a863e9b88d620301" + ], + "gathered_fastq": [ + [ + { + "id": 
"fastq_chunks" + }, + "fastq_chunks_gather.fastq.gz:md5,4161df271f9bfcd25d5845a1e220dbec" + ] + ], + "versions": [ + "versions.yml:md5,b4df541ef0f00762a863e9b88d620301" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.2" + }, + "timestamp": "2025-01-31T10:48:43.687455488" + } +} diff --git a/modules/nf-core/wipertools/fastqgather/tests/nextflow.config b/modules/nf-core/wipertools/fastqgather/tests/nextflow.config new file mode 100644 index 0000000..ce94189 --- /dev/null +++ b/modules/nf-core/wipertools/fastqgather/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: WIPERTOOLS_FASTQGATHER { + ext.args = params.module_args + } +} diff --git a/modules/nf-core/wipertools/fastqscatter/environment.yml b/modules/nf-core/wipertools/fastqscatter/environment.yml new file mode 100644 index 0000000..2c9cfba --- /dev/null +++ b/modules/nf-core/wipertools/fastqscatter/environment.yml @@ -0,0 +1,7 @@ +--- +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json +channels: + - conda-forge + - bioconda +dependencies: + - "bioconda::wipertools=1.1.5" diff --git a/modules/nf-core/wipertools/fastqscatter/main.nf b/modules/nf-core/wipertools/fastqscatter/main.nf new file mode 100644 index 0000000..a83a775 --- /dev/null +++ b/modules/nf-core/wipertools/fastqscatter/main.nf @@ -0,0 +1,65 @@ +process WIPERTOOLS_FASTQSCATTER { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/wipertools:1.1.5--pyhdfd78af_0': + 'biocontainers/wipertools:1.1.5--pyhdfd78af_0' }" + + input: + tuple val(meta), path(fastq) + val(num_splits) + + output: + tuple val(meta), path("${out_folder}/*") , emit: fastq_chunks + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def args_list = args.tokenize() + out_folder = (args_list.contains('--out_folder') ? args_list[args_list.indexOf('--out_folder')+1] : + (args_list.contains('-o') ? args_list[args_list.indexOf('-o')+1] : 'chunks')) + if(!args.contains('-o') && !args.contains('--out_folder')) { + args += " -o ${out_folder}" + } + """ + wipertools \\ + fastqscatter \\ + -f ${fastq} \\ + -n ${num_splits} \\ + -p ${prefix} \\ + ${args} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + wipertools fastqscatter: \$(wipertools fastqscatter --version) + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def args_list = args.tokenize() + out_folder = (args_list.contains('--out_folder') ? args_list[args_list.indexOf('--out_folder')+1] : + (args_list.contains('-o') ? 
args_list[args_list.indexOf('-o')+1] : 'chunks')) + if(!args.contains('-o') && !args.contains('--out_folder')) { + args += " -o ${out_folder}" + } + """ + mkdir ${out_folder} + for i in {1..${num_splits}} + do + echo "" | gzip > ${out_folder}/${prefix}_\$i-of-${num_splits}_suffix.fastq.gz + done + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + wipertools fastqscatter: \$(wipertools fastqscatter --version) + END_VERSIONS + """ +} diff --git a/modules/nf-core/wipertools/fastqscatter/meta.yml b/modules/nf-core/wipertools/fastqscatter/meta.yml new file mode 100644 index 0000000..b2508f8 --- /dev/null +++ b/modules/nf-core/wipertools/fastqscatter/meta.yml @@ -0,0 +1,61 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json +name: "wipertools_fastqscatter" +description: A tool of the wipertools suite that splits FASTQ files into chunks +keywords: + - fastq + - split + - partitioning +tools: + - "fastqscatter": + description: "A tool of the wipertools suite that splits FASTQ files into chunks." + homepage: "https://github.com/mazzalab/fastqwiper" + documentation: "https://github.com/mazzalab/fastqwiper" + tool_dev_url: "https://github.com/mazzalab/fastqwiper" + doi: "no DOI available" + licence: ["GPL v2-or-later"] + identifier: "" + args_id: "$args" + +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test' ] + - fastq: + type: file + description: FASTQ file + pattern: "*.{fastq,fastq.gz}" + ontologies: + - edam: "http://edamontology.org/format_1930" + - - num_splits: + type: integer + description: Number of desired chunks + pattern: "[1-9][0-9]*" + +output: + - fastq_chunks: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test' ] + - "${out_folder}/*": + type: file + description: Chunk FASTQ files + pattern: "*.{fastq,fastq.gz}" + ontologies: + - edam: "http://edamontology.org/format_1930" + - edam: "http://edamontology.org/format_3989" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" + ontologies: + - edam: "http://edamontology.org/format_3750" +authors: + - "@tm4zza" +maintainers: + - "@mazzalab" + - "@tm4zza" diff --git a/modules/nf-core/wipertools/fastqscatter/tests/main.nf.test b/modules/nf-core/wipertools/fastqscatter/tests/main.nf.test new file mode 100644 index 0000000..0d48e5a --- /dev/null +++ b/modules/nf-core/wipertools/fastqscatter/tests/main.nf.test @@ -0,0 +1,63 @@ +nextflow_process { + name "Test Process WIPERTOOLS_FASTQSCATTER" + script "../main.nf" + process "WIPERTOOLS_FASTQSCATTER" + + tag "modules" + tag "modules_nfcore" + tag "wipertools" + tag "wipertools/fastqscatter" + + test("test1 - fastq") { + config "./nextflow.config" + + when { + params { + module_args = '-o chunks_custom -O unix' + } + process { + """ + input[0] = [ + [ id:'test' ], // meta map + file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true) + ] + input[1] = 2 // splits num + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("test1 - fastq - stub") { + options "-stub" + config "./nextflow.config" + + when { + params { + module_args = '-o chunks_custom -O unix' + } + process { + """ + input[0] = [ + [ id:'test' ], + file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true) + ] + input[1] = 2 // splits num + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } +} diff --git a/modules/nf-core/wipertools/fastqscatter/tests/main.nf.test.snap b/modules/nf-core/wipertools/fastqscatter/tests/main.nf.test.snap new file mode 100644 index 0000000..de73ca5 --- /dev/null +++ b/modules/nf-core/wipertools/fastqscatter/tests/main.nf.test.snap @@ -0,0 +1,80 @@ +{ + "test1 - fastq - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + [ + "test_1-of-2_suffix.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2-of-2_suffix.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "1": [ + "versions.yml:md5,48702abf648becf3110470614c556f1b" + ], + "fastq_chunks": [ + [ + { + "id": "test" + }, + [ + "test_1-of-2_suffix.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "test_2-of-2_suffix.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "versions": [ + "versions.yml:md5,48702abf648becf3110470614c556f1b" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.2" + }, + "timestamp": "2025-01-31T10:49:30.871539752" + }, + "test1 - fastq": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + [ + "test_000.fastq.gz:md5,ecc4b1841cd94704bba742ea4dcd48b0", + "test_001.fastq.gz:md5,b3de467f2b6ab0d14e1f6ce14932a411" + ] + ] + ], + "1": [ + "versions.yml:md5,48702abf648becf3110470614c556f1b" + ], + "fastq_chunks": [ + [ + { + "id": "test" + }, + [ + "test_000.fastq.gz:md5,ecc4b1841cd94704bba742ea4dcd48b0", + "test_001.fastq.gz:md5,b3de467f2b6ab0d14e1f6ce14932a411" + ] + ] + ], + "versions": [ + "versions.yml:md5,48702abf648becf3110470614c556f1b" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.2" + }, + "timestamp": 
"2025-01-31T10:49:16.633370109" + } +} \ No newline at end of file diff --git a/modules/nf-core/wipertools/fastqscatter/tests/nextflow.config b/modules/nf-core/wipertools/fastqscatter/tests/nextflow.config new file mode 100644 index 0000000..c4b8951 --- /dev/null +++ b/modules/nf-core/wipertools/fastqscatter/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: WIPERTOOLS_FASTQSCATTER { + ext.args = params.module_args + } +} diff --git a/modules/nf-core/wipertools/fastqwiper/environment.yml b/modules/nf-core/wipertools/fastqwiper/environment.yml new file mode 100644 index 0000000..2c9cfba --- /dev/null +++ b/modules/nf-core/wipertools/fastqwiper/environment.yml @@ -0,0 +1,7 @@ +--- +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json +channels: + - conda-forge + - bioconda +dependencies: + - "bioconda::wipertools=1.1.5" diff --git a/modules/nf-core/wipertools/fastqwiper/main.nf b/modules/nf-core/wipertools/fastqwiper/main.nf new file mode 100644 index 0000000..cab3373 --- /dev/null +++ b/modules/nf-core/wipertools/fastqwiper/main.nf @@ -0,0 +1,51 @@ +process WIPERTOOLS_FASTQWIPER { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/wipertools:1.1.5--pyhdfd78af_0': + 'biocontainers/wipertools:1.1.5--pyhdfd78af_0' }" + + input: + tuple val(meta), path(fastq) + + output: + tuple val(meta), path("${prefix}.fastq.gz"), emit: wiped_fastq + tuple val(meta), path("*.report") , emit: report + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}_wiped" + if ("${fastq}" == "${prefix}.fastq.gz") error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!." + """ + wipertools \\ + fastqwiper \\ + -i ${fastq} \\ + -o ${prefix}.fastq.gz \\ + -r ${prefix}.report \\ + ${args} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + wipertools fastqwiper: \$(wipertools fastqwiper --version) + END_VERSIONS + """ + + stub: + prefix = task.ext.prefix ?: "${meta.id}_wiped" + if ("${fastq}" == "${prefix}.fastq.gz") error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!." + """ + echo "" | gzip > ${prefix}.fastq.gz + touch ${prefix}.report + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + wipertools fastqwiper: \$(wipertools fastqwiper --version) + END_VERSIONS + """ +} diff --git a/modules/nf-core/wipertools/fastqwiper/meta.yml b/modules/nf-core/wipertools/fastqwiper/meta.yml new file mode 100644 index 0000000..8c35c1b --- /dev/null +++ b/modules/nf-core/wipertools/fastqwiper/meta.yml @@ -0,0 +1,70 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json +name: "wipertools_fastqwiper" +description: A tool of the wipertools suite that fixes or wipes out uncompliant reads from FASTQ files +keywords: + - fastq + - malformed + - corrupted + - fix +tools: + - "fastqwiper": + description: "A tool of the wipertools suite that that fixes or wipes out uncompliant reads from FASTQ files." 
+ homepage: "https://github.com/mazzalab/fastqwiper" + documentation: "https://github.com/mazzalab/fastqwiper" + tool_dev_url: "https://github.com/mazzalab/fastqwiper" + doi: "no DOI available" + licence: ["GPL v2-or-later"] + identifier: "" + args_id: "$args" + +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test' ] + - fastq: + type: file + description: FASTQ file to be cleaned + pattern: "*.{fastq,fastq.gz}" + ontologies: + - edam: "http://edamontology.org/format_1930" + - edam: "http://edamontology.org/format_3989" +output: + - wiped_fastq: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test' ] + - "${prefix}.fastq.gz": + type: file + description: Cleaned FASTQ file + pattern: "*.{fastq,fastq.gz}" + ontologies: + - edam: "http://edamontology.org/format_1930" + - edam: "http://edamontology.org/format_3989" + - report: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test' ] + - "*.report": + type: file + description: Summary of the cleaning task + pattern: "*.report" + ontologies: + - edam: "http://edamontology.org/format_2330" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" + ontologies: + - edam: "http://edamontology.org/format_3750" +authors: + - "@tm4zza" +maintainers: + - "@mazzalab" + - "@tm4zza" diff --git a/modules/nf-core/wipertools/fastqwiper/tests/main.nf.test b/modules/nf-core/wipertools/fastqwiper/tests/main.nf.test new file mode 100644 index 0000000..d21224a --- /dev/null +++ b/modules/nf-core/wipertools/fastqwiper/tests/main.nf.test @@ -0,0 +1,75 @@ +nextflow_process { + name "Test Process WIPERTOOLS_FASTQWIPER" + script "../main.nf" + process "WIPERTOOLS_FASTQWIPER" + + tag "modules" + tag "modules_nfcore" + tag "wipertools" + tag "wipertools/fastqwiper" + + test("Quality mismatch - fastq") { + config "./nextflow.config" + + when { + params { + module_args = '-a ACGT' + } + process { + """ + input[0] = [ + [ id:'test' ], + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test_quality_mismatch.fastq', checkIfExists: true) + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.wiped_fastq.get(0).get(1)).linesGzip.size() == 8}, + { + def read_final_qual = path(process.out.wiped_fastq.get(0).get(1)).linesGzip[7] + assert read_final_qual.equals('CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG') + }, + { assert path(process.out.report.get(0).get(1)).readLines().size() == 11}, + { + def report_header = path(process.out.report.get(0).get(1)).readLines()[0] + assert report_header.equals('FASTQWIPER REPORT:') + }, + { + def qual_check = path(process.out.report.get(0).get(1)).readLines()[9] + assert qual_check.equals('Reads discarded because len(SEQ) neq len(QUAL): 1') + }, + { assert snapshot(process.out).match() } + ) + } + } + + test("Quality mismatch - fastq - stub") { + options "-stub" + config "./nextflow.config" + + when { + params { + module_args = '-a ACGT' + } + process { + """ + input[0] = [ + [ id:'test' ], + file(params.modules_testdata_base_path + "genomics/homo_sapiens/illumina/fastq/test_quality_mismatch.fastq", checkIfExists: true) + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } +} diff --git a/modules/nf-core/wipertools/fastqwiper/tests/main.nf.test.snap 
b/modules/nf-core/wipertools/fastqwiper/tests/main.nf.test.snap new file mode 100644 index 0000000..cdc9720 --- /dev/null +++ b/modules/nf-core/wipertools/fastqwiper/tests/main.nf.test.snap @@ -0,0 +1,100 @@ +{ + "Quality mismatch - fastq - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test_wiped.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + [ + { + "id": "test" + }, + "test_wiped.report:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,f736a990049d46418335d7e5b66e8569" + ], + "report": [ + [ + { + "id": "test" + }, + "test_wiped.report:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,f736a990049d46418335d7e5b66e8569" + ], + "wiped_fastq": [ + [ + { + "id": "test" + }, + "test_wiped.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2025-01-18T07:49:20.507561" + }, + "Quality mismatch - fastq": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test_wiped.fastq.gz:md5,81b1d99cf5c1615123a8c31760858a10" + ] + ], + "1": [ + [ + { + "id": "test" + }, + "test_wiped.report:md5,d6fe1f0066f538993bb8f6b90150c8d5" + ] + ], + "2": [ + "versions.yml:md5,f736a990049d46418335d7e5b66e8569" + ], + "report": [ + [ + { + "id": "test" + }, + "test_wiped.report:md5,d6fe1f0066f538993bb8f6b90150c8d5" + ] + ], + "versions": [ + "versions.yml:md5,f736a990049d46418335d7e5b66e8569" + ], + "wiped_fastq": [ + [ + { + "id": "test" + }, + "test_wiped.fastq.gz:md5,81b1d99cf5c1615123a8c31760858a10" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2025-01-18T07:49:09.920936" + } +} diff --git a/modules/nf-core/wipertools/fastqwiper/tests/nextflow.config b/modules/nf-core/wipertools/fastqwiper/tests/nextflow.config new file mode 100644 index 0000000..916a0d0 --- /dev/null +++ b/modules/nf-core/wipertools/fastqwiper/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: WIPERTOOLS_FASTQWIPER { + ext.args = params.module_args + } +} diff --git a/modules/nf-core/wipertools/reportgather/environment.yml b/modules/nf-core/wipertools/reportgather/environment.yml new file mode 100644 index 0000000..2c9cfba --- /dev/null +++ b/modules/nf-core/wipertools/reportgather/environment.yml @@ -0,0 +1,7 @@ +--- +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json +channels: + - conda-forge + - bioconda +dependencies: + - "bioconda::wipertools=1.1.5" diff --git a/modules/nf-core/wipertools/reportgather/main.nf b/modules/nf-core/wipertools/reportgather/main.nf new file mode 100644 index 0000000..3a0b81b --- /dev/null +++ b/modules/nf-core/wipertools/reportgather/main.nf @@ -0,0 +1,58 @@ +process WIPERTOOLS_REPORTGATHER { + tag "$meta.id" + label 'process_single' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/wipertools:1.1.5--pyhdfd78af_0': + 'biocontainers/wipertools:1.1.5--pyhdfd78af_0' }" + + input: + tuple val(meta), path(report) + + output: + tuple val(meta), path("${prefix}.report"), emit: gathered_report + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}_gather" + + // Check if the output file name is in the list of input files + if (report.any { it.name == "${prefix}.report" }) { + error "Output file name \"${prefix}.report\" matches one of the input files. Use \"task.ext.prefix\" to disambiguate!" + } + + """ + wipertools \\ + reportgather \\ + -r $report \\ + -f ${prefix}.report \\ + ${args} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + wipertools reportgather: \$(wipertools reportgather --version) + END_VERSIONS + """ + + stub: + prefix = task.ext.prefix ?: "${meta.id}_gather" + + // Check if the output file name is in the list of input files + if (report.any { it.name == "${prefix}.report" }) { + error "Output file name \"${prefix}.report\" matches one of the input files. Use \"task.ext.prefix\" to disambiguate!" + } + + """ + touch ${prefix}.report + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + wipertools reportgather: \$(wipertools reportgather --version) + END_VERSIONS + """ +} diff --git a/modules/nf-core/wipertools/reportgather/meta.yml b/modules/nf-core/wipertools/reportgather/meta.yml new file mode 100644 index 0000000..eee8c89 --- /dev/null +++ b/modules/nf-core/wipertools/reportgather/meta.yml @@ -0,0 +1,55 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json +name: "wipertools_reportgather" +description: A tool of the wipertools suite that merges wiping reports generated by wipertools_fastqwiper +keywords: + - report + - merge + - union +tools: + - "reportgather": + description: "A tool of the wipertools suite that merges wiping reports generated by wipertools_fastqwiper." + homepage: "https://github.com/mazzalab/fastqwiper" + documentation: "https://github.com/mazzalab/fastqwiper" + tool_dev_url: "https://github.com/mazzalab/fastqwiper" + doi: "no DOI available" + licence: ["GPL v2-or-later"] + identifier: "" + args_id: "$args" + +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test' ] + - report: + type: file + description: List of wiping reports to be merged + pattern: "*.report" + ontologies: + - edam: "http://edamontology.org/format_2330" +output: + - gathered_report: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g.
[ id:'test' ] + - "${prefix}.report": + type: file + description: The resulting report file + pattern: "*.report" + ontologies: + - edam: "http://edamontology.org/format_2330" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" + ontologies: + - edam: "http://edamontology.org/format_2330" +authors: + - "@tm4zza" +maintainers: + - "@mazzalab" + - "@tm4zza" diff --git a/modules/nf-core/wipertools/reportgather/tests/main.nf.test b/modules/nf-core/wipertools/reportgather/tests/main.nf.test new file mode 100644 index 0000000..732e789 --- /dev/null +++ b/modules/nf-core/wipertools/reportgather/tests/main.nf.test @@ -0,0 +1,88 @@ +nextflow_process { + + name "Test Process WIPERTOOLS_REPORTGATHER" + script "../main.nf" + process "WIPERTOOLS_REPORTGATHER" + + tag "modules" + tag "modules_nfcore" + tag "wipertools" + tag "wipertools/reportgather" + tag "wipertools/fastqwiper" + + setup { + run("WIPERTOOLS_FASTQWIPER", alias: "FQW1") { + script "../../fastqwiper/main.nf" + process { + """ + input[0] = [ + [ id:'test1' ], + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test_quality_mismatch.fastq', checkIfExists: true) + ] + """ + } + } + + run("WIPERTOOLS_FASTQWIPER", alias: "FQW2") { + script "../../fastqwiper/main.nf" + process { + """ + input[0] = [ + [ id:'test2' ], + file(params.modules_testdata_base_path + 'genomics/homo_sapiens/illumina/fastq/test_truncated_clean.fastq', checkIfExists: true) + ] + """ + } + } + } + + test("merge two reports - .report") { + when { + params { + outdir = "$outputDir" + } + process { + """ + merged_reports = FQW1.out.report.mix(FQW2.out.report) + input[0] = merged_reports.map(it -> tuple ([id: 'reports'], it[1]) ) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.gathered_report.get(0).get(1)).readLines().size() == 11}, + { + def report_header = path(process.out.gathered_report.get(0).get(1)).readLines()[0] + assert report_header.equals('FASTQWIPER REPORT:') + }, + { + def well_formed_lines = path(process.out.gathered_report.get(0).get(1)).readLines()[3] + assert well_formed_lines.equals('Well-formed lines: 8 (80.0%)') + }, + { assert snapshot(process.out).match() } + ) + } + } + + test("merge two reports - .report - stub") { + options "-stub" + + when { + process { + """ + merged_reports = FQW1.out.report.mix(FQW2.out.report) + input[0] = merged_reports.map(it -> tuple ([id: 'reports'], it[1]) ) + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } +} diff --git a/modules/nf-core/wipertools/reportgather/tests/main.nf.test.snap b/modules/nf-core/wipertools/reportgather/tests/main.nf.test.snap new file mode 100644 index 0000000..c0c91b9 --- /dev/null +++ b/modules/nf-core/wipertools/reportgather/tests/main.nf.test.snap @@ -0,0 +1,96 @@ +{ + "merge two reports - .report - stub": { + "content": [ + { + "0": [ + [ + { + "id": "reports" + }, + "reports_gather.report:md5,d41d8cd98f00b204e9800998ecf8427e" + ], + [ + { + "id": "reports" + }, + "reports_gather.report:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + "versions.yml:md5,6089d61842607d8760740fab8d1bfe02", + "versions.yml:md5,6089d61842607d8760740fab8d1bfe02" + ], + "gathered_report": [ + [ + { + "id": "reports" + }, + "reports_gather.report:md5,d41d8cd98f00b204e9800998ecf8427e" + ], + [ + { + "id": "reports" + }, + "reports_gather.report:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + 
"versions": [ + "versions.yml:md5,6089d61842607d8760740fab8d1bfe02", + "versions.yml:md5,6089d61842607d8760740fab8d1bfe02" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2025-01-18T09:01:17.744913" + }, + "merge two reports - .report": { + "content": [ + { + "0": [ + [ + { + "id": "reports" + }, + "reports_gather.report:md5,3d2cfb8d7b72afe9b88663cf9dbf8129" + ], + [ + { + "id": "reports" + }, + "reports_gather.report:md5,d6fe1f0066f538993bb8f6b90150c8d5" + ] + ], + "1": [ + "versions.yml:md5,6089d61842607d8760740fab8d1bfe02", + "versions.yml:md5,6089d61842607d8760740fab8d1bfe02" + ], + "gathered_report": [ + [ + { + "id": "reports" + }, + "reports_gather.report:md5,3d2cfb8d7b72afe9b88663cf9dbf8129" + ], + [ + { + "id": "reports" + }, + "reports_gather.report:md5,d6fe1f0066f538993bb8f6b90150c8d5" + ] + ], + "versions": [ + "versions.yml:md5,6089d61842607d8760740fab8d1bfe02", + "versions.yml:md5,6089d61842607d8760740fab8d1bfe02" + ] + } + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.3" + }, + "timestamp": "2025-01-18T07:49:32.055316" + } +} diff --git a/nextflow.config b/nextflow.config index fe0fce3..3807503 100644 --- a/nextflow.config +++ b/nextflow.config @@ -9,68 +9,53 @@ // Global default params, used in configs params { - // TODO nf-core: Specify your pipeline's command line flags // Input options - input = null - - // MultiQC options - multiqc_config = null - multiqc_title = null - multiqc_logo = null - max_multiqc_email_size = '25.MB' - multiqc_methods_description = null + input = null + outdir = null // Boilerplate options - outdir = null + num_splits = 2 + qin = 33 + alphabet = "ACGTN" + skip_bbmap_repair = false publish_dir_mode = 'copy' + publish_all_tools = false + email = null email_on_fail = null plaintext_email = false monochrome_logs = false hook_url = null help = false + help_full = false + show_hidden = false version = false pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/' + trace_report_suffix = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')// Config options + config_profile_name = null + config_profile_description = null + + // MultiQC options + multiqc_config = null + multiqc_title = null + multiqc_logo = null + max_multiqc_email_size = '25.MB' + multiqc_methods_description = null + multiqc_replace_names = null + multiqc_sample_names = null - // Config options - config_profile_name = null - config_profile_description = null custom_config_version = 'master' custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" config_profile_contact = null config_profile_url = null - // Max resource options - // Defaults only, expecting to be overwritten - max_memory = '128.GB' - max_cpus = 16 - max_time = '240.h' - // Schema validation default options - validationFailUnrecognisedParams = false - validationLenientMode = false - validationSchemaIgnoreParams = 'genomes,igenomes_base' - validationShowHiddenParams = false - validate_params = true - + validate_params = true } // Load base.config by default for all pipelines includeConfig 'conf/base.config' -// Load nf-core custom profiles from different Institutions -try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") -} - -// Load nf-core/fastqrepair custom profiles from different institutions. 
-try { - includeConfig "${params.custom_config_base}/pipeline/fastqrepair.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config/fastqrepair profiles: ${params.custom_config_base}/pipeline/fastqrepair.config") -} profiles { debug { dumpHashes = true @@ -85,7 +70,7 @@ profiles { podman.enabled = false shifter.enabled = false charliecloud.enabled = false - conda.channels = ['conda-forge', 'bioconda', 'defaults'] + conda.channels = ['conda-forge', 'bioconda'] apptainer.enabled = false } mamba { @@ -169,23 +154,35 @@ profiles { executor.name = 'local' executor.cpus = 4 executor.memory = 8.GB + process { + resourceLimits = [ + memory: 8.GB, + cpus : 4, + time : 1.h + ] + } } test { includeConfig 'conf/test.config' } test_full { includeConfig 'conf/test_full.config' } } -// Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile -// Will not be used unless Apptainer / Docker / Podman / Singularity are enabled +// Load nf-core custom profiles from different Institutions +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null" + +// Load nf-core/fastqrepair custom profiles from different institutions. +// Enabled below a pipeline-specific nf-core config at https://github.com/nf-core/configs +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/pipeline/fastqrepair.config" : "/dev/null" + +// Set default registry for Apptainer, Docker, Podman, Charliecloud and Singularity independent of -profile +// Will not be used unless Apptainer / Docker / Podman / Charliecloud / Singularity are enabled // Set to your registry if you have a mirror of containers -apptainer.registry = 'quay.io' -docker.registry = 'quay.io' -podman.registry = 'quay.io' -singularity.registry = 'quay.io' +apptainer.registry = 'quay.io' +docker.registry = 'quay.io' +podman.registry = 'quay.io' +singularity.registry = 'quay.io' +charliecloud.registry = 'quay.io' + -// Nextflow plugins -plugins { - id 'nf-validation@1.1.3' // Validation of pipeline parameters and creation of an input channel from a sample sheet -} // Export these variables to prevent local Python/R libraries from conflicting with those in the container // The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container. @@ -198,73 +195,93 @@ env { JULIA_DEPOT_PATH = "/usr/local/share/julia" } -// Capture exit codes from upstream processes when piping -process.shell = ['/bin/bash', '-euo', 'pipefail'] +// Set bash options +process.shell = [ + "bash", + "-C", // No clobber - prevent output redirection from overwriting files. + "-e", // Exit if a tool returns a non-zero status/exit code + "-u", // Treat unset variables and parameters as an error + "-o", // Returns the status of the last command to exit.. + "pipefail" // ..with a non-zero status or zero if all successfully execute +] // Disable process selector warnings by default. Use debug profile to enable warnings. 
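// Illustrative sketch only, not part of the shipped config: the parameters declared above can be set at launch, e.g.
//   nextflow run nf-core/fastqrepair -profile docker --input samplesheet.csv --outdir results \
//     --num_splits 4 --qin 33 --alphabet ACGTN --skip_bbmap_repair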
nextflow.enable.configProcessNamesValidation = false -def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') timeline { enabled = true - file = "${params.outdir}/pipeline_info/execution_timeline_${trace_timestamp}.html" + file = "${params.outdir}/pipeline_info/execution_timeline_${params.trace_report_suffix}.html" } report { enabled = true - file = "${params.outdir}/pipeline_info/execution_report_${trace_timestamp}.html" + file = "${params.outdir}/pipeline_info/execution_report_${params.trace_report_suffix}.html" } trace { enabled = true - file = "${params.outdir}/pipeline_info/execution_trace_${trace_timestamp}.txt" + file = "${params.outdir}/pipeline_info/execution_trace_${params.trace_report_suffix}.txt" } dag { enabled = true - file = "${params.outdir}/pipeline_info/pipeline_dag_${trace_timestamp}.html" + file = "${params.outdir}/pipeline_info/pipeline_dag_${params.trace_report_suffix}.html" } manifest { name = 'nf-core/fastqrepair' - author = """Tommaso Mazza""" + author = """Tommaso Mazza""" // The author field is deprecated from Nextflow version 24.10.0, use contributors instead + contributors = [ + [ + name: 'Tommaso Mazza', + affiliation: 'IRCCS Casa Sollievo della Sofferenza', + email: 'mazza.tommaso@gmail.com', + github: 'https://github.com/tm4zza', + contribution: ['author', 'maintainer', 'contributor'], // List of contribution types ('author', 'maintainer' or 'contributor') + orcid: '0000-0003-0434-8533' + ], + ] homePage = 'https://github.com/nf-core/fastqrepair' - description = """A pipeline to recover corrupted FASTQ files, drop or fix pesky lines, remove unpaired reads, and settle reads interleaving""" + description = """A pipeline that can be used to recover corrupted FASTQ.gz files, drop or fix uncompliant reads, remove unpaired reads, and settle reads that became disordered""" mainScript = 'main.nf' - nextflowVersion = '!>=23.04.0' - version = '1.0dev' + defaultBranch = 'main' + nextflowVersion = '!>=24.04.2' + version = '1.0.0' doi = '' } -// Load modules.config for DSL2 module specific options -includeConfig 'conf/modules.config' +// Nextflow plugins +plugins { + id 'nf-schema@2.3.0' // Validation of pipeline parameters and creation of an input channel from a sample sheet +} -// Function to ensure that resource requirements don't go beyond -// a maximum limit -def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! 
Using default value: $obj" - return obj - } +validation { + defaultIgnoreParams = ["genomes"] + help { + enabled = true + command = "nextflow run $manifest.name -profile --input samplesheet.csv --outdir " + fullParameter = "help_full" + showHiddenParameter = "show_hidden" + beforeText = """ +-\033[2m----------------------------------------------------\033[0m- + \033[0;32m,--.\033[0;30m/\033[0;32m,-.\033[0m +\033[0;34m ___ __ __ __ ___ \033[0;32m/,-._.--~\'\033[0m +\033[0;34m |\\ | |__ __ / ` / \\ |__) |__ \033[0;33m} {\033[0m +\033[0;34m | \\| | \\__, \\__/ | \\ |___ \033[0;32m\\`-._,-`-,\033[0m + \033[0;32m`._,._,\'\033[0m +\033[0;35m ${manifest.name} ${manifest.version}\033[0m +-\033[2m----------------------------------------------------\033[0m- +""" + afterText = """${manifest.doi ? "* The pipeline\n" : ""}${manifest.doi.tokenize(",").collect { " https://doi.org/${it.trim().replace('https://doi.org/','')}"}.join("\n")}${manifest.doi ? "\n" : ""} +* The nf-core framework + https://doi.org/10.1038/s41587-020-0439-x + +* Software dependencies + https://github.com/${manifest.name}/blob/master/CITATIONS.md +""" + } + summary { + beforeText = validation.help.beforeText + afterText = validation.help.afterText } } + +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' diff --git a/nextflow_schema.json b/nextflow_schema.json index 5f19e62..eb2232a 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -1,19 +1,16 @@ { - "$schema": "http://json-schema.org/draft-07/schema", - "$id": "https://raw.githubusercontent.com/nf-core/fastqrepair/master/nextflow_schema.json", + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://raw.githubusercontent.com/nf-core/fastqrepair/main/nextflow_schema.json", "title": "nf-core/fastqrepair pipeline parameters", - "description": "A pipeline to recover corrupted FASTQ files, drop or fix pesky lines, remove unpaired reads, and settle reads interleaving", + "description": "A pipeline that can be used to recover corrupted FASTQ.gz files, drop or fix uncompliant reads, remove unpaired reads, and settle reads that became disordered", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", "fa_icon": "fas fa-terminal", "description": "Define where the pipeline should find input data and save output data.", - "required": [ - "input", - "outdir" - ], + "required": ["input", "outdir"], "properties": { "input": { "type": "string", @@ -29,20 +26,46 @@ "outdir": { "type": "string", "format": "directory-path", - "description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.", + "description": "The output directory where the results will be saved.", + "help_text": "You have to use absolute paths to storage on Cloud infrastructure.", "fa_icon": "fas fa-folder-open" }, - "email": { - "type": "string", - "description": "Email address for completion summary.", - "fa_icon": "fas fa-envelope", - "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. 
If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.", - "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" + "num_splits": { + "type": "integer", + "default": 2, + "pattern": "^[1-9]\\d*$", + "description": "FASTQ chunk numbers for parallel processing.", + "help_text": "The input fastq file will be split into this number of chunks to speedup wiping. The number must be > 0.", + "fa_icon": "fas fa-puzzle-piece" }, - "multiqc_title": { + "qin": { + "type": "integer", + "enum": [33, 64], + "default": 33, + "description": "the ASCII offset for BBMap (33=Sanger, 64=old Solexa).", + "fa_icon": "fas fa-puzzle-piece" + }, + "alphabet": { "type": "string", - "description": "MultiQC report title. Printed as page header, used for filename if not otherwise specified.", - "fa_icon": "fas fa-file-signature" + "default": "ACGTN", + "pattern": "^[^0-9]+$", + "description": "Allowed character in the SEQ line for Wipertools.", + "help_text": "SEQ lines containing characters not listed here will be discarded by Wipertools", + "fa_icon": "fas fa-solid fa-arrows-left-right-to-line" + }, + "skip_bbmap_repair": { + "type": "boolean", + "default": false, + "description": "Skip the BBMap re-pair step.", + "fa_icon": "fas fa-puzzle-piece" + }, + "publish_all_tools": { + "type": "boolean", + "default": false, + "description": "Publish intermediate results.", + "help_text": "This option tells the pipeline to publish all intermediate results in the output directory.", + "fa_icon": "fas fa-copy", + "hidden": false } } }, @@ -94,41 +117,6 @@ } } }, - "max_job_request_options": { - "title": "Max job request options", - "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", - "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true, - "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" - }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", - "fa_icon": "fas fa-memory", - "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", - "hidden": true, - "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" - }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job.", - "default": "240.h", - "fa_icon": "far fa-clock", - "pattern": "^(\\d+\\.?\\s*(s|m|h|d|day)\\s*)+$", - "hidden": true, - "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. 
`--max_time '2.h'`" - } - } - }, "generic_options": { "title": "Generic options", "type": "object", @@ -136,12 +124,6 @@ "description": "Less common options for the pipeline, typically set in a config file.", "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "fa_icon": "fas fa-question-circle", - "hidden": true - }, "version": { "type": "boolean", "description": "Display version and exit.", @@ -154,14 +136,7 @@ "description": "Method used to save pipeline results to output directory.", "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", "fa_icon": "fas fa-copy", - "enum": [ - "symlink", - "rellink", - "link", - "copy", - "copyNoFollow", - "move" - ], + "enum": ["symlink", "rellink", "link", "copy", "copyNoFollow", "move"], "hidden": true }, "email_on_fail": { @@ -178,6 +153,18 @@ "fa_icon": "fas fa-remove-format", "hidden": true }, + "email": { + "type": "string", + "description": "Email address for completion summary.", + "fa_icon": "fas fa-envelope", + "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.", + "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$" + }, + "multiqc_title": { + "type": "string", + "description": "MultiQC report title. Printed as page header, used for filename if not otherwise specified.", + "fa_icon": "fas fa-file-signature" + }, "max_multiqc_email_size": { "type": "string", "description": "File size limit when attaching MultiQC reports to summary emails.", @@ -215,7 +202,18 @@ "multiqc_methods_description": { "type": "string", "description": "Custom MultiQC yaml file containing HTML including a methods description.", - "fa_icon": "fas fa-cog" + "fa_icon": "fas fa-cog", + "hidden": true + }, + "multiqc_replace_names": { + "type": "string", + "description": "Optional two-column sample renaming file. First column a set of patterns, second column a set of corresponding replacements. Passed via MultiQC's `--replace-names` option.", + "fa_icon": "far fa-file-code" + }, + "multiqc_sample_names": { + "type": "string", + "description": "Optional TSV file with headers, passed to the MultiQC --sample_names argument.", + "fa_icon": "far fa-file-code" }, "validate_params": { "type": "boolean", @@ -224,49 +222,31 @@ "fa_icon": "fas fa-check-square", "hidden": true }, - "validationShowHiddenParams": { - "type": "boolean", - "fa_icon": "far fa-eye-slash", - "description": "Show all params when using `--help`", - "hidden": true, - "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." 
- }, - "validationFailUnrecognisedParams": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters fails when an unrecognised parameter is found.", - "hidden": true, - "help_text": "By default, when an unrecognised parameter is found, it returns a warinig." - }, - "validationLenientMode": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters in lenient more.", - "hidden": true, - "help_text": "Allows string values that are parseable as numbers or booleans. For further information see [JSONSchema docs](https://github.com/everit-org/json-schema#lenient-mode)." - }, "pipelines_testdata_base_path": { "type": "string", "fa_icon": "far fa-check-circle", "description": "Base URL or local path to location of pipeline test dataset files", "default": "https://raw.githubusercontent.com/nf-core/test-datasets/", "hidden": true + }, + "trace_report_suffix": { + "type": "string", + "fa_icon": "far calendar", + "description": "Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.", + "hidden": true } } } }, "allOf": [ { - "$ref": "#/definitions/input_output_options" - }, - { - "$ref": "#/definitions/institutional_config_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/max_job_request_options" + "$ref": "#/$defs/institutional_config_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] -} \ No newline at end of file +} diff --git a/nf-test.config b/nf-test.config new file mode 100644 index 0000000..6489684 --- /dev/null +++ b/nf-test.config @@ -0,0 +1,12 @@ +config { + // Location of nf-tests + testsDir "./tests" + + // nf-test directory used to create temporary files for each test + workDir System.getenv("NFT_WORKDIR") ?: ".nf-test" + + // Location of an optional nextflow.config file specific for executing pipeline tests + configFile "tests/nextflow.config" + + profile "docker" +} diff --git a/ro-crate-metadata.json b/ro-crate-metadata.json new file mode 100644 index 0000000..a96cad1 --- /dev/null +++ b/ro-crate-metadata.json @@ -0,0 +1,308 @@ +{ + "@context": [ + "https://w3id.org/ro/crate/1.1/context", + { + "GithubService": "https://w3id.org/ro/terms/test#GithubService", + "JenkinsService": "https://w3id.org/ro/terms/test#JenkinsService", + "PlanemoEngine": "https://w3id.org/ro/terms/test#PlanemoEngine", + "TestDefinition": "https://w3id.org/ro/terms/test#TestDefinition", + "TestInstance": "https://w3id.org/ro/terms/test#TestInstance", + "TestService": "https://w3id.org/ro/terms/test#TestService", + "TestSuite": "https://w3id.org/ro/terms/test#TestSuite", + "TravisService": "https://w3id.org/ro/terms/test#TravisService", + "definition": "https://w3id.org/ro/terms/test#definition", + "engineVersion": "https://w3id.org/ro/terms/test#engineVersion", + "instance": "https://w3id.org/ro/terms/test#instance", + "resource": "https://w3id.org/ro/terms/test#resource", + "runsOn": "https://w3id.org/ro/terms/test#runsOn" + } + ], + "@graph": [ + { + "@id": "./", + "@type": "Dataset", + "creativeWorkStatus": "Stable", + "datePublished": "2025-01-27T14:46:50+00:00", + "description": "

\n \n \n \"nf-core/fastqrepair\"\n \n

\n\n[![GitHub Actions CI Status](https://github.com/nf-core/fastqrepair/actions/workflows/ci.yml/badge.svg)](https://github.com/nf-core/fastqrepair/actions/workflows/ci.yml)\n[![GitHub Actions Linting Status](https://github.com/nf-core/fastqrepair/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/fastqrepair/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/fastqrepair/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)\n[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)\n\n[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/)\n[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)\n[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)\n[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/fastqrepair)\n\n[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23fastqrepair-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/fastqrepair)[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)[![Follow on Mastodon](https://img.shields.io/badge/mastodon-nf__core-6364ff?labelColor=FFFFFF&logo=mastodon)](https://mstdn.science/@nf_core)[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)\n\n## Introduction\n\n**nf-core/fastqrepair** is a bioinformatics pipeline that ...\n\n\n\n\n1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.\n\n\n\nNow, you can run the pipeline using:\n\n\n\n```bash\nnextflow run nf-core/fastqrepair \\\n -profile \\\n --input samplesheet.csv \\\n --outdir \n```\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. 
Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).\n\nFor more details and further functionality, please refer to the [usage documentation](https://nf-co.re/fastqrepair/usage) and the [parameter documentation](https://nf-co.re/fastqrepair/parameters).\n\n## Pipeline output\n\nTo see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/fastqrepair/results) tab on the nf-core website pipeline page.\nFor more details about the output files and reports, please refer to the\n[output documentation](https://nf-co.re/fastqrepair/output).\n\n## Credits\n\nnf-core/fastqrepair was originally written by Tommaso Mazza.\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).\n\nFor further information or help, don't hesitate to get in touch on the [Slack `#fastqrepair` channel](https://nfcore.slack.com/channels/fastqrepair) (you can join with [this invite](https://nf-co.re/join/slack)).\n\n## Citations\n\n\n\n\n\n\nAn extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.\n\nYou can cite the `nf-core` publication as follows:\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. 
doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n", + "hasPart": [ + { + "@id": "main.nf" + }, + { + "@id": "assets/" + }, + { + "@id": "conf/" + }, + { + "@id": "docs/" + }, + { + "@id": "docs/images/" + }, + { + "@id": "modules/" + }, + { + "@id": "modules/nf-core/" + }, + { + "@id": "workflows/" + }, + { + "@id": "subworkflows/" + }, + { + "@id": "nextflow.config" + }, + { + "@id": "README.md" + }, + { + "@id": "nextflow_schema.json" + }, + { + "@id": "CHANGELOG.md" + }, + { + "@id": "LICENSE" + }, + { + "@id": "CODE_OF_CONDUCT.md" + }, + { + "@id": "CITATIONS.md" + }, + { + "@id": "modules.json" + }, + { + "@id": "docs/usage.md" + }, + { + "@id": "docs/output.md" + }, + { + "@id": ".nf-core.yml" + }, + { + "@id": ".pre-commit-config.yaml" + }, + { + "@id": ".prettierignore" + } + ], + "isBasedOn": "https://github.com/nf-core/fastqrepair", + "license": "MIT", + "mainEntity": { + "@id": "main.nf" + }, + "mentions": [ + { + "@id": "#634a2825-3e33-4286-8cfb-628125724925" + } + ], + "name": "nf-core/fastqrepair" + }, + { + "@id": "ro-crate-metadata.json", + "@type": "CreativeWork", + "about": { + "@id": "./" + }, + "conformsTo": [ + { + "@id": "https://w3id.org/ro/crate/1.1" + }, + { + "@id": "https://w3id.org/workflowhub/workflow-ro-crate/1.0" + } + ] + }, + { + "@id": "main.nf", + "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"], + "dateCreated": "", + "dateModified": "2025-01-27T14:46:50Z", + "dct:conformsTo": "https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE/", + "keywords": [ + "nf-core", + "nextflow", + "corruption", + "data-cleaning", + "data-recovery", + "fastq", + "fastq-corrupted", + "fastq-format", + "reads-interleaving", + "recovery-tool", + "unpaired-reads", + "well-formed" + ], + "license": ["MIT"], + "name": ["nf-core/fastqrepair"], + "programmingLanguage": { + "@id": "https://w3id.org/workflowhub/workflow-ro-crate#nextflow" + }, + "sdPublisher": { + "@id": "https://nf-co.re/" + }, + "url": ["https://github.com/nf-core/fastqrepair", "https://nf-co.re/fastqrepair/1.0.0/"], + "version": ["1.0.0"] + }, + { + "@id": "https://w3id.org/workflowhub/workflow-ro-crate#nextflow", + "@type": "ComputerLanguage", + "identifier": { + "@id": "https://www.nextflow.io/" + }, + "name": "Nextflow", + "url": { + "@id": "https://www.nextflow.io/" + }, + "version": "!>=24.04.2" + }, + { + "@id": "#634a2825-3e33-4286-8cfb-628125724925", + "@type": "TestSuite", + "instance": [ + { + "@id": "#3d1b71a6-acbd-420b-a915-8dfc6840f2fc" + } + ], + "mainEntity": { + "@id": "main.nf" + }, + "name": "Test suite for nf-core/fastqrepair" + }, + { + "@id": "#3d1b71a6-acbd-420b-a915-8dfc6840f2fc", + "@type": "TestInstance", + "name": "GitHub Actions workflow for testing nf-core/fastqrepair", + "resource": "repos/nf-core/fastqrepair/actions/workflows/ci.yml", + "runsOn": { + "@id": "https://w3id.org/ro/terms/test#GithubService" + }, + "url": "https://api.github.com" + }, + { + "@id": "https://w3id.org/ro/terms/test#GithubService", + "@type": "TestService", + "name": "Github Actions", + "url": { + "@id": "https://github.com" + } + }, + { + "@id": "assets/", + "@type": "Dataset", + "description": "Additional files" + }, + { + "@id": "conf/", + "@type": "Dataset", + "description": "Configuration files" + }, + { + "@id": "docs/", + "@type": "Dataset", + "description": "Markdown files for documenting the pipeline" + }, + { + "@id": "docs/images/", + "@type": "Dataset", + "description": "Images for the documentation files" + }, + { + "@id": "modules/", + 
"@type": "Dataset", + "description": "Modules used by the pipeline" + }, + { + "@id": "modules/nf-core/", + "@type": "Dataset", + "description": "nf-core modules" + }, + { + "@id": "workflows/", + "@type": "Dataset", + "description": "Main pipeline workflows to be executed in main.nf" + }, + { + "@id": "subworkflows/", + "@type": "Dataset", + "description": "Smaller subworkflows" + }, + { + "@id": "nextflow.config", + "@type": "File", + "description": "Main Nextflow configuration file" + }, + { + "@id": "README.md", + "@type": "File", + "description": "Basic pipeline usage information" + }, + { + "@id": "nextflow_schema.json", + "@type": "File", + "description": "JSON schema for pipeline parameter specification" + }, + { + "@id": "CHANGELOG.md", + "@type": "File", + "description": "Information on changes made to the pipeline" + }, + { + "@id": "LICENSE", + "@type": "File", + "description": "The license - should be MIT" + }, + { + "@id": "CODE_OF_CONDUCT.md", + "@type": "File", + "description": "The nf-core code of conduct" + }, + { + "@id": "CITATIONS.md", + "@type": "File", + "description": "Citations needed when using the pipeline" + }, + { + "@id": "modules.json", + "@type": "File", + "description": "Version information for modules from nf-core/modules" + }, + { + "@id": "docs/usage.md", + "@type": "File", + "description": "Usage documentation" + }, + { + "@id": "docs/output.md", + "@type": "File", + "description": "Output documentation" + }, + { + "@id": ".nf-core.yml", + "@type": "File", + "description": "nf-core configuration file, configuring template features and linting rules" + }, + { + "@id": ".pre-commit-config.yaml", + "@type": "File", + "description": "Configuration file for pre-commit hooks" + }, + { + "@id": ".prettierignore", + "@type": "File", + "description": "Ignore file for prettier" + }, + { + "@id": "https://nf-co.re/", + "@type": "Organization", + "name": "nf-core", + "url": "https://nf-co.re/" + } + ] +} diff --git a/subworkflows/local/fastq_repair_wipertools/main.nf b/subworkflows/local/fastq_repair_wipertools/main.nf new file mode 100644 index 0000000..42a6d43 --- /dev/null +++ b/subworkflows/local/fastq_repair_wipertools/main.nf @@ -0,0 +1,83 @@ +include { WIPERTOOLS_FASTQWIPER } from '../../../modules/nf-core/wipertools/fastqwiper' +include { WIPERTOOLS_FASTQSCATTER } from '../../../modules/nf-core/wipertools/fastqscatter' +include { WIPERTOOLS_FASTQGATHER } from '../../../modules/nf-core/wipertools/fastqgather' +include { WIPERTOOLS_REPORTGATHER } from '../../../modules/nf-core/wipertools/reportgather' + + +workflow FASTQ_REPAIR_WIPERTOOLS { + take: + ch_fastq // channel: [ val(meta), [ .fastq ] ] + + main: + ch_versions = Channel.empty() + + // decouple fastq files [sample_id = id; id is the file name] + ch_decoupled = ch_fastq.flatMap { + meta, fastqList -> fastqList instanceof List + ? fastqList.collect { fastq -> tuple("id": (fastq.name.endsWith(".gz") ? fastq.getBaseName(2): fastq.baseName), "sample_id":meta.id, "single_end": meta.single_end, fastq) } + : [tuple("id": (fastqList.name.endsWith(".gz") ? 
fastqList.getBaseName(2): fastqList.baseName), "sample_id":meta.id, "single_end": meta.single_end, fastqList)] + } + + // Split fastq files into chunks + WIPERTOOLS_FASTQSCATTER( + ch_decoupled, + params.num_splits + ) + ch_versions = ch_versions.mix(WIPERTOOLS_FASTQSCATTER.out.versions.first()) + + /* decouple chunks + id = the chunk file name + mate_id = (previous) id, i.e., the original, not split, file name + sample_id = the id of the original meta + single_end = the single_end status of the original meta + */ + ch_scattered_fastq = WIPERTOOLS_FASTQSCATTER.out.fastq_chunks.flatMap { + meta, fastqList -> fastqList.collect { + fastq -> tuple("id": (fastq.name.endsWith(".gz") ? fastq.getBaseName(2) : fastq.baseName), + "mate_id":meta.id, "sample_id":meta.sample_id, "single_end": meta.single_end, fastq) } } + + // Wipe fastq files + WIPERTOOLS_FASTQWIPER { + ch_scattered_fastq + } + ch_versions = ch_versions.mix(WIPERTOOLS_FASTQWIPER.out.versions.first()) + + // group wiped chunks + ch_cleaned_fastq = Channel.empty() + WIPERTOOLS_FASTQWIPER.out.wiped_fastq + | map { meta, fq -> [meta.subMap('mate_id', 'sample_id', 'single_end'), fq]} + | groupTuple + | map { meta, fq -> [['id':meta.mate_id, 'sample_id':meta.sample_id, 'single_end':meta.single_end], fq]} + | set { ch_cleaned_fastq } + + // gather fastq files + WIPERTOOLS_FASTQGATHER( + ch_cleaned_fastq + ) + ch_versions = ch_versions.mix(WIPERTOOLS_FASTQGATHER.out.versions.first()) + + // group wiping reports + ch_gathered_report = Channel.empty() + WIPERTOOLS_FASTQWIPER.out.report + | map { meta, report -> [meta.subMap('mate_id', 'sample_id', 'single_end'), report]} + | groupTuple + | map { meta, report -> [['id':meta.mate_id, 'sample_id':meta.sample_id, 'single_end':meta.single_end], report]} + | set { ch_gathered_report } + + // gather report files and replace meta.id with meta.sample_id + ch_final_reports = Channel.empty() + WIPERTOOLS_REPORTGATHER( + ch_gathered_report + ) + ch_versions = ch_versions.mix(WIPERTOOLS_REPORTGATHER.out.versions.first()) + + WIPERTOOLS_REPORTGATHER.out.gathered_report + | map { meta, report -> [['id':meta.sample_id, 'single_end':meta.single_end], report]} + | set { ch_final_reports } + + + emit: + wiped_fastq = WIPERTOOLS_FASTQGATHER.out.gathered_fastq // channel: [ val(meta), [ .fastq|.fastq.gz ] ] + report = ch_final_reports // channel: [ val(meta), [ .txt ] ] + versions = ch_versions // channel: [ versions.yml ] +} diff --git a/subworkflows/local/fastq_repair_wipertools/meta.yml b/subworkflows/local/fastq_repair_wipertools/meta.yml new file mode 100644 index 0000000..01b1b1c --- /dev/null +++ b/subworkflows/local/fastq_repair_wipertools/meta.yml @@ -0,0 +1,43 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "fastq_repair_wipertools" +description: Apply Wipertools to readable FASTQ files +keywords: + - malformed + - fastq + - clean + - well-formed +components: + - wipertools/fastqwiper + - wipertools/fastqscatter + - wipertools/fastqgather + - wipertools/reportgather +input: + - ch_fastq: + type: file + description: | + The input channel containing the FASTQ files + Structure: [ val(meta), path(fastq) ] + pattern: "*.{fastq.gz/fastq/fq.gz/fq}" +output: + - wiped_fastq: + type: file + description: | + Channel containing well-formed FASTQ files + Structure: [ val(meta), path(fastq) ] + pattern: "*.{fastq.gz/fastq/fq.gz/fq}" + - report: + type: file + description: | + Channel containing the quality report of fastqwiper + 
Structure: [ val(meta), path(report) ] + pattern: "*.report" + - versions: + type: file + description: | + File containing software versions + Structure: [ path(versions.yml) ] + pattern: "versions.yml" +authors: + - "@mazzalab" +maintainers: + - "@mazzalab" diff --git a/subworkflows/local/fastq_repair_wipertools/tests/main.nf.test b/subworkflows/local/fastq_repair_wipertools/tests/main.nf.test new file mode 100644 index 0000000..12d4ba9 --- /dev/null +++ b/subworkflows/local/fastq_repair_wipertools/tests/main.nf.test @@ -0,0 +1,39 @@ +// Once you have added the required tests, please run the following command to build this file: +// nf-core subworkflows test fastq_repair_wipertools +nextflow_workflow { + + name "Test Subworkflow FASTQ_REPAIR_WIPERTOOLS" + script "../main.nf" + workflow "FASTQ_REPAIR_WIPERTOOLS" + + tag "subworkflows" + tag "subworkflows_" + tag "subworkflows/fastq_repair_wipertools" + tag "wipertools" + tag "wipertools/fastqwiper" + tag "wipertools/fastqscatter" + tag "wipertools/fastqgather" + tag "wipertools/reportgather" + + test("sarscov2_illumina_paired - fastq_gz") { + + when { + workflow { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)] + ] + """ + } + } + + then { + assertAll( + { assert workflow.success}, + { assert snapshot(workflow.out).match()} + ) + } + } +} diff --git a/subworkflows/local/utils_nfcore_fastqrepair_pipeline/main.nf b/subworkflows/local/utils_nfcore_fastqrepair_pipeline/main.nf index 8c32f08..08a831c 100644 --- a/subworkflows/local/utils_nfcore_fastqrepair_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_fastqrepair_pipeline/main.nf @@ -8,29 +8,25 @@ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { UTILS_NFVALIDATION_PLUGIN } from '../../nf-core/utils_nfvalidation_plugin' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { fromSamplesheet } from 'plugin/nf-validation' -include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' +include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { samplesheetToList } from 'plugin/nf-schema' include { completionEmail } from '../../nf-core/utils_nfcore_pipeline' include { completionSummary } from '../../nf-core/utils_nfcore_pipeline' -include { dashedLine } from '../../nf-core/utils_nfcore_pipeline' -include { nfCoreLogo } from '../../nf-core/utils_nfcore_pipeline' include { imNotification } from '../../nf-core/utils_nfcore_pipeline' include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' -include { workflowCitation } from '../../nf-core/utils_nfcore_pipeline' +include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW TO INITIALISE PIPELINE -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_INITIALISATION { take: version // boolean: Display version and exit - help // boolean: Display help text 
validate_params // boolean: Boolean whether to validate parameters against the schema at runtime monochrome_logs // boolean: Do not use coloured log outputs nextflow_cli_args // array: List of positional nextflow CLI args @@ -38,7 +34,6 @@ workflow PIPELINE_INITIALISATION { input // string: Path to input samplesheet main: - ch_versions = Channel.empty() // @@ -54,16 +49,10 @@ workflow PIPELINE_INITIALISATION { // // Validate parameters and generate parameter summary to stdout // - pre_help_text = nfCoreLogo(monochrome_logs) - post_help_text = '\n' + workflowCitation() + '\n' + dashedLine(monochrome_logs) - def String workflow_command = "nextflow run ${workflow.manifest.name} -profile --input samplesheet.csv --outdir " - UTILS_NFVALIDATION_PLUGIN ( - help, - workflow_command, - pre_help_text, - post_help_text, + UTILS_NFSCHEMA_PLUGIN ( + workflow, validate_params, - "nextflow_schema.json" + null ) // @@ -76,8 +65,9 @@ workflow PIPELINE_INITIALISATION { // // Create channel from input file provided through params.input // + Channel - .fromSamplesheet("input") + .fromList(samplesheetToList(params.input, "${projectDir}/assets/schema_input.json")) .map { meta, fastq_1, fastq_2 -> if (!fastq_2) { @@ -87,8 +77,8 @@ workflow PIPELINE_INITIALISATION { } } .groupTuple() - .map { - validateInputSamplesheet(it) + .map { samplesheet -> + validateInputSamplesheet(samplesheet) } .map { meta, fastqs -> @@ -96,15 +86,22 @@ workflow PIPELINE_INITIALISATION { } .set { ch_samplesheet } + // + // Check that paired-end fastq files have the same file extensions + // + validateConcordantExtensionsInSamplesheet(ch_samplesheet) + emit: samplesheet = ch_samplesheet versions = ch_versions } + + /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW FOR PIPELINE COMPLETION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_COMPLETION { @@ -119,19 +116,26 @@ workflow PIPELINE_COMPLETION { multiqc_report // string: Path to MultiQC report main: - summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") + def multiqc_reports = multiqc_report.toList() // // Completion email and summary // workflow.onComplete { if (email || email_on_fail) { - completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs, multiqc_report.toList()) + completionEmail( + summary_params, + email, + email_on_fail, + plaintext_email, + outdir, + monochrome_logs, + multiqc_reports.getVal(), + ) } completionSummary(monochrome_logs) - if (hook_url) { imNotification(summary_params, hook_url) } @@ -143,10 +147,13 @@ workflow PIPELINE_COMPLETION { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ +import java.util.zip.GZIPInputStream +import java.nio.file.Path +import java.nio.file.Files // // Validate channels from input samplesheet @@ -155,7 +162,7 @@ def validateInputSamplesheet(input) { def (metas, fastqs) = input[1..2] // Check that multiple runs of the same sample are of the same 
datatype i.e. single-end / paired-end - def endedness_ok = metas.collect{ it.single_end }.unique().size == 1 + def endedness_ok = metas.collect{ meta -> meta.single_end }.unique().size == 1 if (!endedness_ok) { error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}") } @@ -163,37 +170,62 @@ def validateInputSamplesheet(input) { return [ metas[0], fastqs ] } +// +// Same fastq files are not allowed to be analyzed multiple times in the same run +// +def validateConcordantExtensionsInSamplesheet(samplesheet) { + // Collect the sample IDs of problematic entries + def wrong_samples = samplesheet.filter { meta, fastq -> !meta.single_end && fastq[0] && fastq[1] && fastq[0].extension != fastq[1].extension } + wrong_samples.subscribe { meta, fastq -> error("Please check input samplesheet -> Paired-end fastq files must have the same extension for ${meta.id}") } +} + +// +// Check if a fastq file is empty +// +def isFastqFileEmpty(Path fastqFilePath) { + if (fastqFilePath.toString().endsWith('.gz')) { + try (GZIPInputStream gis = new GZIPInputStream(Files.newInputStream(fastqFilePath))) { + return gis.read() == -1 + } catch (IOException e) { + throw new RuntimeException("Error reading gzipped file: ${fastqFilePath}", e) + } + } else { + return Files.size(fastqFilePath) == 0 || Files.size(fastqFilePath) == -1 + } +} + // // Generate methods description for MultiQC // def toolCitationText() { - // TODO nf-core: Optionally add in-text citation tools to this list. // Can use ternary operators to dynamically construct based conditions, e.g. params["run_xyz"] ? "Tool (Foo et al. 2023)" : "", - // Uncomment function in methodsDescriptionText to render in MultiQC report def citation_text = [ "Tools used in the workflow included:", - "FastQC (Andrews 2010),", - "MultiQC (Ewels et al. 2016)", - "." + "The gzip Recovery Toolkit,", + "FastqWiper,", + "BBMap,", + "FastQC (Andrews 2010)", + "MultiQC (Ewels et al. 2016)", ].join(' ').trim() return citation_text } def toolBibliographyText() { - // TODO nf-core: Optionally add bibliographic entries to this list. // Can use ternary operators to dynamically construct based conditions, e.g. params["run_xyz"] ? "
<li>Author (2023) Pub name, Journal, DOI</li>" : "",
-    // Uncomment function in methodsDescriptionText to render in MultiQC report
     def reference_text = [
-        "<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).</li>",
-        "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>"
+        "<li>Renn, A. M. (2013). The gzip Recovery Toolkit [Online]</li>",
+        "<li>MazzaLab, Tommaso Mazza, & Jose F Oviedo. (2025). mazzalab/fastqwiper: v1.1.5 (v1.1.5). Zenodo, https://doi.org/10.5281/zenodo.14774039</li>",
+        "<li>Bushnell B. (2024). BBMap [Online]</li>",
+        "<li>Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]</li>",
+        "<li>Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924</li>"
     ].join(' ').trim()

     return reference_text
 }

 def methodsDescriptionText(mqc_methods_yaml) {
-    // Convert to a named map so can be used as with familar NXF ${workflow} variable syntax in the MultiQC YML file
+    // Convert to a named map so can be used as with familiar NXF ${workflow} variable syntax in the MultiQC YML file
     def meta = [:]
     meta.workflow = workflow.toMap()
     meta["manifest_map"] = workflow.manifest.toMap()
@@ -204,8 +236,10 @@ def methodsDescriptionText(mqc_methods_yaml) {
     // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers
     // Removing ` ` since the manifest.doi is a string and not a proper list
         def temp_doi_ref = ""
-        String[] manifest_doi = meta.manifest_map.doi.tokenize(",")
-        for (String doi_ref: manifest_doi) temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), "
+        def manifest_doi = meta.manifest_map.doi.tokenize(",")
+        manifest_doi.each { doi_ref ->
+            temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), "
+        }
         meta["doi_text"] = temp_doi_ref.substring(0, temp_doi_ref.length() - 2)
     } else meta["doi_text"] = ""
     meta["nodoi_text"] = meta.manifest_map.doi ? "" : "<li>If available, make sure to update the text to include the Zenodo DOI of version of the pipeline used. </li>
  • " @@ -214,9 +248,9 @@ def methodsDescriptionText(mqc_methods_yaml) { meta["tool_citations"] = "" meta["tool_bibliography"] = "" - // TODO nf-core: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled! - // meta["tool_citations"] = toolCitationText().replaceAll(", \\.", ".").replaceAll("\\. \\.", ".").replaceAll(", \\.", ".") - // meta["tool_bibliography"] = toolBibliographyText() + // Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled! + meta["tool_citations"] = toolCitationText().replaceAll(", \\.", ".").replaceAll("\\. \\.", ".").replaceAll(", \\.", ".") + meta["tool_bibliography"] = toolBibliographyText() def methods_text = mqc_methods_yaml.text diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf index ac31f28..d6e593e 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf @@ -2,18 +2,13 @@ // Subworkflow with functionality that may be useful for any Nextflow pipeline // -import org.yaml.snakeyaml.Yaml -import groovy.json.JsonOutput -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NEXTFLOW_PIPELINE { - take: print_version // boolean: print version dump_parameters // boolean: dump parameters @@ -26,7 +21,7 @@ workflow UTILS_NEXTFLOW_PIPELINE { // Print workflow version and exit on --version // if (print_version) { - log.info "${workflow.manifest.name} ${getWorkflowVersion()}" + log.info("${workflow.manifest.name} ${getWorkflowVersion()}") System.exit(0) } @@ -49,16 +44,16 @@ workflow UTILS_NEXTFLOW_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Generate version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 
'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -76,13 +71,13 @@ def getWorkflowVersion() { // Dump pipeline parameters to a JSON file // def dumpParametersToJSON(outdir) { - def timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') - def filename = "params_${timestamp}.json" - def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") - def jsonStr = JsonOutput.toJson(params) - temp_pf.text = JsonOutput.prettyPrint(jsonStr) + def timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss') + def filename = "params_${timestamp}.json" + def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") + def jsonStr = groovy.json.JsonOutput.toJson(params) + temp_pf.text = groovy.json.JsonOutput.prettyPrint(jsonStr) - FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") + nextflow.extension.FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") temp_pf.delete() } @@ -90,37 +85,42 @@ def dumpParametersToJSON(outdir) { // When running with -profile conda, warn if channels have not been set-up appropriately // def checkCondaChannels() { - Yaml parser = new Yaml() + def parser = new org.yaml.snakeyaml.Yaml() def channels = [] try { def config = parser.load("conda config --show channels".execute().text) channels = config.channels - } catch(NullPointerException | IOException e) { - log.warn "Could not verify conda channel configuration." - return + } + catch (NullPointerException e) { + log.debug(e) + log.warn("Could not verify conda channel configuration.") + return null + } + catch (IOException e) { + log.debug(e) + log.warn("Could not verify conda channel configuration.") + return null } // Check that all channels are present // This channel list is ordered by required channel priority. - def required_channels_in_order = ['conda-forge', 'bioconda', 'defaults'] + def required_channels_in_order = ['conda-forge', 'bioconda'] def channels_missing = ((required_channels_in_order as Set) - (channels as Set)) as Boolean // Check that they are in the right order - def channel_priority_violation = false - def n = required_channels_in_order.size() - for (int i = 0; i < n - 1; i++) { - channel_priority_violation |= !(channels.indexOf(required_channels_in_order[i]) < channels.indexOf(required_channels_in_order[i+1])) - } + def channel_priority_violation = required_channels_in_order != channels.findAll { ch -> ch in required_channels_in_order } if (channels_missing | channel_priority_violation) { - log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + - " There is a problem with your Conda configuration!\n\n" + - " You will need to set-up the conda-forge and bioconda channels correctly.\n" + - " Please refer to https://bioconda.github.io/\n" + - " The observed channel order is \n" + - " ${channels}\n" + - " but the following channel order is required:\n" + - " ${required_channels_in_order}\n" + - "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + log.warn """\ + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + There is a problem with your Conda configuration! + You will need to set-up the conda-forge and bioconda channels correctly. 
+ Please refer to https://bioconda.github.io/ + The observed channel order is + ${channels} + but the following channel order is required: + ${required_channels_in_order} + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + """.stripIndent(true) } } diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.workflow.nf.test b/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.workflow.nf.test index ca964ce..02dbf09 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.workflow.nf.test +++ b/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.workflow.nf.test @@ -52,10 +52,12 @@ nextflow_workflow { } then { - assertAll( - { assert workflow.success }, - { assert workflow.stdout.contains("nextflow_workflow v9.9.9") } - ) + expect { + with(workflow) { + assert success + assert "nextflow_workflow v9.9.9" in stdout + } + } } } diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config index d0a926b..a09572e 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config +++ b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config @@ -3,7 +3,7 @@ manifest { author = """nf-core""" homePage = 'https://127.0.0.1' description = """Dummy pipeline""" - nextflowVersion = '!>=23.04.0' + nextflowVersion = '!>=23.04.0' version = '9.9.9' doi = 'https://doi.org/10.5281/zenodo.5070524' } diff --git a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf index 14558c3..bfd2587 100644 --- a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf @@ -2,17 +2,13 @@ // Subworkflow with utility functions specific to the nf-core pipeline template // -import org.yaml.snakeyaml.Yaml -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NFCORE_PIPELINE { - take: nextflow_cli_args @@ -25,23 +21,20 @@ workflow UTILS_NFCORE_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Warn if a -profile or Nextflow config has not been provided to run the pipeline // def checkConfigProvided() { - valid_config = true + def valid_config = true as Boolean if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) { - log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" + - "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + - " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + - " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + - " (3) Using your own local custom config e.g. 
`-c /path/to/your/custom.config`\n\n" + - "Please refer to the quick start section and usage docs for the pipeline.\n " + log.warn( + "[${workflow.manifest.name}] You are attempting to run the pipeline without any custom configuration!\n\n" + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + "Please refer to the quick start section and usage docs for the pipeline.\n " + ) valid_config = false } return valid_config @@ -52,39 +45,22 @@ def checkConfigProvided() { // def checkProfileProvided(nextflow_cli_args) { if (workflow.profile.endsWith(',')) { - error "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + error( + "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } if (nextflow_cli_args[0]) { - log.warn "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + log.warn( + "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } } -// -// Citation string for pipeline -// -def workflowCitation() { - def temp_doi_ref = "" - String[] manifest_doi = workflow.manifest.doi.tokenize(",") - // Using a loop to handle multiple DOIs - // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers - // Removing ` ` since the manifest.doi is a string and not a proper list - for (String doi_ref: manifest_doi) temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" - return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + - "* The pipeline\n" + - temp_doi_ref + "\n" + - "* The nf-core framework\n" + - " https://doi.org/10.1038/s41587-020-0439-x\n\n" + - "* Software dependencies\n" + - " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" -} - // // Generate workflow version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 
'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -102,8 +78,8 @@ def getWorkflowVersion() { // Get software versions for pipeline // def processVersionsFromYAML(yaml_file) { - Yaml yaml = new Yaml() - versions = yaml.load(yaml_file).collectEntries { k, v -> [ k.tokenize(':')[-1], v ] } + def yaml = new org.yaml.snakeyaml.Yaml() + def versions = yaml.load(yaml_file).collectEntries { k, v -> [k.tokenize(':')[-1], v] } return yaml.dumpAsMap(versions).trim() } @@ -113,8 +89,8 @@ def processVersionsFromYAML(yaml_file) { def workflowVersionToYAML() { return """ Workflow: - $workflow.manifest.name: ${getWorkflowVersion()} - Nextflow: $workflow.nextflow.version + ${workflow.manifest.name}: ${getWorkflowVersion()} + Nextflow: ${workflow.nextflow.version} """.stripIndent().trim() } @@ -122,11 +98,7 @@ def workflowVersionToYAML() { // Get channel of software versions used in pipeline in YAML format // def softwareVersionsToYAML(ch_versions) { - return ch_versions - .unique() - .map { processVersionsFromYAML(it) } - .unique() - .mix(Channel.of(workflowVersionToYAML())) + return ch_versions.unique().map { version -> processVersionsFromYAML(version) }.unique().mix(Channel.of(workflowVersionToYAML())) } // @@ -134,61 +106,40 @@ def softwareVersionsToYAML(ch_versions) { // def paramsSummaryMultiqc(summary_params) { def summary_section = '' - for (group in summary_params.keySet()) { - def group_params = summary_params.get(group) // This gets the parameters of that particular group - if (group_params) { - summary_section += "

    <p style=\"font-size:110%\"><b>$group</b></p>\n"
-            summary_section += "    <dl class=\"dl-horizontal\">\n"
-            for (param in group_params.keySet()) {
-                summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n"
+    summary_params
+        .keySet()
+        .each { group ->
+            def group_params = summary_params.get(group)
+            // This gets the parameters of that particular group
+            if (group_params) {
+                summary_section += "    <p style=\"font-size:110%\"><b>${group}</b></p>\n"
+                summary_section += "    <dl class=\"dl-horizontal\">\n"
+                group_params
+                    .keySet()
+                    .sort()
+                    .each { param ->
+                        summary_section += "        <dt>${param}</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n"
+                    }
+                summary_section += "    </dl>\n"
+            }
-            }
-            summary_section += "    </dl>
    \n" } - } - String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" - yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" - yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" - yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" - yaml_file_text += "plot_type: 'html'\n" - yaml_file_text += "data: |\n" - yaml_file_text += "${summary_section}" + def yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n" as String + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += "data: |\n" + yaml_file_text += "${summary_section}" return yaml_file_text } -// -// nf-core logo -// -def nfCoreLogo(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) - String.format( - """\n - ${dashedLine(monochrome_logs)} - ${colors.green},--.${colors.black}/${colors.green},-.${colors.reset} - ${colors.blue} ___ __ __ __ ___ ${colors.green}/,-._.--~\'${colors.reset} - ${colors.blue} |\\ | |__ __ / ` / \\ |__) |__ ${colors.yellow}} {${colors.reset} - ${colors.blue} | \\| | \\__, \\__/ | \\ |___ ${colors.green}\\`-._,-`-,${colors.reset} - ${colors.green}`._,._,\'${colors.reset} - ${colors.purple} ${workflow.manifest.name} ${getWorkflowVersion()}${colors.reset} - ${dashedLine(monochrome_logs)} - """.stripIndent() - ) -} - -// -// Return dashed line -// -def dashedLine(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) - return "-${colors.dim}----------------------------------------------------${colors.reset}-" -} - // // ANSII colours used for terminal logging // def logColours(monochrome_logs=true) { - Map colorcodes = [:] + def colorcodes = [:] as Map // Reset / Meta colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" @@ -200,79 +151,76 @@ def logColours(monochrome_logs=true) { colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" // Regular Colors - colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" - colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" - colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" - colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" - colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" - colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" - colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" - colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" // Bold - colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" - colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" - colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" - colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" - colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" - colorcodes['bpurple'] = monochrome_logs ? 
'' : "\033[1;35m" - colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" - colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" // Underline - colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" - colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" - colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" - colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" - colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" - colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" - colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" - colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" // High Intensity - colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" - colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" - colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" - colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" - colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" - colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" - colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" - colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" // Bold High Intensity - colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" - colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" - colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" - colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" - colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" - colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" - colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" - colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? 
'' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" return colorcodes } -// -// Attach the multiqc report to email -// -def attachMultiqcReport(multiqc_report) { - def mqc_report = null - try { - if (workflow.success) { - mqc_report = multiqc_report.getVal() - if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { - if (mqc_report.size() > 1) { - log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" - } - mqc_report = mqc_report[0] - } - } - } catch (all) { - if (multiqc_report) { - log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" +// Return a single report from an object that may be a Path or List +// +def getSingleReport(multiqc_reports) { + if (multiqc_reports instanceof Path) { + return multiqc_reports + } else if (multiqc_reports instanceof List) { + if (multiqc_reports.size() == 0) { + log.warn("[${workflow.manifest.name}] No reports found from process 'MULTIQC'") + return null + } else if (multiqc_reports.size() == 1) { + return multiqc_reports.first() + } else { + log.warn("[${workflow.manifest.name}] Found multiple reports from process 'MULTIQC', will use only one") + return multiqc_reports.first() } + } else { + return null } - return mqc_report } // @@ -281,26 +229,35 @@ def attachMultiqcReport(multiqc_report) { def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs=true, multiqc_report=null) { // Set up the e-mail variables - def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + def subject = "[${workflow.manifest.name}] Successful: ${workflow.runName}" if (!workflow.success) { - subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + subject = "[${workflow.manifest.name}] FAILED: ${workflow.runName}" } def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] misc_fields['Date Started'] = workflow.start misc_fields['Date Completed'] = workflow.complete misc_fields['Pipeline script file path'] = workflow.scriptFile misc_fields['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision - misc_fields['Nextflow Version'] = workflow.nextflow.version - misc_fields['Nextflow Build'] = workflow.nextflow.build + if (workflow.repository) { + misc_fields['Pipeline repository Git URL'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['Pipeline repository Git Commit'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['Pipeline Git branch/tag'] = workflow.revision + } + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp def email_fields = [:] @@ -317,7 +274,7 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi email_fields['summary'] = summary << misc_fields // On success try attach the multiqc report - def mqc_report = attachMultiqcReport(multiqc_report) + def mqc_report = getSingleReport(multiqc_report) // Check if we are only sending emails on failure def email_address = email @@ 
-337,40 +294,45 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi
     def email_html = html_template.toString()

     // Render the sendmail template
-    def max_multiqc_email_size = (params.containsKey('max_multiqc_email_size') ? params.max_multiqc_email_size : 0) as nextflow.util.MemoryUnit
-    def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ]
+    def max_multiqc_email_size = (params.containsKey('max_multiqc_email_size') ? params.max_multiqc_email_size : 0) as MemoryUnit
+    def smail_fields = [email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes()]
     def sf = new File("${workflow.projectDir}/assets/sendmail_template.txt")
     def sendmail_template = engine.createTemplate(sf).make(smail_fields)
     def sendmail_html = sendmail_template.toString()

     // Send the HTML e-mail
-    Map colors = logColours(monochrome_logs)
+    def colors = logColours(monochrome_logs) as Map
     if (email_address) {
         try {
-            if (plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') }
+            if (plaintext_email) {
+                throw new org.codehaus.groovy.GroovyException('Send plaintext e-mail, not HTML')
+            }
             // Try to send HTML e-mail using sendmail
             def sendmail_tf = new File(workflow.launchDir.toString(), ".sendmail_tmp.html")
             sendmail_tf.withWriter { w -> w << sendmail_html }
-            [ 'sendmail', '-t' ].execute() << sendmail_html
-            log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-"
-        } catch (all) {
+            ['sendmail', '-t'].execute() << sendmail_html
+            log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (sendmail)-")
+        }
+        catch (Exception msg) {
+            log.debug(msg.toString())
+            log.debug("Trying with mail instead of sendmail")
             // Catch failures and try with plaintext
-            def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ]
+            def mail_cmd = ['mail', '-s', subject, '--content-type=text/html', email_address]
             mail_cmd.execute() << email_html
-            log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-"
+            log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (mail)-")
         }
     }

     // Write summary e-mail HTML to a file
     def output_hf = new File(workflow.launchDir.toString(), ".pipeline_report.html")
     output_hf.withWriter { w -> w << email_html }
-    FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html");
+    nextflow.extension.FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html")
     output_hf.delete()

     // Write summary e-mail TXT to a file
     def output_tf = new File(workflow.launchDir.toString(), ".pipeline_report.txt")
     output_tf.withWriter { w -> w << email_txt }
-    FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt");
+    nextflow.extension.FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt")
     output_tf.delete()
 }

@@ -378,15 +340,17 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi
 // Print pipeline summary on completion
 //
 def completionSummary(monochrome_logs=true) {
-    Map colors = logColours(monochrome_logs)
+    def colors = logColours(monochrome_logs)
as Map if (workflow.success) { if (workflow.stats.ignoredCount == 0) { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Pipeline completed successfully${colors.reset}-") } - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-") + } + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.red} Pipeline completed with errors${colors.reset}-") } } @@ -395,21 +359,30 @@ def completionSummary(monochrome_logs=true) { // def imNotification(summary_params, hook_url) { def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] - misc_fields['start'] = workflow.start - misc_fields['complete'] = workflow.complete - misc_fields['scriptfile'] = workflow.scriptFile - misc_fields['scriptid'] = workflow.scriptId - if (workflow.repository) misc_fields['repository'] = workflow.repository - if (workflow.commitId) misc_fields['commitid'] = workflow.commitId - if (workflow.revision) misc_fields['revision'] = workflow.revision - misc_fields['nxf_version'] = workflow.nextflow.version - misc_fields['nxf_build'] = workflow.nextflow.build - misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp + misc_fields['start'] = workflow.start + misc_fields['complete'] = workflow.complete + misc_fields['scriptfile'] = workflow.scriptFile + misc_fields['scriptid'] = workflow.scriptId + if (workflow.repository) { + misc_fields['repository'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['commitid'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['revision'] = workflow.revision + } + misc_fields['nxf_version'] = workflow.nextflow.version + misc_fields['nxf_build'] = workflow.nextflow.build + misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp def msg_fields = [:] msg_fields['version'] = getWorkflowVersion() @@ -434,13 +407,13 @@ def imNotification(summary_params, hook_url) { def json_message = json_template.toString() // POST - def post = new URL(hook_url).openConnection(); + def post = new URL(hook_url).openConnection() post.setRequestMethod("POST") post.setDoOutput(true) post.setRequestProperty("Content-Type", "application/json") - post.getOutputStream().write(json_message.getBytes("UTF-8")); - def postRC = post.getResponseCode(); - if (! 
postRC.equals(200)) { - log.warn(post.getErrorStream().getText()); + post.getOutputStream().write(json_message.getBytes("UTF-8")) + def postRC = post.getResponseCode() + if (!postRC.equals(200)) { + log.warn(post.getErrorStream().getText()) } } diff --git a/subworkflows/nf-core/utils_nfcore_pipeline/tests/main.function.nf.test b/subworkflows/nf-core/utils_nfcore_pipeline/tests/main.function.nf.test index 1dc317f..f117040 100644 --- a/subworkflows/nf-core/utils_nfcore_pipeline/tests/main.function.nf.test +++ b/subworkflows/nf-core/utils_nfcore_pipeline/tests/main.function.nf.test @@ -41,26 +41,14 @@ nextflow_function { } } - test("Test Function workflowCitation") { - - function "workflowCitation" - - then { - assertAll( - { assert function.success }, - { assert snapshot(function.result).match() } - ) - } - } - - test("Test Function nfCoreLogo") { + test("Test Function without logColours") { - function "nfCoreLogo" + function "logColours" when { function { """ - input[0] = false + input[0] = true """ } } @@ -73,9 +61,8 @@ nextflow_function { } } - test("Test Function dashedLine") { - - function "dashedLine" + test("Test Function with logColours") { + function "logColours" when { function { @@ -93,14 +80,13 @@ nextflow_function { } } - test("Test Function without logColours") { - - function "logColours" + test("Test Function getSingleReport with a single file") { + function "getSingleReport" when { function { """ - input[0] = true + input[0] = file(params.modules_testdata_base_path + '/generic/tsv/test.tsv', checkIfExists: true) """ } } @@ -108,18 +94,22 @@ nextflow_function { then { assertAll( { assert function.success }, - { assert snapshot(function.result).match() } + { assert function.result.contains("test.tsv") } ) } } - test("Test Function with logColours") { - function "logColours" + test("Test Function getSingleReport with multiple files") { + function "getSingleReport" when { function { """ - input[0] = false + input[0] = [ + file(params.modules_testdata_base_path + '/generic/tsv/test.tsv', checkIfExists: true), + file(params.modules_testdata_base_path + '/generic/tsv/network.tsv', checkIfExists: true), + file(params.modules_testdata_base_path + '/generic/tsv/expression.tsv', checkIfExists: true) + ] """ } } @@ -127,7 +117,9 @@ nextflow_function { then { assertAll( { assert function.success }, - { assert snapshot(function.result).match() } + { assert function.result.contains("test.tsv") }, + { assert !function.result.contains("network.tsv") }, + { assert !function.result.contains("expression.tsv") } ) } } diff --git a/subworkflows/nf-core/utils_nfcore_pipeline/tests/main.function.nf.test.snap b/subworkflows/nf-core/utils_nfcore_pipeline/tests/main.function.nf.test.snap index 1037232..02c6701 100644 --- a/subworkflows/nf-core/utils_nfcore_pipeline/tests/main.function.nf.test.snap +++ b/subworkflows/nf-core/utils_nfcore_pipeline/tests/main.function.nf.test.snap @@ -17,26 +17,6 @@ }, "timestamp": "2024-02-28T12:02:59.729647" }, - "Test Function nfCoreLogo": { - "content": [ - "\n\n-\u001b[2m----------------------------------------------------\u001b[0m-\n \u001b[0;32m,--.\u001b[0;30m/\u001b[0;32m,-.\u001b[0m\n\u001b[0;34m ___ __ __ __ ___ \u001b[0;32m/,-._.--~'\u001b[0m\n\u001b[0;34m |\\ | |__ __ / ` / \\ |__) |__ \u001b[0;33m} {\u001b[0m\n\u001b[0;34m | \\| | \\__, \\__/ | \\ |___ \u001b[0;32m\\`-._,-`-,\u001b[0m\n \u001b[0;32m`._,._,'\u001b[0m\n\u001b[0;35m nextflow_workflow v9.9.9\u001b[0m\n-\u001b[2m----------------------------------------------------\u001b[0m-\n" - ], - "meta": { - 
"nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-02-28T12:03:10.562934" - }, - "Test Function workflowCitation": { - "content": [ - "If you use nextflow_workflow for your analysis please cite:\n\n* The pipeline\n https://doi.org/10.5281/zenodo.5070524\n\n* The nf-core framework\n https://doi.org/10.1038/s41587-020-0439-x\n\n* Software dependencies\n https://github.com/nextflow_workflow/blob/master/CITATIONS.md" - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-02-28T12:03:07.019761" - }, "Test Function without logColours": { "content": [ { @@ -95,16 +75,6 @@ }, "timestamp": "2024-02-28T12:03:17.969323" }, - "Test Function dashedLine": { - "content": [ - "-\u001b[2m----------------------------------------------------\u001b[0m-" - ], - "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" - }, - "timestamp": "2024-02-28T12:03:14.366181" - }, "Test Function with logColours": { "content": [ { diff --git a/subworkflows/nf-core/utils_nfschema_plugin/main.nf b/subworkflows/nf-core/utils_nfschema_plugin/main.nf new file mode 100644 index 0000000..4994303 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/main.nf @@ -0,0 +1,46 @@ +// +// Subworkflow that uses the nf-schema plugin to validate parameters and render the parameter summary +// + +include { paramsSummaryLog } from 'plugin/nf-schema' +include { validateParameters } from 'plugin/nf-schema' + +workflow UTILS_NFSCHEMA_PLUGIN { + + take: + input_workflow // workflow: the workflow object used by nf-schema to get metadata from the workflow + validate_params // boolean: validate the parameters + parameters_schema // string: path to the parameters JSON schema. + // this has to be the same as the schema given to `validation.parametersSchema` + // when this input is empty it will automatically use the configured schema or + // "${projectDir}/nextflow_schema.json" as default. This input should not be empty + // for meta pipelines + + main: + + // + // Print parameter summary to stdout. This will display the parameters + // that differ from the default given in the JSON schema + // + if(parameters_schema) { + log.info paramsSummaryLog(input_workflow, parameters_schema:parameters_schema) + } else { + log.info paramsSummaryLog(input_workflow) + } + + // + // Validate the parameters using nextflow_schema.json or the schema + // given via the validation.parametersSchema configuration option + // + if(validate_params) { + if(parameters_schema) { + validateParameters(parameters_schema:parameters_schema) + } else { + validateParameters() + } + } + + emit: + dummy_emit = true +} + diff --git a/subworkflows/nf-core/utils_nfschema_plugin/meta.yml b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml new file mode 100644 index 0000000..f7d9f02 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml @@ -0,0 +1,35 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "utils_nfschema_plugin" +description: Run nf-schema to validate parameters and create a summary of changed parameters +keywords: + - validation + - JSON schema + - plugin + - parameters + - summary +components: [] +input: + - input_workflow: + type: object + description: | + The workflow object of the used pipeline. + This object contains meta data used to create the params summary log + - validate_params: + type: boolean + description: Validate the parameters and error if invalid. 
+ - parameters_schema: + type: string + description: | + Path to the parameters JSON schema. + This has to be the same as the schema given to the `validation.parametersSchema` config + option. When this input is empty it will automatically use the configured schema or + "${projectDir}/nextflow_schema.json" as default. The schema should not be given in this way + for meta pipelines. +output: + - dummy_emit: + type: boolean + description: Dummy emit to make nf-core subworkflows lint happy +authors: + - "@nvnieuwk" +maintainers: + - "@nvnieuwk" diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test new file mode 100644 index 0000000..8fb3016 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test @@ -0,0 +1,117 @@ +nextflow_workflow { + + name "Test Subworkflow UTILS_NFSCHEMA_PLUGIN" + script "../main.nf" + workflow "UTILS_NFSCHEMA_PLUGIN" + + tag "subworkflows" + tag "subworkflows_nfcore" + tag "subworkflows/utils_nfschema_plugin" + tag "plugin/nf-schema" + + config "./nextflow.config" + + test("Should run nothing") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params") { + + when { + + params { + test_data = '' + outdir = null + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } + + test("Should run nothing - custom schema") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params - custom schema") { + + when { + + params { + test_data = '' + outdir = null + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } +} diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config new file mode 100644 index 0000000..0907ac5 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config @@ -0,0 +1,8 @@ +plugins { + id "nf-schema@2.1.0" +} + +validation { + parametersSchema = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + monochromeLogs = true +} \ No newline at end of file diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json similarity index 95% rename from subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json rename to subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json index 7626c1c..331e0d2 100644 --- 
a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/./master/nextflow_schema.json", "title": ". pipeline parameters", "description": "", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -87,10 +87,10 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf b/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf deleted file mode 100644 index 2585b65..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf +++ /dev/null @@ -1,62 +0,0 @@ -// -// Subworkflow that uses the nf-validation plugin to render help text and parameter summary -// - -/* -======================================================================================== - IMPORT NF-VALIDATION PLUGIN -======================================================================================== -*/ - -include { paramsHelp } from 'plugin/nf-validation' -include { paramsSummaryLog } from 'plugin/nf-validation' -include { validateParameters } from 'plugin/nf-validation' - -/* -======================================================================================== - SUBWORKFLOW DEFINITION -======================================================================================== -*/ - -workflow UTILS_NFVALIDATION_PLUGIN { - - take: - print_help // boolean: print help - workflow_command // string: default commmand used to run pipeline - pre_help_text // string: string to be printed before help text and summary log - post_help_text // string: string to be printed after help text and summary log - validate_params // boolean: validate parameters - schema_filename // path: JSON schema file, null to use default value - - main: - - log.debug "Using schema file: ${schema_filename}" - - // Default values for strings - pre_help_text = pre_help_text ?: '' - post_help_text = post_help_text ?: '' - workflow_command = workflow_command ?: '' - - // - // Print help message if needed - // - if (print_help) { - log.info pre_help_text + paramsHelp(workflow_command, parameters_schema: schema_filename) + post_help_text - System.exit(0) - } - - // - // Print parameter summary to stdout - // - log.info pre_help_text + paramsSummaryLog(workflow, parameters_schema: schema_filename) + post_help_text - - // - // Validate parameters relative to the parameter JSON schema - // - if (validate_params){ - validateParameters(parameters_schema: schema_filename) - } - - emit: - dummy_emit = true -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml deleted file mode 100644 index 3d4a6b0..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml +++ /dev/null @@ -1,44 +0,0 @@ -# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json -name: "UTILS_NFVALIDATION_PLUGIN" -description: Use nf-validation to initiate and validate a pipeline -keywords: - - utility - - pipeline - - initialise - - validation -components: [] -input: - - print_help: - type: boolean - 
description: | - Print help message and exit - - workflow_command: - type: string - description: | - The command to run the workflow e.g. "nextflow run main.nf" - - pre_help_text: - type: string - description: | - Text to print before the help message - - post_help_text: - type: string - description: | - Text to print after the help message - - validate_params: - type: boolean - description: | - Validate the parameters and error if invalid. - - schema_filename: - type: string - description: | - The filename of the schema to validate against. -output: - - dummy_emit: - type: boolean - description: | - Dummy emit to make nf-core subworkflows lint happy -authors: - - "@adamrtalbot" -maintainers: - - "@adamrtalbot" - - "@maxulysse" diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test deleted file mode 100644 index 5784a33..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test +++ /dev/null @@ -1,200 +0,0 @@ -nextflow_workflow { - - name "Test Workflow UTILS_NFVALIDATION_PLUGIN" - script "../main.nf" - workflow "UTILS_NFVALIDATION_PLUGIN" - tag "subworkflows" - tag "subworkflows_nfcore" - tag "plugin/nf-validation" - tag "'plugin/nf-validation'" - tag "utils_nfvalidation_plugin" - tag "subworkflows/utils_nfvalidation_plugin" - - test("Should run nothing") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success } - ) - } - } - - test("Should run help") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with command") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with extra text") { - - - when { - - params { - monochrome_logs = true - test_data = 
'' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = "pre-help-text" - post_help_text = "post-help-text" - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('pre-help-text') } }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } }, - { assert workflow.stdout.any { it.contains('post-help-text') } } - ) - } - } - - test("Should validate params") { - - when { - - params { - monochrome_logs = true - test_data = '' - outdir = 1 - } - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = true - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.failed }, - { assert workflow.stdout.any { it.contains('ERROR ~ ERROR: Validation of pipeline parameters failed!') } } - ) - } - } -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml deleted file mode 100644 index 60b1cff..0000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml +++ /dev/null @@ -1,2 +0,0 @@ -subworkflows/utils_nfvalidation_plugin: - - subworkflows/nf-core/utils_nfvalidation_plugin/** diff --git a/tests/main.nf.test b/tests/main.nf.test new file mode 100755 index 0000000..2417f21 --- /dev/null +++ b/tests/main.nf.test @@ -0,0 +1,41 @@ +nextflow_pipeline { + + name "Test fastqrepair pipeline" + script "../main.nf" + + tag "pipeline" + tag "fastqrepair" + + test("30 paired-end reads, two FASTQ are empty") { + + when { + params { + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/fastqrepair/testdata/samplesheet_30reads.csv' + outdir = "$outputDir/30reads" + + publish_all_tools = true + num_splits = 2 + } + } + + then { + assertAll( + { assert workflow.success }, + { assert snapshot( + path("$outputDir/30reads/repaired").list(), + path("$outputDir/30reads/repaired/test_30reads_2.report"), + path("$outputDir/30reads/pipeline_info/nf-core_fastqrepair_versions.yml"), + file("$outputDir/30reads/QC/multiqc/multiqc_report.html").exists(), + file("$outputDir/30reads/QC/fastqc/test_30reads_2.fastqc.html").exists(), + ).match("30reads_final") + }, + { assert path("$outputDir/30reads/repaired").list().size() == 2 }, + { assert new File("$outputDir/30reads/pipeline_info/nf-core_fastqrepair_versions.yml").exists() }, + { assert new File("$outputDir/30reads/repaired/test_30reads_2.fastq.gz").exists() }, + { assert new File("$outputDir/30reads/repaired/test_30reads_2.report").exists() }, + { assert new File("$outputDir/30reads/repaired/test_30reads_2.report").readLines()[0].contains("FASTQWIPER REPORT:") }, + { assert new File("$outputDir/30reads/repaired/test_30reads_2.report").readLines()[3].contains("96 (92.31%)") } + ) + } + } +} diff --git a/tests/main.nf.test.snap 
b/tests/main.nf.test.snap new file mode 100644 index 0000000..83d4f04 --- /dev/null +++ b/tests/main.nf.test.snap @@ -0,0 +1,19 @@ +{ + "30reads_final": { + "content": [ + [ + "test_30reads_2.fastq.gz:md5,688d1247ff20870b1f30cd39976007ba", + "test_30reads_2.report:md5,66c4833c1d7d2a94a34a786873869c7a" + ], + "test_30reads_2.report:md5,66c4833c1d7d2a94a34a786873869c7a", + "nf-core_fastqrepair_versions.yml:md5,bb1fead1cbd81fee8ab0b14ad725161f", + true, + false + ], + "meta": { + "nf-test": "0.9.2", + "nextflow": "24.10.4" + }, + "timestamp": "2025-02-01T18:06:11.347721" + } +} \ No newline at end of file diff --git a/tests/nextflow.config b/tests/nextflow.config new file mode 100755 index 0000000..276b7a3 --- /dev/null +++ b/tests/nextflow.config @@ -0,0 +1,30 @@ +/* +======================================================================================== + Nextflow config file for running tests +======================================================================================== +*/ + +params { + qin = 33 + alphabet = "ACGTN" +} + +// Impose sensible resource limits for testing +process { + resourceLimits = [ + cpus: 2, + memory: '3.GB', + time: '1.h' + ] +} + +// Impose same minimum Nextflow version as the pipeline for testing +manifest { + nextflowVersion = '!>=23.04.0' +} + +// Disable all Nextflow reporting options +timeline { enabled = false } +report { enabled = false } +trace { enabled = false } +dag { enabled = false } diff --git a/workflows/fastqrepair.nf b/workflows/fastqrepair.nf index 1721afe..ba6b9a8 100644 --- a/workflows/fastqrepair.nf +++ b/workflows/fastqrepair.nf @@ -3,13 +3,17 @@ IMPORT MODULES / SUBWORKFLOWS / FUNCTIONS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ +include { FASTQC } from '../modules/nf-core/fastqc/main' +include { MULTIQC } from '../modules/nf-core/multiqc/main' +include { GZRT } from '../modules/nf-core/gzrt/main' +include { BBMAP_REPAIR } from '../modules/nf-core/bbmap/repair/main' +include { FASTQ_REPAIR_WIPERTOOLS } from '../subworkflows/local/fastq_repair_wipertools/main' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline' +include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline' +include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_fastqrepair_pipeline' +include { isFastqFileEmpty } from '../subworkflows/local/utils_nfcore_fastqrepair_pipeline' -include { FASTQC } from '../modules/nf-core/fastqc/main' -include { MULTIQC } from '../modules/nf-core/multiqc/main' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline' -include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline' -include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_fastqrepair_pipeline' /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -18,31 +22,105 @@ include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_fast */ workflow FASTQREPAIR { - take: ch_samplesheet // channel: samplesheet read in from --input main: + ch_final = Channel.empty() // channel: repaired fastq files + ch_versions = Channel.empty() // channel: versions of the software used in the pipeline - ch_versions = Channel.empty() - ch_multiqc_files = Channel.empty() + // branch .gz and non gz files + ch_fastq_ext = Channel.empty() 
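+    // Each sample is routed on the extension of the first FASTQ in its file list, e.g.
+    // (illustrative names) [meta, [s_R1.fastq.gz, s_R2.fastq.gz]] -> gz_files; [meta, [s.fastq]] -> non_gz_files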
+    ch_samplesheet
+    | branch { _map, fq ->
+        gz_files: fq.first().getExtension() == 'gz'
+        non_gz_files: true }
+    | set { ch_fastq_ext }

     //
-    // MODULE: Run FastQC
+    // Recover corrupted gz files
     //
-    FASTQC (
-        ch_samplesheet
-    )
-    ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]})
-    ch_versions = ch_versions.mix(FASTQC.out.versions.first())
+    GZRT (ch_fastq_ext.gz_files)
+    ch_versions = ch_versions.mix(GZRT.out.versions.first())
+
+    // Join recovered gz files with non-gz files and filter empty files out
+    ch_tobewiped_fastq = Channel.empty()
+    GZRT.out.recovered
+    | concat ( ch_fastq_ext.non_gz_files )
+    | filter { meta, fileList -> meta.single_end
+        ? fileList instanceof List ? !isFastqFileEmpty(fileList[0]) : !isFastqFileEmpty(fileList)
+        : !isFastqFileEmpty(fileList[0]) && !isFastqFileEmpty(fileList[1]) }
+    | set { ch_tobewiped_fastq }
+
+    // If all input files are empty, then skip the rest of the pipeline
+    ch_tobewiped_fastq
+    | ifEmpty {
+        log.warn "No non-empty FASTQ files found after GZRT. Skipping the rest of the pipeline"
+    }
+
+    // If ch_tobewiped_fastq is smaller than ch_samplesheet but not empty, some input files were empty; warn with their IDs
+    ch_samplesheet.map { meta, _ -> meta.id }
+        .collect()
+        .set { all_ids }
+
+    // Extract meta.id from ch_tobewiped_fastq and collect into a list
+    ch_tobewiped_fastq.map { meta, _ -> meta.id }
+        .collect()
+        .set { subset_ids }
+
+    all_ids.minus(subset_ids).subscribe { id_list -> if (id_list.size() > 0) log.warn "FASTQ files with the following meta.ids are empty: ${id_list.join(', ')}" }

     //
-    // Collate and save software versions
+    // Make fastq compliant and wipe bad characters
+    //
+    ch_repaired_fastq = Channel.empty()
+    FASTQ_REPAIR_WIPERTOOLS (ch_tobewiped_fastq)
+    ch_versions = ch_versions.mix(FASTQ_REPAIR_WIPERTOOLS.out.versions.first())
+
+    FASTQ_REPAIR_WIPERTOOLS.out.wiped_fastq
+    | map { meta, fq -> [meta.subMap('sample_id', 'single_end'), fq] }
+    | map { meta, fq -> [['id':meta.sample_id, 'single_end':meta.single_end], fq] }
+    | branch {
+        single_end: it[0].single_end == true
+        paired_end: it[0].single_end == false }
+    | set { ch_repaired_fastq }
+
+    // Group paired-reads by 'sample_id' and rename keys
+    ch_repaired_fastq_paired_end = Channel.empty()
+    ch_repaired_fastq.paired_end
+    | groupTuple
+    | set { ch_repaired_fastq_paired_end }
+
+    // Settle reads pairing (re-pair, optional)
     //
+    if (!params.skip_bbmap_repair) {
+        // Re-pair reads
+        BBMAP_REPAIR (ch_repaired_fastq_paired_end, false)
+        ch_versions = ch_versions.mix(BBMAP_REPAIR.out.versions.first())
+
+        ch_repaired_fastq_paired_end_singleton = Channel.empty()
+        BBMAP_REPAIR.out.repaired
+        | concat ( BBMAP_REPAIR.out.singleton )
+        | groupTuple
+        | map { meta, fq -> [meta, fq.flatten()] }
+        | set { ch_repaired_fastq_paired_end_singleton }
+
+        ch_final = ch_repaired_fastq_paired_end_singleton.concat(ch_repaired_fastq.single_end)
+    } else {
+        ch_final = ch_repaired_fastq_paired_end.concat(ch_repaired_fastq.single_end)
+    }
+
+    //
+    // Assess QC of all fastq files (both single and paired end)
+    //
+    FASTQC ( ch_final )
+    ch_versions = ch_versions.mix(FASTQC.out.versions.first())
+
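+    //
+    // Collate and save software versions
+    //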
"$projectDir/assets/multiqc_config.yml", checkIfExists: true) - ch_multiqc_custom_config = params.multiqc_config ? - Channel.fromPath(params.multiqc_config, checkIfExists: true) : - Channel.empty() + ch_multiqc_files = Channel.empty() + ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]}) + + ch_multiqc_config = Channel.fromPath("$projectDir/assets/multiqc_config.yml", checkIfExists: true) + ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() ch_multiqc_logo = params.multiqc_logo ? Channel.fromPath(params.multiqc_logo, checkIfExists: true) : Channel.empty() @@ -62,15 +140,14 @@ workflow FASTQREPAIR { summary_params = paramsSummaryMap( workflow, parameters_schema: "nextflow_schema.json") ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params)) - + ch_multiqc_files = ch_multiqc_files.mix( + ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) ch_multiqc_custom_methods_description = params.multiqc_methods_description ? file(params.multiqc_methods_description, checkIfExists: true) : file("$projectDir/assets/methods_description_template.yml", checkIfExists: true) ch_methods_description = Channel.value( methodsDescriptionText(ch_multiqc_custom_methods_description)) - ch_multiqc_files = ch_multiqc_files.mix( - ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions) ch_multiqc_files = ch_multiqc_files.mix( ch_methods_description.collectFile( @@ -83,12 +160,13 @@ workflow FASTQREPAIR { ch_multiqc_files.collect(), ch_multiqc_config.toList(), ch_multiqc_custom_config.toList(), - ch_multiqc_logo.toList() + ch_multiqc_logo.toList(), + [], + [] ) - emit: - multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html - versions = ch_versions // channel: [ path(versions.yml) ] + emit:multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html + versions = ch_versions // channel: [ path(versions.yml) ] } /*