Merge pull request #7 from NBISweden/0.1.0
Update tests and documentation for 0.1.0 release
percyfal authored Dec 19, 2023
2 parents 7c500b7 + 3410996 commit ebdf2de
Showing 14 changed files with 389 additions and 118 deletions.
4 changes: 4 additions & 0 deletions .Rbuildignore
@@ -4,3 +4,7 @@
^scripts$
^doc$
^Meta$
^_pkgdown\.yml$
^docs$
^pkgdown$
^tests/testthat.R$
29 changes: 14 additions & 15 deletions .github/workflows/R-CMD-check.yaml
@@ -4,19 +4,18 @@ name: R-CMD-check

 jobs:
   R-CMD-check:
-    runs-on: macOS-latest
+    runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
-      - uses: r-lib/actions/setup-r@master
-      - uses: r-lib/actions/setup-pandoc@master
-      - name: Install dependencies
-        run: |
-          install.packages(c("remotes", "rcmdcheck", "knitr"))
-          deps <- remotes::dev_package_deps(dependencies = TRUE)
-          install.packages(deps$package[!is.na(deps$available)])
-          if (!requireNamespace("BiocManager", quietly = TRUE)) {install.packages("BiocManager")}
-          BiocManager::install(deps$package[is.na(deps$available)])
-        shell: Rscript {0}
-      - name: Check
-        run: rcmdcheck::rcmdcheck(args = "--no-manual", error_on = "error")
-        shell: Rscript {0}
+      - uses: actions/checkout@v3
+      - uses: r-lib/actions/setup-r@v2
+      - uses: r-lib/actions/setup-pandoc@v2
+        with:
+          pandoc-version: '2.17.1'
+      - uses: r-lib/actions/setup-r-dependencies@v2
+        with:
+          extra-packages: any::rcmdcheck devtools
+          needs: check
+      - uses: r-lib/actions/check-r-package@v2
+        with:
+          args: 'c("--no-manual")'
+          error-on: '"error"'
48 changes: 48 additions & 0 deletions .github/workflows/pkgdown.yaml
@@ -0,0 +1,48 @@
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches: [main, master]
pull_request:
branches: [main, master]
release:
types: [published]
workflow_dispatch:

name: pkgdown

jobs:
pkgdown:
runs-on: ubuntu-latest
# Only restrict concurrency for non-PR jobs
concurrency:
group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
permissions:
contents: write
steps:
- uses: actions/checkout@v3

- uses: r-lib/actions/setup-pandoc@v2

- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true

- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::pkgdown, local::.
needs: website

- name: Build site
run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
shell: Rscript {0}

- name: Deploy to GitHub pages 🚀
if: github.event_name != 'pull_request'
uses: JamesIves/[email protected]
with:
clean: false
branch: gh-pages
folder: docs
2 changes: 2 additions & 0 deletions .lintr
@@ -0,0 +1,2 @@
linters: linters_with_defaults(
indentation_linter(hanging_indent_style="tidy"))
52 changes: 52 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,52 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-merge-conflict
- id: debug-statements
- id: mixed-line-ending
- id: detect-private-key
- id: check-case-conflict
- id: check-yaml
- id: trailing-whitespace
- repo: https://github.com/DavidAnson/markdownlint-cli2
rev: v0.11.0
hooks:
- id: markdownlint-cli2
files: \.(md|qmd)$
types: [file]
exclude: LICENSE.md
- id: markdownlint-cli2-fix
files: \.(md|qmd)$
types: [file]
exclude: LICENSE.md
- repo: https://github.com/lorenzwalthert/precommit
rev: v0.3.2.9027
hooks:
- id: style-files
name: style-files
description: style files with {styler}
entry: Rscript inst/hooks/exported/style-files.R
language: r
files: '(\.[rR]profile|\.[rR]|\.[rR]md|\.[rR]nw|\.[qQ]md)$'
exclude: |
(?x)^(
renv/activate\.R|
)$
minimum_pre_commit_version: "2.13.0"
- id: parsable-R
name: parsable-R
description: check if a .R file is parsable
entry: Rscript inst/hooks/exported/parsable-R.R
language: r
files: '\.[rR](md)?$'
minimum_pre_commit_version: "2.13.0"
- id: lintr
args: [--warn_only]
name: lintr
description: check if a `.R` file is lint free (using {lintr})
entry: Rscript inst/hooks/exported/lintr.R
language: r
files: '(\.[rR]profile|\.R|\.Rmd|\.Rnw|\.r|\.rmd|\.rnw)$'
exclude: 'renv/activate\.R'
minimum_pre_commit_version: "2.13.0"
9 changes: 4 additions & 5 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: genecovr
Title: Gene body coverage analysis to evaluate genome assemblies
Version: 0.0.0.9013
Authors@R:
Version: 0.1.0
Authors@R:
person(given = "Per",
family = "Unneberg",
role = c("aut", "cre"),
@@ -14,9 +14,7 @@ License: GPL-3
Encoding: UTF-8
LazyData: true
Imports:
BiocGenerics,
BiocParallel,
Biostrings,
GenomeInfoDb,
GenomicRanges (>= 1.32.0),
IRanges,
@@ -36,4 +34,5 @@ Suggests:
VignetteBuilder:
knitr
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
RoxygenNote: 7.2.3
URL: https://nbisweden.github.io/genecovr/
80 changes: 74 additions & 6 deletions README.Rmd
@@ -19,21 +19,30 @@ knitr::opts_chunk$set(
[![R build status](https://github.com/NBISweden/genecovr/workflows/R-CMD-check/badge.svg)](https://github.com/NBISweden/genecovr/actions)
<!-- badges: end -->

Perform gene body coverage analyses in R to evaluate genome assembly
quality.
`genecovr` is an `R` package that provides plotting functions that
summarize alignments of gene transcripts to a genome. The main purpose
is to assess the effect that polishing and scaffolding operations have
on the quality of a genome assembly. The gene transcript set is a large
sequence set consisting of assembled transcripts from RNA-seq data
generated in relation to a genome assembly project. Therefore,
`genecovr` serves as a complement to software such as
[BUSCO](https://busco.ezlab.org/), which evaluates genome assembly
quality using a smaller set of well-defined single-copy orthologs.

## Installation

You can install the released version of genecovr from [NBIS
GitHub](https://github.com/nbis) with:

``` r
# If necessary, uncomment to install devtools
# install.packages("devtools")
devtools::install_github("NBISweden/genecovr")
```

## Usage

## Quick usage
### genecovr script quick start

There is a helper script for generating basic plots located in
PACKAGE_DIR/bin/genecovr. Create a data input csv-delimited file with
@@ -52,8 +61,67 @@ script to generate plots:
PACKAGE_DIR/bin/genecovr indata.csv
```

## Vignette
#### Example

Alternatively, import the library as usual in an R script and use the
package functions. See the vignette for a minimum working example.
There are example files located in PACKAGE_DIR/inst/extdata: two psl
alignment files containing gmap alignments, a fasta index for the
transcript sequences, and fasta indices for two different assembly
versions:

- nonpolished.fai - fasta index for raw assembly
- polished.fai - fasta index for polished assembly
- transcripts.fai - fasta index for transcript sequences
- transcripts2nonpolished.psl - gmap alignments, transcripts to raw assembly
- transcripts2polished.psl - gmap alignments, transcripts to polished
assembly

Using these files and the labels `nonpol` and `pol` for the different
assemblies, a `genecovr` input file (called, e.g., `assemblies.csv`)
would look as follows:

```
nonpol,transcripts2nonpolished.psl,nonpolished.fai,transcripts.fai
pol,transcripts2polished.psl,polished.fai,transcripts.fai
```

and the command to run would be:

```
genecovr assemblies.csv
```
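
As a sketch, the input file above could be written and sanity-checked
from a shell before running the script. The `awk` check is an addition
for illustration and not part of `genecovr`; the relative file names
assume the example files have been copied to the working directory.

```shell
# Write the two-row input file described above
# (columns: label, psl file, assembly index, transcript index).
cat > assemblies.csv <<'EOF'
nonpol,transcripts2nonpolished.psl,nonpolished.fai,transcripts.fai
pol,transcripts2polished.psl,polished.fai,transcripts.fai
EOF

# Illustrative check only: fail if any row does not have four fields.
if awk -F, 'NF != 4 { exit 1 }' assemblies.csv; then
  echo "assemblies.csv looks well-formed"
else
  echo "assemblies.csv has a row without four fields" >&2
fi
```

The file is then passed to the script as its single positional
argument, as in the command above.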

#### genecovr options

To list genecovr script options, type `genecovr -h`:

```
usage: genecovr [-h] [-v] [-p number]
[-d OUTPUT_DIRECTORY] [--height HEIGHT]
[--width WIDTH]
csvfile
positional arguments:
csvfile csv-delimited file with columns
1. data label
2. mapping file (supported formats: psl)
3. assembly file (fasta or fasta index)
4. transcript file (fasta or fasta index)
optional arguments:
-h, --help show this help message and exit
-v, --verbose print extra output
-p number, --cpus number
number of cpus [default 1]
-d OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
output directory
--height HEIGHT figure height in inches [default 6.0]
--width WIDTH figure width in inches [default 6.0]
```
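
As an illustration, the options listed above could be combined as
follows. The values (`results`, 4 cpus, 8 by 6 inches) are hypothetical,
and the `command -v` guard is only there so that the sketch degrades
gracefully when the script is not on `PATH`.

```shell
# Hypothetical settings; adjust to your own project.
outdir="results"
cpus=4

# Only flags shown in the help text above are used.
cmd="genecovr -v -p ${cpus} -d ${outdir} --width 8 --height 6 assemblies.csv"

if command -v genecovr >/dev/null 2>&1; then
  # Unquoted on purpose so the string is split into separate arguments.
  $cmd
else
  echo "genecovr not found on PATH; would run: ${cmd}"
fi
```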



### R package vignette

Alternatively, import the library in an R script and use the package
functions. See [Get started](articles/genecovr.html) or run
`vignette("genecovr")` for a minimum working example.
9 changes: 9 additions & 0 deletions _pkgdown.yml
@@ -0,0 +1,9 @@
url: https://nbisweden.github.io/genecovr/
template:
bootstrap: 5

reference:
- title: genecovr
- contents:
- matches(".*")

19 changes: 13 additions & 6 deletions inst/bin/genecovr
@@ -131,6 +131,7 @@ apl <- AlignmentPairsList(
seqinfo.query=transcripts.sinfo[[x]])
}, BPPARAM=bpparam)
)

names(apl) <- names(psl.fn)

##------------------------------
@@ -183,8 +184,9 @@ save_plot(p, outfile)


## FIXME: number of levels should be parametrized via option
suppressPackageStartupMessages(library(dplyr))
outfile <- file.path(outdir, "qnuminsert")
x <- insertionSummary(apl, reduce=FALSE)
x <- dplyr::tibble(insertionSummary(apl, reduce=FALSE))
p <- ggplot(x, aes(id)) +
geom_bar(aes(fill=cuts)) +
scale_fill_viridis_d(name="qNumInsert", begin=1, end=0)
@@ -199,9 +201,8 @@ message("saving ", outfile)
write.csv(x, file=gzfile(outfile), row.names=FALSE)

## Also make plot based on gbc
suppressPackageStartupMessages(library(dplyr))
outfile <- file.path(outdir, "qnuminsert.gbc")
x <- insertionSummary(apl)
x <- dplyr::tibble(insertionSummary(apl))
p <- ggplot(x, aes(id)) +
geom_bar(aes(fill=cuts)) +
scale_fill_viridis_d(name="qNumInsert", begin=1, end=0)
@@ -211,6 +212,10 @@ save_plot(p, outfile)
##--------------------
## Save gbc data
##--------------------
x$revmap <- as.character(x$revmap)
x$hitCoverage <- as.character(x$hitCoverage)
x$hitStart <- as.character(x$hitStart)
x$hitEnd <- as.character(x$hitEnd)
outfile <- file.path(outdir, "gbcdata.tsv.gz")
message("saving ", outfile)
write.table(x, file=gzfile(outfile), row.names=FALSE, sep="\t")
@@ -313,11 +318,13 @@ p <- ggplot(data=subset(data, n.subjects>1),
outfile <- file.path(outdir, paste0("depth_breadth_seqlengths.mm", mm))
save_plot(p, outfile)


data$revmap <- as.character(data$revmap)
data$hitCoverage <- as.character(data$hitCoverage)
data$hitStart <- as.character(data$hitStart)
data$hitEnd <- as.character(data$hitEnd)
outfile <- file.path(outdir, "gene_body_coverage.csv.gz")
message("saving ", outfile)
write.csv(data, gzfile(outfile), row.names=FALSE)

write.csv(dplyr::tibble(data), gzfile(outfile), row.names=FALSE)

##############################
## Save Rdata of analysis
12 changes: 8 additions & 4 deletions man/geneBodyCoverage.Rd


7 changes: 7 additions & 0 deletions man/genecovr-package.Rd
