Update tests and documentation for 0.1.0 release #7

Merged
merged 9 commits into from
Dec 19, 2023
4 changes: 4 additions & 0 deletions .Rbuildignore
@@ -4,3 +4,7 @@
^scripts$
^doc$
^Meta$
^_pkgdown\.yml$
^docs$
^pkgdown$
^tests/testthat.R$
29 changes: 14 additions & 15 deletions .github/workflows/R-CMD-check.yaml
@@ -4,19 +4,18 @@ name: R-CMD-check

jobs:
R-CMD-check:
runs-on: macOS-latest
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: r-lib/actions/setup-r@master
- uses: r-lib/actions/setup-pandoc@master
- name: Install dependencies
run: |
install.packages(c("remotes", "rcmdcheck", "knitr"))
deps <- remotes::dev_package_deps(dependencies = TRUE)
install.packages(deps$package[!is.na(deps$available)])
if (!requireNamespace("BiocManager", quietly = TRUE)) {install.packages("BiocManager")}
BiocManager::install(deps$package[is.na(deps$available)])
shell: Rscript {0}
- name: Check
run: rcmdcheck::rcmdcheck(args = "--no-manual", error_on = "error")
shell: Rscript {0}
- uses: actions/checkout@v3
- uses: r-lib/actions/setup-r@v2
- uses: r-lib/actions/setup-pandoc@v2
with:
pandoc-version: '2.17.1'
- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::rcmdcheck devtools
needs: check
- uses: r-lib/actions/check-r-package@v2
with:
args: 'c("--no-manual")'
error-on: '"error"'
48 changes: 48 additions & 0 deletions .github/workflows/pkgdown.yaml
@@ -0,0 +1,48 @@
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches: [main, master]
pull_request:
branches: [main, master]
release:
types: [published]
workflow_dispatch:

name: pkgdown

jobs:
pkgdown:
runs-on: ubuntu-latest
# Only restrict concurrency for non-PR jobs
concurrency:
group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
permissions:
contents: write
steps:
- uses: actions/checkout@v3

- uses: r-lib/actions/setup-pandoc@v2

- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true

- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::pkgdown, local::.
needs: website

- name: Build site
run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
shell: Rscript {0}

- name: Deploy to GitHub pages 🚀
if: github.event_name != 'pull_request'
uses: JamesIves/[email protected]
with:
clean: false
branch: gh-pages
folder: docs
2 changes: 2 additions & 0 deletions .lintr
@@ -0,0 +1,2 @@
linters: linters_with_defaults(
indentation_linter(hanging_indent_style="tidy"))
52 changes: 52 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,52 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-merge-conflict
- id: debug-statements
- id: mixed-line-ending
- id: detect-private-key
- id: check-case-conflict
- id: check-yaml
- id: trailing-whitespace
- repo: https://github.com/DavidAnson/markdownlint-cli2
rev: v0.11.0
hooks:
- id: markdownlint-cli2
files: \.(md|qmd)$
types: [file]
exclude: LICENSE.md
- id: markdownlint-cli2-fix
files: \.(md|qmd)$
types: [file]
exclude: LICENSE.md
- repo: https://github.com/lorenzwalthert/precommit
rev: v0.3.2.9027
hooks:
- id: style-files
name: style-files
description: style files with {styler}
entry: Rscript inst/hooks/exported/style-files.R
language: r
files: '(\.[rR]profile|\.[rR]|\.[rR]md|\.[rR]nw|\.[qQ]md)$'
exclude: |
(?x)^(
renv/activate\.R|
)$
minimum_pre_commit_version: "2.13.0"
- id: parsable-R
name: parsable-R
description: check if a .R file is parsable
entry: Rscript inst/hooks/exported/parsable-R.R
language: r
files: '\.[rR](md)?$'
minimum_pre_commit_version: "2.13.0"
- id: lintr
args: [--warn_only]
name: lintr
description: check if a `.R` file is lint free (using {lintr})
entry: Rscript inst/hooks/exported/lintr.R
language: r
files: '(\.[rR]profile|\.R|\.Rmd|\.Rnw|\.r|\.rmd|\.rnw)$'
exclude: 'renv/activate\.R'
minimum_pre_commit_version: "2.13.0"
9 changes: 4 additions & 5 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: genecovr
Title: Gene body coverage analysis to evaluate genome assemblies
Version: 0.0.0.9013
Authors@R:
Version: 0.1.0
Authors@R:
person(given = "Per",
family = "Unneberg",
role = c("aut", "cre"),
@@ -14,9 +14,7 @@ License: GPL-3
Encoding: UTF-8
LazyData: true
Imports:
BiocGenerics,
BiocParallel,
Biostrings,
GenomeInfoDb,
GenomicRanges (>= 1.32.0),
IRanges,
@@ -36,4 +34,5 @@ Suggests:
VignetteBuilder:
knitr
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2
RoxygenNote: 7.2.3
URL: https://nbisweden.github.io/genecovr/
80 changes: 74 additions & 6 deletions README.Rmd
@@ -19,21 +19,30 @@ knitr::opts_chunk$set(
[![R build status](https://github.com/NBISweden/genecovr/workflows/R-CMD-check/badge.svg)](https://github.com/NBISweden/genecovr/actions)
<!-- badges: end -->

Perform gene body coverage analyses in R to evaluate genome assembly
quality.
`genecovr` is an `R` package that provides plotting functions that
summarize gene transcript-to-genome alignments. The main purpose is to
assess the effect that polishing and scaffolding operations have on the
quality of a genome assembly. The gene transcript set is a large
sequence set consisting of assembled transcripts from RNA-seq data
generated in relation to a genome assembly project. Therefore,
`genecovr` serves as a complement to software such as
[BUSCO](https://busco.ezlab.org/), which evaluates genome assembly
quality using a smaller set of well-defined single-copy orthologs.

## Installation

You can install the released version of genecovr from [NBIS
GitHub](https://github.com/nbis) with:

``` r
# If necessary, uncomment to install devtools
# install.packages("devtools")
devtools::install_github("NBISweden/genecovr")
```

## Usage

## Quick usage
### genecovr script quick start

There is a helper script for generating basic plots located in
PACKAGE_DIR/bin/genecovr. Create a data input csv-delimited file with
@@ -52,8 +61,67 @@ script to generate plots:
PACKAGE_DIR/bin/genecovr indata.csv
```

## Vignette
#### Example

Alternatively, import the library as usual in an R script and use the
package functions. See the vignette for a minimum working example.
Example files are located in PACKAGE_DIR/inst/extdata: two psl
alignment files containing gmap alignments, a fasta index for the
transcript sequences, and fasta indices for two different assembly
versions:

- nonpolished.fai - fasta index for raw assembly
- polished.fai - fasta index for polished assembly
- transcripts.fai - fasta index for transcript sequences
- transcripts2nonpolished.psl - gmap alignments, transcripts to raw assembly
- transcripts2polished.psl - gmap alignments, transcripts to polished
assembly

Using these files and the labels `nonpol` and `pol` for the different
assemblies, a `genecovr` input file (called, e.g., `assemblies.csv`)
would look as follows:

```
nonpol,transcripts2nonpolished.psl,nonpolished.fai,transcripts.fai
pol,transcripts2polished.psl,polished.fai,transcripts.fai
```

and the command to run would be:

```
genecovr assemblies.csv
```
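
As a rough illustration of the input format described above, the same two rows could also be written from R. This is only a sketch: the data frame column names and the temporary output path are assumptions made for the example and are not part of `genecovr`; the file is written without a header line, mirroring the example rows.

```r
# Illustrative sketch only; the column names below are hypothetical labels
# for the four positional columns and are never written to the file.
assemblies <- data.frame(
  label       = c("nonpol", "pol"),
  mapping     = c("transcripts2nonpolished.psl", "transcripts2polished.psl"),
  assembly    = c("nonpolished.fai", "polished.fai"),
  transcripts = c("transcripts.fai", "transcripts.fai")
)

# Write to a temporary location (assumption); use any path you like.
csvfile <- file.path(tempdir(), "assemblies.csv")

# No header, no quoting, comma-separated: one row per assembly.
write.table(assemblies, file = csvfile, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the raw lines back to inspect what was written.
written <- readLines(csvfile)
print(written)
```

The resulting file can then be passed to the script as shown above, adjusting the file names to paths that exist on your system.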

#### genecovr options

To list genecovr script options, type `genecovr -h`:

```
usage: genecovr [-h] [-v] [-p number]
[-d OUTPUT_DIRECTORY] [--height HEIGHT]
[--width WIDTH]
csvfile

positional arguments:
csvfile csv-delimited file with columns
1. data label
2. mapping file (supported formats: psl)
3. assembly file (fasta or fasta index)
4. transcript file (fasta or fasta index)

optional arguments:
-h, --help show this help message and exit
-v, --verbose print extra output
-p number, --cpus number
number of cpus [default 1]
-d OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
output directory
--height HEIGHT figure height in inches [default 6.0]
--width WIDTH figure width in inches [default 6.0]
```



### R package vignette

Alternatively, import the library in an R script and use the package
functions. See [Get started](articles/genecovr.html) or run
`vignette("genecovr")` for a minimal working example.
9 changes: 9 additions & 0 deletions _pkgdown.yml
@@ -0,0 +1,9 @@
url: https://nbisweden.github.io/genecovr/
template:
bootstrap: 5

reference:
- title: genecovr
- contents:
- matches(".*")

19 changes: 13 additions & 6 deletions inst/bin/genecovr
@@ -131,6 +131,7 @@ apl <- AlignmentPairsList(
seqinfo.query=transcripts.sinfo[[x]])
}, BPPARAM=bpparam)
)

names(apl) <- names(psl.fn)

##------------------------------
@@ -183,8 +184,9 @@ save_plot(p, outfile)


## FIXME: number of levels should be parametrized via option
suppressPackageStartupMessages(library(dplyr))
outfile <- file.path(outdir, "qnuminsert")
x <- insertionSummary(apl, reduce=FALSE)
x <- dplyr::tibble(insertionSummary(apl, reduce=FALSE))
p <- ggplot(x, aes(id)) +
geom_bar(aes(fill=cuts)) +
scale_fill_viridis_d(name="qNumInsert", begin=1, end=0)
@@ -199,9 +201,8 @@ message("saving ", outfile)
write.csv(x, file=gzfile(outfile), row.names=FALSE)

## Also make plot based on gbc
suppressPackageStartupMessages(library(dplyr))
outfile <- file.path(outdir, "qnuminsert.gbc")
x <- insertionSummary(apl)
x <- dplyr::tibble(insertionSummary(apl))
p <- ggplot(x, aes(id)) +
geom_bar(aes(fill=cuts)) +
scale_fill_viridis_d(name="qNumInsert", begin=1, end=0)
@@ -211,6 +212,10 @@ save_plot(p, outfile)
##--------------------
## Save gbc data
##--------------------
x$revmap <- as.character(x$revmap)
x$hitCoverage <- as.character(x$hitCoverage)
x$hitStart <- as.character(x$hitStart)
x$hitEnd <- as.character(x$hitEnd)
outfile <- file.path(outdir, "gbcdata.tsv.gz")
message("saving ", outfile)
write.table(x, file=gzfile(outfile), row.names=FALSE, sep="\t")
@@ -313,11 +318,13 @@ p <- ggplot(data=subset(data, n.subjects>1),
outfile <- file.path(outdir, paste0("depth_breadth_seqlengths.mm", mm))
save_plot(p, outfile)


data$revmap <- as.character(data$revmap)
data$hitCoverage <- as.character(data$hitCoverage)
data$hitStart <- as.character(data$hitStart)
data$hitEnd <- as.character(data$hitEnd)
outfile <- file.path(outdir, "gene_body_coverage.csv.gz")
message("saving ", outfile)
write.csv(data, gzfile(outfile), row.names=FALSE)

write.csv(dplyr::tibble(data), gzfile(outfile), row.names=FALSE)

##############################
## Save Rdata of analysis
12 changes: 8 additions & 4 deletions man/geneBodyCoverage.Rd


7 changes: 7 additions & 0 deletions man/genecovr-package.Rd
