Revert "Update documentation links"

This reverts commit 4dd96fd.
NBISweden · Dec 19, 2023 · 9fb7e4d · 9fb7e4d
1 parent 4dd96fd
commit 9fb7e4d
Show file tree

Hide file tree

Showing 2 changed files with 149 additions and 24 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -0,0 +1,127 @@
+---
+output: github_document
+---
+
+<!-- README.md is generated from README.Rmd with devtools::build_readme(). Please edit that file -->
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>",
+  fig.path = "man/figures/README-",
+  out.width = "100%"
+)
+```
+
+# genecovr
+
+<!-- badges: start -->
+[![R build status](https://github.com/NBISweden/genecovr/workflows/R-CMD-check/badge.svg)](https://github.com/NBISweden/genecovr/actions)
+<!-- badges: end -->
+
+`genecovr` is an `R` package that provides plotting functions that
+summarize gene transcript to genome alignments. The main purpose is to
+assess the effect of polishing and scaffolding operations has on the
+quality of a genome assembly. The gene transcript set is a large
+sequence set consisting of assembled transcripts from RNA-seq data
+generated in relation to a genome assembly project. Therefore,
+`genecovr` serves as a complement to software such as
+[BUSCO](https://busco.ezlab.org/), which evaluates genome assembly
+quality using a smaller set of well-defined single-copy orthologs.
+
+## Installation
+
+You can install the released version of genecovr from [NBIS
+GitHub](https://github.com/nbis) with:
+
+``` r
+# If necessary, uncomment to install devtools
+# install.packages("devtools")
+devtools::install_github("NBISweden/genecovr")
+```
+
+## Usage
+
+### genecovr script quick start
+
+There is a helper script for generating basic plots located in
+PACKAGE_DIR/bin/genecovr. Create a data input csv-delimited file with
+columns
+
+1. data label
+2. mapping file (supported formats: psl)
+3. assembly file (fasta or fasta index)
+4. transcript file (fasta or fasta index)
+
+Columns 3 and 4 can be set to missing value (NA) in which case
+sequence sizes will be inferred from the alignment files. Then run the
+script to generate plots:
+
+``` shell
+PACKAGE_DIR/bin/genecovr indata.csv
+```
+
+#### Example
+
+There are example files located in PACKAGE_DIR/inst/extdata consisting
+of two psl alignment files containing gmap alignments and fasta
+indices for the transcript sequences and two for different assembly
+versions:
+
+- nonpolished.fai - fasta index for raw assembly
+- polished.fai - fasta index for polished assembly
+- transcripts.fai - fasta index for transcript sequences
+- transcripts2nonpolished.psl - gmap alignments, transcripts to raw assembly
+- transcripts2polished.psl - gmap alignments, transcripts to polished
+  assembly
+
+Using these files and the labels `non` and `pol` for the different
+assemblies, a `genecovr` input file (called e.g., `assemblies.csv`)
+would look as follows:
+
+```
+nonpol,transcripts2nonpolished.psl,nonpolished.fai,transcripts.fai
+pol,transcripts2polished.psl,polished.fai,transcripts.fai
+```
+
+and the command to run would be:
+
+```
+genecovr assemblies.csv
+```
+
+#### genecovr options
+
+To list genecovr script options, type 'genecovr -h`:
+
+```
+usage: genecovr [-h] [-v] [-p number]
+                             [-d OUTPUT_DIRECTORY] [--height HEIGHT]
+                             [--width WIDTH]
+                             csvfile
+
+positional arguments:
+  csvfile               csv-delimited file with columns
+                            1. data label
+                            2. mapping file (supported formats: psl)
+                            3. assembly file (fasta or fasta index)
+                            4. transcript file (fasta or fasta index)
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -v, --verbose         print extra output
+  -p number, --cpus number
+                        number of cpus [default 1]
+  -d OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
+                        output directory
+  --height HEIGHT       figure height in inches [default 6.0]
+  --width WIDTH         figure width in inches [default 6.0]
+```
+
+
+
+### R package vignette
+
+Alternatively, import the library in an R script and use the package
+functions. See [Get started](articles/genecovr.html) or run
+`vignette("genecovr")` for a minimum working example.
diff --git a/vignettes/genecovr.Rmd b/vignettes/genecovr.Rmd
@@ -58,14 +58,13 @@ psize <- 3
 ## On object representation of pairwise sequence alignments
 
 `genecovr` has functionality to read pairwise sequence alignment files
-and converts the pairwise alignments to `genecovr::AlignmentPairs`
-objects. An `AlignmentPairs` object is a subclass of the Bioconductor
-class `S4Vectors::Pairs`. A `Pairs` object in turn aligns two vectors
-along slot names `first` and `second`, and the `AlignmentPairs` object
-adds slots for the query and subject, and possibly extra slots related
-to additional information in the alignment file. The query and subject
-are `GenomicRanges::GRanges` objects or objects derived from the
-`GRanges`class.
+and converts the pairwise alignments to `AlignmentPairs` objects. An
+`AlignmentPairs` object is a subclass of the Bioconductor class
+`S4Vectors::Pairs`. A `Pairs` object in turn aligns two vectors along
+slot names `first` and `second`, and the `AlignmentPairs` object adds
+slots for the query and subject, and possibly extra slots related to
+additional information in the alignment file. The query and subject
+are `GRanges` objects or objects derived from the `GRanges`class.
 
 
 # Analysing gene body coverage
@@ -76,10 +75,10 @@ gmap files in psl format, `transcripts2nonpolished.psl` and
 `transcripts2polished.psl`. In addition there are fasta index files
 for both assemblies (`nonpolished.fai` and `polished.fai`) and for the
 transcriptome (`transcripts.fai`). The fasta indices are used to
-generate `GenomeInfoDb::Seqinfo` objects that can be used to set
-sequence information on the parsed output. We load the fasta indices
-and parse the psl files with `genecovr::readPsl`, storing the results
-in an `genecovr::AlignmentPairsList` for convenience.
+generate `Seqinfo` objects that can be used to set sequence
+information on the parsed output. We load the fasta indices and parse
+the psl files with `readPsl`, storing the results in an
+`AlignmentPairsList` for convenience.
 
 ``` {r gbc-load-data}
 assembly_fai_fn <- list(
@@ -158,10 +157,10 @@ plot(apl, aes(x = id, y = get_expr(enquo(cnames))), which = "boxplot") +
 
 ## Plot number of indels
 
-The function `genecovr::insertionSummary` summarizes the number of
-insertions, either at the transcript level (default) or per alignment.
-The intuition is that as assembly quality improves, the number of
-indels go down.
+The function `insertionSummary` summarizes the number of insertions,
+either at the transcript level (default) or per alignment. The
+intuition is that as assembly quality improves, the number of indels
+go down.
 
 First we show a plot with the number of insertions per alignment. A
 consequence of this is that as a transcript may be split in multiple
@@ -189,18 +188,17 @@ ggplot(x, aes(id)) +
 
 ## Gene body coverage
 
-The function `genecovr::geneBodyCoverage` takes an `AlignmentPairs`
-object and summarizes breadth of coverage and number of subject hits
-per transcript.
+The function `geneBodyCoverage` takes an `AlignmentPairs` object and
+summarizes breadth of coverage and number of subject hits per
+transcript.
 
 ``` {r gbc-make-gene-body-coverage}
 gbc <- lapply(apl, geneBodyCoverage, min.match = 0.1)
 ```
 
-A summary can be obtained with the
-`genecovr::summarizeGeneBodyCoverage` function. We define a range of
-minimum match hit cutoffs to filter out hits with too few matches in
-the aligned region.
+A summary can be obtained with the `summarizeGeneBodyCoverage`
+function. We define a range of minimum match hit cutoffs to filter out
+hits with too few matches in the aligned region.
 
 ``` {r gbc-summarize-gene-body-coverage}
 min.match <- c(0.25, 0.5, 0.75, 0.9)
@@ -239,7 +237,7 @@ ggplot(
 
 A fragmented assembly will lead to more transcripts mapping to several
 contigs. We calculate the number of subjects by coverage cutoff with
-the function `genecovr::countSubjectsByCoverage`
+the function `countSubjectsByCoverage`
 
 ```{r gbc-ncontigs-per-transcript}
 data <- dplyr::bind_rows(