Documentation updates

NBISweden · Dec 21, 2023 · 3785127
1 parent 9de4e6b
Showing 5 changed files with 104 additions and 21 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: genecovr
 Title: Gene body coverage analysis to evaluate genome assemblies
-Version: 0.1.0
+Version: 0.1.1
 Authors@R:
     person(given = "Per",
            family = "Unneberg",

diff --git a/NEWS.md b/NEWS.md
@@ -1,22 +1,33 @@
-# Release 0.0.0.9013
+<!-- markdownlint-disable MD025 -->
+
+# genecovr 0.1.1
+
+- update README
+- add Empirical studies section
+
+# genecovr 0.1.0
+
+- add pkgdown site
+
+# genecovr 0.0.0.9013
 
 - fix factor level ordering for geneBodyCoverage plot
 - save geneBodyCoverage as tsv
 
-# Release 0.0.0.9012
+# genecovr 0.0.0.9012
 
 - adjust factor levels for number of inserts (#4)
 - summarize number of inserts by transcript (#5)
 
-# Release 0.0.0.9011
+# genecovr 0.0.0.9011
 
 - fix order of factors
 
-# Release 0.0.0.9010
+# genecovr 0.0.0.9010
 
 - remove duplicate entries in psl input
 
-# Release 0.0.0.9009
+# genecovr 0.0.0.9009
 
 - add plot of transcript length distributions conditioned on number of
   mapped contigs
@@ -26,21 +37,21 @@
   DataFrame inputs, obviating the need to rerun geneBodyCoverage
   multiple times in genecovr script
 
-
-# Release 0.0.0.9008
+# genecovr 0.0.0.9008
 
 - Remove characters trailing first space in fasta headers
 
-# Release 0.0.0.9007
+# genecovr 0.0.0.9007
 
 - Fix conversion of DNAStringSet to Seqinfo
 - Make sure geneBodyCoverage table has nmax levels
 
-
-# Release 0.0.0.9006
+# genecovr 0.0.0.9006
 
 - add depthOfCoverage function and analysis to vignette and script
 - reduceHitCoverage is deprecated
 - improve some docs
 - add wrapper for saving plots
 - add tests mainly for alignmentpairs and test setup
+
+<!-- markdownlint-enable MD025 -->
diff --git a/README.Rmd b/README.Rmd
@@ -40,6 +40,10 @@ GitHub](https://github.com/nbis) with:
 devtools::install_github("NBISweden/genecovr")
 ```
 
+The tool has been developed and tested on GNU/Linux systems but should
+work on any system that runs `R`. Installation is expected to take at
+most a couple of minutes.
+
 ## Usage
 
 ### genecovr script quick start

diff --git a/vignettes/empirical.Rmd b/vignettes/empirical.Rmd
@@ -0,0 +1,66 @@
+---
+title: "Empirical studies"
+author: "Per Unneberg"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Empirical studies}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+biblio-style: plain
+bibliography: bibliography.bib
+---
+
+# Northern krill
+
+`genecovr` was used to assess the quality of the Northern krill
+genome assembly.
+
+To test `genecovr` with the 19 Gb Northern krill genome and gene data
+(16,509 transcripts of protein-coding genes), access the collection in
+the SciLifeLab Data Repository named "Ecological genomics of the
+Northern krill" using the following permanent link:
+
+< URL to be provided >
+
+1. Genome file
+
+Access item: 1. Ecological genomics of the Northern krill: Genome
+assembly DNA sequences
+
+Download: `northern_krill.genome_assembly.tar.gz`
+
+Extract the genome assembly for evaluation:
+`1.m_norvegica.main_w_mito.fasta`
+
+2. Gene models
+
+Access item: 3. Ecological genomics of the Northern krill: Genome
+assembly annotations (genes and repeats)
+
+Download: `trinity_transcript.16509_single_isoforms.cds.fasta.tar.gz`
+
+Extract the transcripts and use them for evaluation:
+`trinity_transcript.16509_single_isoforms.cds.fasta`
+
+3. gmap alignment
+
+Map the transcripts to the assembly with `gmap`:
+
+    # Build the index in the current directory
+    gmap_build --dir . --genomedb mnorvegica 1.m_norvegica.main_w_mito.fasta
+    # Map with gmap; format=1 -> psl output
+    gmap -t 12 --dir . --db mnorvegica --format 1 trinity_transcript.16509_single_isoforms.cds.fasta > mnorvegica.psl
+
+4. genecovr input file
+
+Create a comma-separated file, `assemblies.csv`, with the following contents:
+
+	main,mnorvegica.psl,1.m_norvegica.main_w_mito.fasta,trinity_transcript.16509_single_isoforms.cds.fasta
+
+and run:
+
+	genecovr assemblies.csv
+
+This generates a number of summary data files along with PNG and
+PDF plots of the summary data.
diff --git a/vignettes/genecovr.Rmd b/vignettes/genecovr.Rmd
@@ -6,7 +6,7 @@ output: rmarkdown::html_vignette
 vignette: >
   %\VignetteIndexEntry{Gene body coverage analysis in R}
   %\VignetteEngine{knitr::rmarkdown}
-  \usepackage[utf8]{inputenc}
+  %\VignetteEncoding{UTF-8}
 biblio-style: plain
 bibliography: bibliography.bib
 ---
@@ -71,15 +71,17 @@ are `GenomicRanges::GRanges` objects or objects derived from the
 # Analysing gene body coverage
 
 In this section we analyse the mapping of a transcriptome to a
-non-polished and polished assembly. The mapping results consist of two
-gmap files in psl format, `transcripts2nonpolished.psl` and
-`transcripts2polished.psl`. In addition there are fasta index files
-for both assemblies (`nonpolished.fai` and `polished.fai`) and for the
-transcriptome (`transcripts.fai`). The fasta indices are used to
-generate `GenomeInfoDb::Seqinfo` objects that can be used to set
-sequence information on the parsed output. We load the fasta indices
-and parse the psl files with `genecovr::readPsl`, storing the results
-in an `genecovr::AlignmentPairsList` for convenience.
+non-polished and polished assembly, using example data. The entire
+analysis takes less than 5 minutes to run on these datasets.
+The mapping results consist of two gmap files in psl format,
+`transcripts2nonpolished.psl` and `transcripts2polished.psl`. In
+addition there are fasta index files for both assemblies
+(`nonpolished.fai` and `polished.fai`) and for the transcriptome
+(`transcripts.fai`). The fasta indices are used to generate
+`GenomeInfoDb::Seqinfo` objects that can be used to set sequence
+information on the parsed output. We load the fasta indices and parse
+the psl files with `genecovr::readPsl`, storing the results in a
+`genecovr::AlignmentPairsList` for convenience.
 
 ``` {r gbc-load-data}
 assembly_fai_fn <- list(