Commit
Add insertionSummary vignette example and fix gbc factor levels
percyfal committed Mar 23, 2022
1 parent 6db7209 commit 331dc8b
Showing 3 changed files with 27 additions and 6 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: genecovr
Title: Gene body coverage analysis to evaluate genome assemblies
-Version: 0.0.0.9012
+Version: 0.0.0.9013
Authors@R:
person(given = "Per",
family = "Unneberg",
2 changes: 2 additions & 0 deletions R/methods-AlignmentPairsList-class.R
@@ -87,6 +87,8 @@ setMethod("geneBodyCoverage", signature = c("AlignmentPairsList"),
BPPARAM=bpparam)
}
x <- bind_rows(lapply(gbc, data.frame), .id="id")
+if (!is.null(names(apl)))
+    x[["id"]] <- factor(x[["id"]], levels=names(apl))
x
})
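
The effect of setting the factor levels from `names(apl)` can be illustrated with a small sketch. It assumes only `dplyr::bind_rows`; the list `toy` and its element names are hypothetical stand-ins for a named `AlignmentPairsList`. Without explicit levels the `id` column is a character vector, which ggplot2 orders alphabetically on a discrete axis; fixing the levels to the list names keeps the input order.

```r
library(dplyr)

# Hypothetical stand-in for a named AlignmentPairsList: a named list of
# data frames, deliberately not in alphabetical order.
toy <- list(v2_assembly = data.frame(n = 1:2),
            v1_assembly = data.frame(n = 3:4))

x <- bind_rows(toy, .id = "id")
# bind_rows() stores the list names in a character column
is.character(x[["id"]])

# Same pattern as in the commit: fix the factor levels to the list order
if (!is.null(names(toy)))
    x[["id"]] <- factor(x[["id"]], levels = names(toy))
levels(x[["id"]])
```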

29 changes: 24 additions & 5 deletions vignettes/genecovr.Rmd
@@ -134,18 +134,37 @@ plot(apl, aes(x=id, y=get_expr(enquo(cnames))), which="boxplot") + facet_wrap(.

## Plot number of indels

-As assembly quality improves, the number of indels go down.
+The function `insertionSummary` summarizes the number of insertions,
+either at the transcript level (default) or per alignment. The
+intuition is that as assembly quality improves, the number of indels
+goes down.
+
+First we show a plot with the number of insertions per alignment.
+Because a transcript may be split into multiple alignments, the bars
+are of unequal height.

```{r gbc-plot-qnuminsert}
-x <- as.data.frame(apl)
-x$cuts <- cut(x$query.NumInsert, c(-1:3, Inf), include.lowest=FALSE)
-levels(x$cuts) <- c(0:3, ">3")
-x$cuts <- factor(x$cuts, levels=c(">3", 3:0))
+x <- insertionSummary(apl, reduce=FALSE)
ggplot(x, aes(id)) +
geom_bar(aes(fill=cuts)) +
scale_fill_viridis_d(name="qNumInsert", begin=1, end=0)
```
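
The `cut()`-based binning that appears in the chunk above can be tried on made-up counts with base R alone. The vector `counts` is hypothetical, and whether `insertionSummary` bins the same way internally is an assumption; the diff does not show its implementation.

```r
# Hypothetical insertion counts per alignment
counts <- c(0, 1, 2, 3, 4, 10)

# Breaks -1, 0, 1, 2, 3, Inf give right-closed bins (-1,0], (0,1], ..., (3,Inf]
cuts <- cut(counts, c(-1:3, Inf), include.lowest = FALSE)

# Relabel the five bins, then reverse the level order so that ">3" comes first
levels(cuts) <- c(0:3, ">3")
cuts <- factor(cuts, levels = c(">3", 3:0))

table(cuts)
```

Reversing the level order only changes the order of the fill legend and the stacking of the bars; the counts per bin are unchanged.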

+An alternative is to summarize the number of insertions over a
+transcript. Currently, overlapping alignments are not taken into
+account, so some insertions may be counted more than once. An
+improvement would be to use the non-overlapping set of alignments with
+the fewest insertions.
+
+```{r gbc-plot-qnuminsert-gbc}
+x <- insertionSummary(apl)
+ggplot(x, aes(id)) +
+geom_bar(aes(fill=cuts)) +
+scale_fill_viridis_d(name="qNumInsert", begin=1, end=0)
+```
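
The difference between counting per alignment and per transcript can be sketched on a toy table. The data frame `aln` and its column `transcript` are hypothetical, and summing with `aggregate()` is an assumed aggregation, not the package's implementation.

```r
# Hypothetical alignments: transcript "t1" is split into two alignments
aln <- data.frame(
    transcript      = c("t1", "t1", "t2"),
    query.NumInsert = c(1, 2, 0)
)

# Per alignment: one row per alignment, so "t1" contributes two observations
nrow(aln)

# Per transcript: sum insertions over all alignments of a transcript.
# Overlapping alignments are not reconciled here, so an insertion covered
# by two alignments would be counted twice.
per_tx <- aggregate(query.NumInsert ~ transcript, data = aln, FUN = sum)
per_tx
```

With per-transcript counts every transcript contributes exactly one observation, which is why the bars in the second plot would be comparable in height across assemblies that align the same transcript set.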



## Gene body coverage

