README.Rmd

---
title: "Posterior predictive check for bulk RNA sequencing data"
output: github_document
---


The input data set is a tidy representation of a differential gene transcript abundance analysis

```{r echo=FALSE, include=FALSE}
library(tidyverse)
library(ppcSeq)
```

To install:

For linux systems, in order to exploit multi-threading, from R write:

```{r}
fileConn<-file("~/.R/Makevars")
writeLines(c( "CXX14FLAGS += -O3","CXX14FLAGS += -DSTAN_THREADS", "CXX14FLAGS += -pthread"), fileConn)
close(fileConn)
```

Then, install with

```{r  eval=FALSE}
devtools::install_github("stemangiola/ppcSeq")
```

You can get the test dataset with

```{r eval=FALSE}
ppcSeq::counts 
```

You can convert a list of BAM/SAM files into a tidy data frame of annotated counts

```{r warning=FALSE, message=FALSE}
counts.ppc = 
	ppcSeq::counts %>%
	mutate(is_significant = FDR < 0.01) %>%
	ppc_seq(
		formula = ~ Label,
		significance_column = PValue,
		do_check_column = is_significant,
		value_column = value,
		percent_false_positive_genes = "5%"
	)
```

The new posterior predictive check has been added to the original data frame

```{r }
counts.ppc 
```

The new data frame contains plots for each gene

We can visualise the top five differentially transcribed genes

```{r }
counts.ppc %>% 
	slice(1:2) %>% 
	pull(plot) %>% 
	cowplot::plot_grid(plotlist = ., align = "v", ncol = 1, axis="b", rel_widths = 1 )

```