The folder structure is as follows:

- `genes`: main folder
- `genes//`: folder per gene. We have three genes:
    - AR: the gene we are interested in
    - PLSCR3: short gene but has 13 isoforms
    - E2F1: gene with only a single isoform
- `genes//`:
    - `P000R000`: First ONT flow cell, first run
    - `P000R001`: First ONT flow cell, second run
    - `P000R002`: Second ONT flow cell
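
The layout above can be walked programmatically. The following is a minimal sketch, assuming the two-level gene/sample nesting described here; the function name `list_sample_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_sample_dirs(root):
    """Return sorted (gene, sample) name pairs for each sample folder under root.

    Assumes root is the main `genes` folder, with one sub-folder per gene and
    one sub-folder per sample inside each gene folder. Plain files are skipped.
    """
    pairs = []
    for gene_dir in sorted(Path(root).iterdir()):
        if not gene_dir.is_dir():
            continue
        for sample_dir in sorted(gene_dir.iterdir()):
            if sample_dir.is_dir():
                pairs.append((gene_dir.name, sample_dir.name))
    return pairs
```

Calling `list_sample_dirs("genes")` from the repository root should then yield one pair per gene and sample combination that is present on disk.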

Then, under each sample folder we have:

- `gene.fasta`: `FASTA` file of the gene
- `reads.fasta`: `FASTA` file of the real reads that map to the gene's genomic region according to `minimap2` whole genome mapping
- `transcripts.fasta`: `FASTA` file of the ENSEMBL transcripts of this gene. The forward strand is shown here.
- `transcripts.tsv`: Each record is a transcript. The columns are:
    - Transcript ID
    - Chromosome (all should be the same)
    - Strand on reference
    - Comma separated list of intervals for the transcript exons upstream forward strand of the gene
- `training*`: Files generated by `NanoSim` while analyzing the real reads error profile. More details here.
- `simulated_error_profile` and `simulated.log`: Files generated by `NanoSim` that include a log of the errors simulated in each read and the overall log.
- `simulated_reads.fasta`: Reads generated by `NanoSim`
- `simulated_reads.oriented.fasta`: The same `NanoSim` reads but oriented on the forward strand of the gene/transcript
- `simulated_reads.oriented.tsv`: Similar to `transcripts.tsv` but for the simulated reads:
    - Read name
    - Transcript ID
    - Original strand on transcript/gene
    - Comma separated list of intervals for the transcript exons upstream forward strand of the gene
- `simulated_reads.oriented.paf`: The alignment `PAF` file. Standard `PAF` tags are used. `oc:c:1` indicates that this alignment is used in the optimal chain.
- `simulated_reads.oriented.dot`: `DOT` file from Freddie plotting
- `simulated_reads.oriented.pdf`: `PDF` file from Freddie plotting. View this.
- `simulated_reads.oriented.<read name>.dot`: `DOT` file from Freddie plotting with only `<read-name>` annotations kept
- `simulated_reads.oriented.<read name>.pdf`: `PDF` file from Freddie plotting with only `<read-name>` annotations kept. View this.
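
As a rough illustration of how the tabular files described above might be read, here is a minimal sketch. It assumes that `transcripts.tsv` is tab-separated with no header row and exactly the four columns listed, and that `PAF` optional tags start after the 12 mandatory columns. The function names are hypothetical, and interval strings are kept as-is because their exact format is not specified here.

```python
import csv
import io


def read_transcripts_tsv(handle):
    """Parse transcripts.tsv-style rows into dictionaries.

    Assumption: tab-separated, no header row, four columns in the order
    transcript ID, chromosome, strand, comma separated interval list.
    """
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        transcript_id, chromosome, strand, intervals = row[:4]
        records.append({
            "transcript_id": transcript_id,
            "chromosome": chromosome,
            "strand": strand,
            # Intervals are left as raw strings; their format is not assumed.
            "intervals": [s for s in intervals.split(",") if s],
        })
    return records


def in_optimal_chain(paf_line):
    """Return True if a PAF line carries the oc:c:1 tag.

    PAF has 12 mandatory columns, so optional tags begin at index 12.
    """
    fields = paf_line.rstrip("\n").split("\t")
    return "oc:c:1" in fields[12:]
```

For example, `read_transcripts_tsv(open("transcripts.tsv", newline=""))` would return one dictionary per transcript, and `in_optimal_chain` could be applied to each line of `simulated_reads.oriented.paf` to keep only the alignments in the optimal chain.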