Intermediates and Filtering
Keiran Raine edited this page Jan 17, 2022 · 5 revisions
After the initial group calling, the BRASS flow progressively filters the groups down using several methods.
Some of the intermediate files are available in `*.intermediates.tar.gz`, ordered as follows:

- `.groups.gz` - raw brass group output
- `.groups.filtered.bedpe`
- `.groups.clean.bedpe`

The `.groups.gz` file is the raw output from the grouping algorithm:
- cols 1-4 LOW chr, strand, start, stop
- cols 5-8 HIGH chr, strand, start, stop
- N columns of read counts from each sample (see #NSAMPLES in header) ordered as per the header
- Hits repeat (`.` unless repeat filtering was used)
- N columns of read NAMES from each sample (see #NSAMPLES in header) ordered as per the header
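
The column layout above can be sketched as a small parser. This is a minimal illustration, not part of BRASS: it assumes data lines are tab-separated, that the caller supplies the sample count (taken from the `#NSAMPLES` header), and it keeps each read-name column as a raw string because the separator used inside that column is not described here. The names `Group` and `parse_group_line` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Group(NamedTuple):
    low: Tuple[str, str, int, int]    # chr, strand, start, stop (cols 1-4)
    high: Tuple[str, str, int, int]   # chr, strand, start, stop (cols 5-8)
    read_counts: List[int]            # one count per sample
    repeat: str                       # "." unless repeat filtering was used
    read_names: List[str]             # one raw column per sample


def parse_group_line(line: str, n_samples: int) -> Group:
    """Split one data line of a groups file (assumed tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    expected = 8 + n_samples + 1 + n_samples
    if len(fields) < expected:
        raise ValueError(f"expected at least {expected} columns, got {len(fields)}")
    low = (fields[0], fields[1], int(fields[2]), int(fields[3]))
    high = (fields[4], fields[5], int(fields[6]), int(fields[7]))
    counts = [int(value) for value in fields[8:8 + n_samples]]
    repeat = fields[8 + n_samples]
    names = fields[9 + n_samples:9 + 2 * n_samples]
    return Group(low, high, counts, repeat, names)
```

Header lines (those carrying `#NSAMPLES` and the sample order) would need to be skipped or read separately before calling this function.
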

The `*.brass.intermediates.tar.gz` archive contains files that are useful for debugging and deep investigation.
The content is listed below as displayed via `tar ztf *.brass.intermediates.tar.gz`, where:
- ${T} = Tumour
- ${N} = Normal
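
For scripted inspection, the same listing can be obtained without shelling out. The sketch below uses Python's standard `tarfile` module and `string.Template`, whose `${name}` syntax happens to match the `${T}`/`${N}` placeholders used here. The function names and the sample names in the usage note are hypothetical.

```python
import tarfile
from string import Template
from typing import List


def list_intermediates(archive_path: str) -> List[str]:
    """Return the member names of a gzip-compressed tar archive (similar to `tar ztf`)."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return archive.getnames()


def expand_name(pattern: str, tumour: str, normal: str) -> str:
    """Fill the ${T} and ${N} placeholders of a documented file name."""
    return Template(pattern).substitute(T=tumour, N=normal)
```

For example, `expand_name("intermediates/${T}_vs_${N}.groups.gz", "tumourA", "normalA")` gives `intermediates/tumourA_vs_normalA.groups.gz`, which can then be looked up in the result of `list_intermediates`.
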

Many files are extended versions of the window GC reference input bed file (tagged as `WGC-fmt` in the table). The format is as follows:

Column | Description |
---|---|
1 | Chromosome/contig |
2 | start (0-based) |
3 | end (1-based) |
4 | Base pairs of non-N sequence in window |
5 | Fraction of GC bases (`gc_bp / non_n_bp`), NA when column 4 is 0 |
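
As an illustration of the five base columns, the sketch below parses one WGC-fmt line and reproduces the GC fraction rule. It assumes tab-separated fields and returns any trailing columns (the per-file extensions) untouched. `WgcWindow`, `parse_wgc_line` and `gc_fraction` are hypothetical names, not BRASS functions.

```python
from typing import List, NamedTuple, Optional


class WgcWindow(NamedTuple):
    chrom: str
    start: int                      # 0-based
    end: int                        # 1-based
    non_n_bp: int
    gc_fraction: Optional[float]    # None when the file holds NA
    extra: List[str]                # file-specific columns after column 5


def gc_fraction(gc_bp: int, non_n_bp: int) -> Optional[float]:
    """Column 5 rule: gc_bp / non_n_bp, undefined (NA) when non_n_bp is 0."""
    return None if non_n_bp == 0 else gc_bp / non_n_bp


def parse_wgc_line(line: str) -> WgcWindow:
    """Parse one WGC-fmt line (assumed tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    fraction = None if fields[4] == "NA" else float(fields[4])
    return WgcWindow(fields[0], int(fields[1]), int(fields[2]),
                     int(fields[3]), fraction, fields[5:])
```

With a 0-based start and 1-based end, the window length is simply `end - start`.
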
File listing, ordered by creation:
File | Description |
---|---|
`intermediates/samp_stats.txt` | Purity/ploidy, male/female status. Inputs provided at execution |
`intermediates/${T}_vs_${N}.groups.gz` | Primary grouping with normal panel filtering |
`intermediates/${T}_vs_${N}.groups.filtered.bedpe` | Groups passing basic blat and read support filtering |
`intermediates/${T}.insert_size_distr` | Corrected insert size distribution, using `samtools view -f 66 -F 3868` as filter, based on chr5/5 |
`intermediates/${T}_vs_${N}.ngscn.abs_cn.bg.gz` | BedGraph version of absolute copy number |
`intermediates/${T}_vs_${N}.ngscn.segments.abs_cn.bg.gz` | Segmented absolute copy number |
`intermediates/${N}.ngscn.bed.gz` | Normal: WGC-fmt + count of properly paired reads |
`intermediates/${N}.ngscn.fb_reads.bed.gz` | Normal: As `intermediates/${N}.ngscn.bed.gz` + reads on same strand and contig (foldback) |
`intermediates/${T}.ngscn.bed.gz` | Tumour: WGC-fmt + count of properly paired reads |
`intermediates/${T}.ngscn.fb_reads.bed.gz` | Tumour: As `intermediates/${T}.ngscn.bed.gz` + reads on same strand and contig (foldback) |
`intermediates/${T}_vs_${N}.is_fb_artefact.txt` | List of event IDs considered to be fold-back artefacts (`metropolis_hastings_inversions.R`) |
`intermediates/${T}_vs_${N}.r2` | `filter_small_deletions_and_fb_artefacts.R` |
`intermediates/${T}_vs_${N}.r3` | Identifies groups that should be merged |
`intermediates/${T}_vs_${N}.r4` | Corrects breakpoints using clipped reads (`get_abs_bkpts_from_clipped_reads.pl`) |
`intermediates/${T}_vs_${N}.r5[.scores]` | Filters events due to microbial or viral sequences (`filter_with_microbes_and_remapping.pl`) |
`intermediates/${T}_vs_${N}.ngscn.abs_cn.bg.rg_cns.gz` | Combined segmentation of tumour/normal taking into account cent/telo and purity data (`get_rg_cns.R`) |
`intermediates/${T}_vs_${N}.r6` | Adds flag where event hits a copy number change as defined in `intermediates/${T}_vs_${N}.ngscn.abs_cn.bg.rg_cns.gz` |
`intermediates/${T}_vs_${N}.cn_filtered` | Filtered version of `r5`, downstream input |
`intermediates/${T}_vs_${N}.groups.clean.bedpe` | Annotates translocations, occurrences, copy number changepoints, L v H range blat scores |
`intermediates/${T}_vs_${N}.inversions.pdf` | Plot showing inversions, insert size distribution and bad groupings. Intended for debugging |
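
The `-f 66 -F 3868` filter quoted for `${T}.insert_size_distr` in the file listing can be unpacked using the standard SAM FLAG bits: `-f` keeps reads with all of the given bits set, `-F` drops reads with any of the given bits set. The sketch below is only a decoding aid and is not taken from the BRASS code. The short flag labels and function names are informal choices for this example.

```python
from typing import List

# SAM FLAG bits as defined in the SAM specification (labels are informal).
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flags(value: int) -> List[str]:
    """List the labels of all bits set in a FLAG or filter value."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


def passes_filter(flag: int, require: int = 66, exclude: int = 3868) -> bool:
    """Mimic `-f require -F exclude`: all required bits set, no excluded bit set."""
    return (flag & require) == require and (flag & exclude) == 0


print(decode_flags(66))    # ['proper_pair', 'first_in_pair']
print(decode_flags(3868))  # ['unmapped', 'mate_unmapped', 'reverse', 'secondary',
                           #  'qc_fail', 'duplicate', 'supplementary']
```

In other words, the filter keeps properly paired first-in-pair reads on the forward strand, with a mapped mate, that are primary, non-duplicate and pass QC. A read with FLAG 99 passes, while one with FLAG 83 (reverse strand) does not.
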