Home
PISA is a suite of programs for processing and interacting with single-cell and single-molecule high-throughput sequencing FASTQ and BAM files. The idea behind PISA is to integrate cell barcodes and molecular barcodes (UMIs) into plain FASTQ files. Users can then run quality control, alignment, or assembly on those FASTQ files with current state-of-the-art software, and afterwards use PISA to convert the barcodes into tags in the BAM file. PISA can also select, correct, and summarize tags in BAM files. PISA is therefore flexible and NOT designed for a specific library or platform.
$ git clone https://github.com/shiquan/PISA
$ cd PISA
$ make
Parse the reads and put all barcode information into the read names of the FASTQ+ file. FASTQ+ is a variant of the standard FASTQ format and can be used like an ordinary FASTQ file.
PISA parse -config read_structure.json -1 reads.fq -report fastq_report.csv reads_1.fq.gz reads_2.fq.gz
Convert the barcodes in the read names to SAM tags and export the result as BAM.
PISA sam2bam -report alignment_report.csv in.sam -o out.bam
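As a rough illustration of what this conversion involves, the Python sketch below splits a FASTQ+-style read name into the original name and a set of SAM tags. The `|||` separator, the `TAG:TYPE:VALUE` layout, and the example read name are assumptions made for illustration, and `split_read_name` and `to_sam_tags` are hypothetical helpers, not part of PISA. Check a real FASTQ+ file before relying on this layout.

```python
def split_read_name(qname, sep="|||"):
    """Split a FASTQ+-style read name into (base name, {tag: (type, value)}).

    Assumes barcodes are appended to the name as SAM-style TAG:TYPE:VALUE
    fields joined by `sep`. This layout is an assumption for illustration.
    """
    base, *fields = qname.split(sep)
    tags = {}
    for field in fields:
        # Split at most twice so that values containing ':' stay intact.
        tag, typ, value = field.split(":", 2)
        tags[tag] = (typ, value)
    return base, tags


def to_sam_tags(tags):
    """Render the parsed tags as SAM optional fields."""
    return [f"{tag}:{typ}:{value}" for tag, (typ, value) in tags.items()]


# Hypothetical read name carrying a raw cell barcode (CR) and a raw UMI (UR).
name, tags = split_read_name("read1|||CR:Z:ACGTACGT|||UR:Z:TTGCA")
print(name, "\t".join(to_sam_tags(tags)))
```

Keeping the barcodes in the read name is what lets aligners that know nothing about barcodes carry them through to the SAM output unchanged.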
Annotate reads and add annotation tags to the BAM file.
PISA anno -gtf genes.gtf -o anno.bam -@ 5 -report anno_report.csv aln.bam
Correct barcodes within each block of reads. In this example, a block is defined as the set of reads that share the same cell barcode (CB) and gene (GN) tags.
PISA corr -tag UR -new-tag UB -tags-block CB,GN -cr -o final.bam -@ 5 anno.bam
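To make the idea of block-wise correction concrete, here is a minimal Python sketch. It groups reads by their `CB` and `GN` tags and, inside each block, replaces a raw UMI (`UR`) with the most abundant UMI within one mismatch, writing the result to `UB`. This is a generic Hamming-distance approach chosen for illustration, not necessarily the algorithm PISA implements. The function names and the one-mismatch threshold are assumptions.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions; unequal lengths are treated as far apart."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def correct_umis(reads, max_dist=1):
    """Return copies of `reads` with a corrected 'UB' tag added.

    `reads` is a list of dicts with 'CB', 'GN' and 'UR' keys. Within each
    (CB, GN) block, a UMI is replaced by the most abundant UMI within
    `max_dist` mismatches, provided that UMI is strictly more abundant.
    """
    # Count raw UMIs separately for every (cell barcode, gene) block.
    blocks = defaultdict(Counter)
    for r in reads:
        blocks[(r["CB"], r["GN"])][r["UR"]] += 1

    corrected = []
    for r in reads:
        counts = blocks[(r["CB"], r["GN"])]
        best = r["UR"]
        # most_common() yields UMIs from most to least abundant.
        for umi, n in counts.most_common():
            if n > counts[best] and hamming(umi, r["UR"]) <= max_dist:
                best = umi
        corrected.append({**r, "UB": best})
    return corrected


# Toy input: AAAT is a likely sequencing error of AAAA in GeneA only.
reads = (
    [{"CB": "cell1", "GN": "GeneA", "UR": "AAAA"}] * 3
    + [{"CB": "cell1", "GN": "GeneA", "UR": "AAAT"}]
    + [{"CB": "cell1", "GN": "GeneB", "UR": "AAAT"}]
)
fixed = correct_umis(reads)
print([r["UB"] for r in fixed])
```

Restricting the comparison to a block means the same UMI sequence seen in a different gene or cell is left untouched, as the GeneB read in the toy input shows.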
Count gene expression based on cell barcode, gene and UMI tags.
PISA count -tag CB -anno-tag GN -umi UB -outdir raw_gene_expression -@ 5 final.bam
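Conceptually, this step counts the distinct UMIs observed for each cell and gene pair. The Python sketch below shows that idea on reads represented as plain dicts of tags. It is a simplified illustration under that assumption, and `count_expression` is a hypothetical helper rather than PISA's implementation.

```python
from collections import defaultdict


def count_expression(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of dicts holding 'CB', 'GN' and 'UB' tags;
    reads missing any of the three tags are skipped.
    """
    umis = defaultdict(set)
    for r in reads:
        if all(k in r for k in ("CB", "GN", "UB")):
            umis[(r["CB"], r["GN"])].add(r["UB"])
    return {key: len(seen) for key, seen in umis.items()}


# Toy input: two reads share a UMI and therefore count once.
reads = [
    {"CB": "cell1", "GN": "GeneA", "UB": "AAAA"},
    {"CB": "cell1", "GN": "GeneA", "UB": "AAAA"},
    {"CB": "cell1", "GN": "GeneA", "UB": "CCCC"},
    {"CB": "cell2", "GN": "GeneA", "UB": "AAAA"},
    {"CB": "cell2", "UB": "GGGG"},  # unannotated read, ignored
]
matrix = count_expression(reads)
print(matrix)
```

Counting UMIs rather than reads is what removes PCR duplicates from the expression estimate, which is why UMI correction is run before this step.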
- Assemble reads originating from one molecule;
- Implement a newly designed and more user-friendly parse;
- Support loom output (frozen);
- Export unspliced matrix for velocity;
- Upgrade PISA parse for faster processing of FASTQ files.
v0.10b 2021/12/09
- PISA count now has a -velo option to export the unspliced and spliced matrices together. For velocity analysis, remember to use -intron to annotate reads.
- PISA parse supports multi-threading.
v0.10a 2021/11/06
- PISA count supports counting spliced and unspliced reads.
- PISA count supports counting from multiple BAM files.
v0.9 2021/10/14
- Rewrite rmdup. Paired reads are not supported for now.
v0.8 2021/07/20
- Reduce memory usage of count
- Fix region query bug of anno -bed
- Add anno -vcf method
v0.7 2020/11/20
- Introduce the PCR deduplication method rmdup.
- Mask the read and qual fields as * instead of the sequence for secondary alignments in the BAM file.
v0.6 2020/10/29
- PISA attrcnt: skip secondary alignments before counting reads.
- PISA anno: fix segmentation fault bugs when loading a malformed GTF.
v0.5 2020/08/27
- Add PISA bam2frag function (experimental).
- PISA anno: skip secondary alignments when counting total reads.
v0.4 2020/07/14
- PISA sam2bam: add mapping quality adjustment method;
- Rewrite UMI correction index structure to reduce memory use;
- Fix bugs.
v0.4alpha 2020/05/2
- PISA anno: use the UCSC bin scheme instead of linear search when querying gene regions for reads. Fix the bug of misannotated antisense reads.
- PISA count: use MEX output instead of a plain cell vs gene table.