Home

PISA is a suite of programs for processing and interacting with single-cell and single-molecule high-throughput sequencing FASTQ and BAM files. The idea behind PISA is to integrate cell barcodes and molecular barcodes (UMIs) into plain FASTQ files. Users can then perform quality control, alignment, or assembly on these FASTQ files with current state-of-the-art software, and afterwards use PISA to parse the barcodes into tag information in the BAM file. PISA can also select, correct, and summarize tags in BAM files. PISA is therefore flexible and NOT designed for a specific library or platform.

INSTALL

$ git clone https://github.com/shiquan/PISA
$ cd PISA
$ make

SYNOPSIS

Parse the reads and put all barcode information into the read names of the FASTQ+ files. FASTQ+ is a variant of the standard FASTQ format and can be used like FASTQ.

PISA parse -config read_structure.json -1 reads.fq -report fastq_report.csv reads_1.fq.gz reads_2.fq.gz
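To illustrate the idea of carrying barcodes in the read name, here is a minimal Python sketch. It is not PISA's implementation: the helper `to_fastq_plus`, the `|||` separator, and the 16 + 10 base barcode layout are assumptions chosen for illustration. In practice the layout is whatever the read structure config describes.

```python
def to_fastq_plus(name, barcode_read, cb_len=16, umi_len=10, sep="|||"):
    """Return a read name with the barcodes appended as SAM-style tags.

    Assumed (hypothetical) layout: the cell barcode occupies the first
    cb_len bases of the barcode read and the UMI the next umi_len bases.
    """
    cell_barcode = barcode_read[:cb_len]
    umi = barcode_read[cb_len:cb_len + umi_len]
    return f"{name}{sep}CB:Z:{cell_barcode}{sep}UR:Z:{umi}"


# 16-base cell barcode followed by a 10-base UMI
new_name = to_fastq_plus("@read_001", "AAGCATCCACACAGAGTTGGTTTTTT")
print(new_name)
```

Because the barcodes travel inside the read name, downstream tools that treat the name as an opaque string keep them attached to the read.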

Convert the barcodes in the read names to SAM tags and export the result as BAM.

PISA sam2bam -report alignment_report.csv in.sam -o out.bam
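Conceptually, this step moves the tags out of the read name and into the optional fields of each alignment record. The Python sketch below shows that transformation on a single SAM text line. The function `name_tags_to_sam` and the `|||` separator are assumptions for illustration, and the sketch does not write BAM or adjust mapping quality.

```python
def name_tags_to_sam(sam_line, sep="|||"):
    """Move tags embedded in QNAME to the optional fields of a SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first piece is the original read name; the rest are TAG:TYPE:VALUE strings.
    qname, *tags = fields[0].split(sep)
    fields[0] = qname
    fields.extend(tags)
    return "\t".join(fields)


record = "\t".join([
    "read_001|||CB:Z:AAGCATCCACACAGAG|||UR:Z:TTGGTTTTTT",
    "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "FFFF",
])
converted = name_tags_to_sam(record)
print(converted)
```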

Annotate reads and add annotation tags to the BAM file.

PISA anno -gtf genes.gtf -o anno.bam -@ 5 -report anno_report.csv aln.bam
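As a rough mental model of annotation, a read receives a gene tag when its aligned interval overlaps a gene region. The sketch below shows only that overlap test on made-up coordinates. The function `annotate` and the gene list are hypothetical; a real annotator working from a GTF also has to consider exons, strand, and spliced alignments, which are omitted here.

```python
def annotate(read, genes):
    """Return the names of genes whose interval overlaps the read.

    read:  (chrom, start, end) in 0-based half-open coordinates
    genes: list of (chrom, start, end, name) tuples
    """
    chrom, start, end = read
    return [
        name
        for g_chrom, g_start, g_end, name in genes
        if g_chrom == chrom and g_start < end and start < g_end
    ]


# Made-up gene regions for illustration only
genes = [
    ("chr1", 100, 500, "GeneA"),
    ("chr1", 450, 900, "GeneB"),
    ("chr2", 100, 500, "GeneC"),
]
hits = annotate(("chr1", 460, 480), genes)
print(hits)
```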

Correct barcodes in each block of reads. In this example, a block is defined as the reads that share the same cell barcode and gene tags.

PISA corr -tag UR -new-tag UB -tags-block CB,GN -cr -o final.bam -@ 5 anno.bam
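The block idea can be sketched in Python as follows. Reads are grouped by their (CB, GN) pair, and within each block a raw UMI (UR) is replaced by a more abundant UMI that differs by exactly one base; the result is stored in a new tag (UB). The one-mismatch rule is a common heuristic used here as an assumption, and `correct_umis` is a hypothetical helper, so PISA's actual correction rule may differ.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_umis(reads):
    """Add a corrected UB tag to each read, working block by block."""
    blocks = defaultdict(Counter)
    for r in reads:
        blocks[(r["CB"], r["GN"])][r["UR"]] += 1

    for r in reads:
        counts = blocks[(r["CB"], r["GN"])]
        raw, best = r["UR"], r["UR"]
        for umi, n in counts.items():
            # Prefer a strictly more abundant UMI one mismatch away from the raw UMI.
            if len(umi) == len(raw) and hamming(umi, raw) == 1 and n > counts[best]:
                best = umi
        r["UB"] = best
    return reads


reads = (
    [{"CB": "cell1", "GN": "GeneA", "UR": "AAAA"} for _ in range(3)]
    + [{"CB": "cell1", "GN": "GeneA", "UR": "AAAT"}]
    + [{"CB": "cell1", "GN": "GeneA", "UR": "CCCC"}]
    + [{"CB": "cell2", "GN": "GeneA", "UR": "AAAT"}]
)
corrected = correct_umis(reads)
print([r["UB"] for r in corrected])
```

Note that the `AAAT` read in `cell2` is left unchanged: correction never crosses block boundaries, which is the point of defining blocks by cell barcode and gene.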

Count gene expression based on cell barcode, gene and UMI tags.

PISA count -tag CB -anno-tag GN -umi UB -outdir raw_gene_expression -@ 5 final.bam
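Counting from these tags amounts to counting distinct UMIs per cell and gene. The Python sketch below shows that logic on a few in-memory records; `count_expression` is a hypothetical helper, and the sketch returns a dictionary rather than writing the output directory that the command produces.

```python
from collections import defaultdict


def count_expression(reads):
    """Count distinct UMIs (UB) for every (cell barcode, gene) pair."""
    umis = defaultdict(set)
    for r in reads:
        # Reads missing any of the three tags cannot be counted.
        if all(tag in r for tag in ("CB", "GN", "UB")):
            umis[(r["CB"], r["GN"])].add(r["UB"])
    return {key: len(values) for key, values in umis.items()}


reads = [
    {"CB": "cell1", "GN": "GeneA", "UB": "AAAA"},
    {"CB": "cell1", "GN": "GeneA", "UB": "AAAA"},  # duplicate of the same molecule
    {"CB": "cell1", "GN": "GeneA", "UB": "CCCC"},
    {"CB": "cell2", "GN": "GeneA", "UB": "AAAA"},
    {"CB": "cell2", "UB": "GGGG"},                 # unannotated read, skipped
]
matrix = count_expression(reads)
print(matrix)
```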

TODO list

Assemble reads originating from one molecule;
Implement a newly designed and more user-friendly parse;
Support loom output (frozen);
~~Export unspliced matrix for velocity;~~
~~Upgrade PISA parse to process fastqs faster.~~

CHANGELOG

v0.10b 2021/12/09

PISA count now has a -velo option to export the unspliced and spliced matrices together. For velocity analysis, remember to use -intron to annotate reads.
PISA parse supports multi-threading.

v0.10a 2021/11/06

PISA count supports counting spliced and unspliced reads.
PISA count supports counting from multiple BAM files.

v0.9 2021/10/14

Rewrite rmdup. Paired reads are not supported for now.

v0.8 2021/07/20

Reduce memory usage of count
Fix region query bug of anno -bed
Add anno -vcf method

v0.7 2020/11/20

Introduce the PCR deduplicate method rmdup.
Mask the read and qual fields as * instead of storing the sequence for secondary alignments in the BAM file.

v0.6 2020/10/29

PISA attrcnt: skip secondary alignments before counting reads.
PISA anno: fix segmentation fault when loading a malformed GTF.

v0.5 2020/08/27

Add PISA bam2frag function (experimental).
PISA anno: skip secondary alignments when counting total reads.

v0.4 2020/07/14

PISA sam2bam add mapping quality adjustment method;
Rewrite UMI correction index structure to reduce memory use;
Fix bugs.

v0.4alpha 2020/05/2

PISA anno: use the UCSC binning scheme instead of linear search when querying gene regions for reads. Fix the bug of misannotated antisense reads.
PISA count: use MEX output instead of a plain cell vs gene table.

v0.3 2020/03/26

Fix bugs and improve performance.

0.0.0.9999 2019/05/19

Init version.
