Home
PISA is a suite of programs for processing and interacting with single-cell and single-molecule high-throughput sequencing FASTQ and BAM files. The idea behind PISA is to integrate cell barcodes and molecular barcodes (UMIs) into plain FASTQ files. Users can then perform quality control, alignment, or assembly on these FASTQ files with current state-of-the-art software, and afterwards use PISA to parse the barcodes into tag information in the BAM file. Users can also use PISA to select, correct, and summarize tags in BAM files. PISA is therefore flexible and is NOT designed for a specific library or platform.
$ git clone https://github.com/shiquan/PISA
$ cd PISA
$ make
Parse the barcode information and put it into the read names of the FASTQ+ files. FASTQ+ is a variant of the standard FASTQ format and can be used like FASTQ.
PISA parse -config read_structure.json -1 reads.fq -report fastq_report.csv reads_1.fq.gz reads_2.fq.gz
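
To illustrate how barcodes can travel inside a read name, here is a minimal Python sketch. It assumes that tags are appended to the read name as SAM-style TAG:TYPE:VALUE fields separated by "|||"; that layout, the example header, and the helper name split_fastq_plus_name are assumptions made for illustration, so check the PISA documentation for the exact FASTQ+ format.

```python
def split_fastq_plus_name(header, sep="|||"):
    """Split a FASTQ+ style header into (read_name, tags).

    Assumption: tags look like 'CB:Z:ACGT' and are joined to the read
    name with `sep`. Both details are illustrative, not taken from PISA.
    """
    name, *fields = header.lstrip("@").split(sep)
    tags = {}
    for field in fields:
        # Split into tag name, SAM type code, and value; the value may
        # itself contain ':' so limit the split to two cuts.
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return name, tags


if __name__ == "__main__":
    # Hypothetical header with a cell barcode (CB) and a raw UMI (UR).
    example = "@read001|||CB:Z:AAGCATCC|||UR:Z:TTGACA"
    print(split_fastq_plus_name(example))
```

Because the barcodes ride along in the read name, an aligner that copies read names into its output preserves them without needing to know anything about cells or UMIs.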
PISA sam2bam -report alignment_report.csv in.sam -o out.bam
PISA anno -gtf refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf -o anno.bam -@ 5 -report anno_report.csv aln.bam
PISA corr -tag UR -new-tag UB -tags-block CB,GN -cr -o final.bam -@ 5 anno.bam
PISA count -tag CB -anno-tag GN -umi UB -outdir raw_gene_expression -@ 5 final.bam
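
The corr and count steps above can be pictured with a small Python sketch: raw UMIs (UR) are corrected within each block of reads that share a cell barcode (CB) and gene (GN), and the distinct corrected UMIs (UB) are then counted per cell and gene. The correction rule used here (merge a UMI into a strictly more abundant UMI at Hamming distance 1) is a common heuristic chosen for illustration and is an assumption, not necessarily the algorithm PISA implements; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_umis(umi_counts):
    """Map each UMI to itself or to a strictly more abundant UMI one mismatch away."""
    # Visit UMIs from most to least abundant (ties broken alphabetically).
    ranked = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))
    mapping, kept = {}, []
    for umi in ranked:
        parent = next(
            (k for k in kept
             if len(k) == len(umi)
             and umi_counts[k] > umi_counts[umi]
             and hamming(k, umi) == 1),
            None,
        )
        if parent is None:
            kept.append(umi)
            mapping[umi] = umi
        else:
            mapping[umi] = parent
    return mapping


def count_umis(records):
    """records: iterable of (cell, gene, raw_umi) -> {(cell, gene): n_distinct_umis}."""
    blocks = defaultdict(Counter)
    for cell, gene, umi in records:
        blocks[(cell, gene)][umi] += 1
    return {
        key: len(set(correct_umis(counts).values()))
        for key, counts in blocks.items()
    }


if __name__ == "__main__":
    # Made-up records: (cell barcode, gene, raw UMI).
    reads = [
        ("C1", "G1", "AAAA"), ("C1", "G1", "AAAA"),
        ("C1", "G1", "AAAT"), ("C1", "G1", "CCCC"),
        ("C2", "G1", "AAAT"),
    ]
    print(count_umis(reads))
```

Restricting the correction to a block keeps identical UMIs from different cells or genes from being merged by accident.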
- Support loom output
- Export unspliced matrix for velocity
- Upgrade PISA parse to process FASTQ files faster.
- v0.10a 2021/11/06
  - PISA count: support counting spliced and unspliced reads with the -ttype option.
  - PISA count: support counting from multiple BAM files.
- v0.9 2021/10/14
  - Rewrite rmdup. Paired reads are not supported for now.
- v0.8 2021/07/20
  - Reduce memory usage of count.
  - Fix region query bug of anno -bed.
  - Add anno -vcf method.
- v0.7 2020/11/20
  - Introduce the PCR deduplication method rmdup.
  - Mask read and qual fields as * instead of the sequence for secondary alignments in the BAM file.
- v0.6 2020/10/29
  - PISA attrcnt: skip secondary alignments before counting reads.
  - PISA anno: fix segmentation fault bugs when loading a malformed GTF.
- v0.5 2020/08/27
  - Add PISA bam2frag function (experimental).
  - PISA anno: skip secondary alignments when counting total reads.
- v0.4 2020/07/14
  - PISA sam2bam: add mapping quality adjustment method.
  - Rewrite UMI correction index structure to reduce memory use.
  - Fix bugs.
- v0.4alpha 2020/05/2
  - PISA anno: use the UCSC bin scheme instead of a linear search when querying gene regions for reads. Fix the bug of misannotated antisense reads.
  - PISA count: use MEX output instead of a plain cell vs gene table.
- v0.3 2020/03/26
  - Fix bugs and improve performance.
- 0.0.0.9999 2019/05/19
  - Init.