ZhikunWu/Bioinformatic-resources

Bioinformatic-resources

Trio & family

tools detecting de novo SNV/InDel

  • PolyMutt
  • DeNovoGear
  • FamSeq
  • DNMFilter
  • TrioDeNovo
  • Scalpel
  • mirTrios
  • VarScan
  • TrioCaller
  • SeqHBase

Quality control

  • adapterremoval: rapid adapter trimming, identification, and read merging

Tools with bam

  • Alfred: BAM alignment statistics, feature counting and feature annotation
  • bamkit: Tools for common BAM file manipulations
  • bam-readcount: count DNA sequence reads in BAM files
  • bamtools
  • biobambam2: Tools for early stage alignment file processing
  • mosdepth: fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
  • VariantBam: Filtering and profiling of next-generation sequencing data using region-specific rules

Alignment (Illumina)

  • bwa: Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)

fusion gene

  • arriba: Fast and accurate gene fusion detection from RNA-Seq data
  • FuSeq: A fast detection of fusion genes from paired-end RNA-seq data
  • GeneFuse: Gene fusion detection and visualization
  • fusioncatcher: Finder of Somatic Fusion Genes in RNA-seq data
  • STAR-Fusion: STAR-Fusion codebase
  • STAR-Fusion-Tutorial: Tutorial for STAR-Fusion, FusionInspector, and de novo reconstruction of fusion transcripts using Trinity

16S rRNA resources

data format

16S rRNA gene database

  • RDP: RDP provides quality-controlled, aligned and annotated Bacterial and Archaeal 16S rRNA sequences
  • SILVA: SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences
  • Greengenes: Greengenes is a quality controlled, comprehensive 16S reference database and taxonomy based on a de novo phylogeny that provides standard operational taxonomic unit sets
  • rrnDB: A searchable database documenting variation in ribosomal RNA operons (rrn) in Bacteria and Archaea
  • EzTaxon-e: It contains comprehensive 16S rRNA gene sequences of taxa with valid names as well as sequences of uncultured taxa

Tools

  • dada2: Accurate sample inference from amplicon data with single nucleotide resolution

Metagenome

Tools

pipeline

Single cell transcriptome

  • BISCUIT_SingleCell_IMM_ICML_2016: R Codebase for BISCUIT: Infinite Mixture Model to cluster and impute single cells.
  • cisTopic: Probabilistic modelling of cis-regulatory topics from single cell epigenomics data
  • CONICS: COpy-Number analysis In single-Cell RNA-Sequencing
  • DoubletDetection: Doublet detection in single-cell RNA-seq data.
  • dropSeqPipe: A SingleCell RNASeq pre-processing pipeline built on snakemake
  • HoneyBADGER: HMM-integrated Bayesian approach for detecting CNV and LOH events from single-cell RNA-seq data
  • ImmuneResistance: Single-cell RNA-seq of melanoma ecosystems reveals sources of T cell exclusion linked to immunotherapy clinical outcomes
  • inferCNV: Inferring CNV from Single-Cell RNA-Seq
  • SAVER: Single-cell RNA-seq Gene Expression Recovery
  • scanpy: Single-Cell Analysis in Python. Scales to >1M cells. http://scanpy.rtfd.io
  • scde: R package for analyzing single-cell RNA-seq data
  • scImpute: Accurate and robust imputation of scRNA-seq data
  • scell: Single-CELL rna-seq analysis software
  • scg_lib_structs: Collections of library structure and sequence of popular single cell genomic methods
  • single_cell_portal_core: Rails/Docker application for the Broad Institute single cell RNA-seq data portal
  • single-cell-pseudotime: An overview of algorithms for estimating pseudotime in single-cell RNA-seq data
  • single-cell-tutorial
  • SingleR: Single-cell RNA-seq cell types Recognition
  • scRNA-tools: Table of software for the analysis of single-cell RNA-seq data.
  • seurat: R toolkit for single cell genomics
  • snATAC: Ren Lab in-house dual-barcode single nucleus ATAC-seq (snATAC-seq) analysis pipeline
  • STREAM_atac: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data. Preprocessing steps for single cell atac-seq data
  • STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data
  • tenx: Pipelines for the analysis of 10x single-cell RNA-sequencing data
  • awesome-single-cell
  • scPipe: a pipeline for single cell RNA-seq data analysis
  • Linnarsson Lab Single-cell analysis of mouse cortex
  • Human MTG single nucleus RNA-seq data
  • scMerge: Statistical technique for removing unwanted variation from multiple scRNA-seq datasets
  • scRNA-seq-workshop-Fall-2018
  • SoupX: R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

Nanopore

pipeline

  • NanoDJ: A Dockerized Jupyter Notebook for Interactive Oxford Nanopore MinION Sequence Manipulation and Genome Assembly

  • gnomad-sv-pipeline: Code and custom scripts relevant to gnomAD-SV (Collins*, Brand*, et al., 2019)

  • sv-benchmark: Public Benchmark of Long-Read Structural Variant Caller on PacBio CCS HG002 Data

  • Pomoxis: comprises a set of basic bioinformatic tools tailored to nanopore sequencing

  • Nanoflow: a NANOpore sequencing data bioinformatics workFLOW

  • Scrappie: a technology demonstrator for the Oxford Nanopore Research Algorithms group

  • wub: Tools and software library developed by the ONT Applications group

  • nanopore-scripts

  • nano-snakemake: A snakemake pipeline for SV analysis from nanopore genome sequencing

  • pipeline-pinfish-analysis: Pipeline for annotating genomes using long read transcriptomics data with pinfish

  • hpv_minION_analysis: Contains scripts used to analyze HPV samples sequenced on ONT MinIONs.

  • Nanopype: https://nanopype.readthedocs.io/en/stable/

  • tiptoft: Predict plasmids from uncorrected long read data

  • nanoflow: De novo assembly of nanopore reads using nextflow

  • monica: MinION Open Nucleotide Identifier for Continuous Analysis - an open source pathogen identifier for real-time analysis on MinION output

  • Step by step blasr installation example

  • pipeline-polya-ng: Pipeline for calling poly(A) tail lengths from nanopore direct RNA data using nanopolish

  • denbi-nanopore-training

  • pomoxis: Analysis components from Oxford Nanopore Research

quality control

  • scrappie: Scrappie is a technology demonstrator for the Oxford Nanopore Research Algorithms group
  • albacore: a professional quality suite of Rake tasks for building .NET or Mono based systems
  • Basecalling-comparison: A comparison of different Oxford Nanopore basecallers
  • fast5_fetcher: A tool for fetching nanopore fast5 files after filtering via demultiplexing, alignment, or other, to improve downstream processing efficiency
  • SquiggleKit: A toolkit for manipulating nanopore signal data
  • fast5seek: Subset of fast5 files contained in a fastq, BAM, or SAM file
  • albacore: Dockerfile for the Albacore basecaller from Oxford Nanopore
  • npBarcode: Demultiplex barcoded Oxford Nanopore sequencing
  • npReader: Real-time extraction and analysis of Oxford Nanopore sequencing data
  • nanopore adapters
  • NanoFilt: https://github.com/wdecoster/nanofilt
  • Deepbinner: a signal-level demultiplexer for Oxford Nanopore reads
  • Porechop: adapter trimmer for Oxford Nanopore reads
  • poretools: a toolkit for working with Oxford nanopore data
  • NanoPlot: Plotting scripts for long read sequencing data
  • longread_plots: A collection of plots for long read sequencing FastQ files from devices like Oxford Nanopore's MinION and PromethION.
  • Nanopolish
  • nanoQC: Quality control tools for nanopore sequencing data
  • NanoR: R package for user-friendly analysis and comparison of ONT data
  • pomoxis: Analysis components from Oxford Nanopore Research
  • poretools document
  • poretools github: a toolkit for working with Oxford nanopore data
  • qcat: qcat is a Python command-line tool for demultiplexing Oxford Nanopore reads from FASTQ files
  • pycoQC: pycoQC computes metrics and generates interactive QC plots from the sequencing summary report generated by Oxford Nanopore Technologies basecallers (Albacore/Guppy)
  • nanopack: Easily install all nanopack scripts together
  • nanocomp: Comparison of multiple long read datasets
  • nanolyse: Remove lambda phage reads from a fastq file
  • nanomath: A few simple math functions for other Oxford Nanopore processing scripts

Assembly

  • NovoGraph: building whole genome graphs from long-read-based de novo assemblies
  • wtdbg2: A fuzzy Bruijn graph approach to long noisy reads assembly
  • smartdenovo: Ultra-fast de novo assembler using long noisy reads
  • MECAT2
  • quickmerge: A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.
  • npGraph: Resolve assembly graph in real-time using nanopore data
  • Canu
  • shasta: De novo assembly from Oxford Nanopore reads
  • RaGOO: A tool to order and orient genome assembly contigs via Minimap2 alignments to a reference genome
  • helen: H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)

polish

  • racon: Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads
  • ntEdit: scalable genome assembly polishing
  • nanopolish: Signal-level algorithms for MinION data
  • Apollo
  • Quiver

Variants

  • Longshot: diploid SNV caller for error-prone reads
  • NanoSatellite: Dynamic time warping of Oxford Nanopore squiggle data to characterize tandem repeats
  • Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing

Methylation

  • deepsignal: Detecting methylation using signal-level features from Nanopore sequencing reads
  • tombo: a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data
  • DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications
  • nanopore-methylation
  • mCaller: A python program to call methylation (m6A in DNA) from nanopore signal data
  • EpiNano: Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads

Mapping tools

  • graphmap: A highly sensitive and accurate mapper for long, error-prone reads
  • rkmh: Classify sequencing reads using MinHash
  • minialign: fast and accurate alignment tool for PacBio and Nanopore long reads

simulator

  • NanoSim: Nanopore sequence read simulator
  • DeepSimulator: The first deep learning based Nanopore simulator which can simulate the process of Nanopore sequencing.

data

Transcriptome

  • dRNA-paper-scripts: Highly parallel direct RNA sequencing on an array of nanopores (https://www.nature.com/articles/nmeth.4577)
  • flair: Full-Length Alternative Isoform analysis of RNA
  • pinfish: Tools to annotate genomes using long read transcriptomics data
  • pipeline-pinfish-analysis: Pipeline for annotating genomes using long read transcriptomics data with pinfish
  • pychopper: A tool to identify full length cDNA reads
  • LoReAn: Long Reads Annotation pipeline
  • poreplex: A versatile sequenced read processor for nanopore direct RNA sequencing
  • Mandalorion: Analysis Pipeline to analyze Nanopore RNAseq data
  • pipeline-polya-ng: Pipeline for calling poly(A) tail lengths from nanopore direct RNA data using nanopolish

PacBio data

  • PacBioEDA: Python scripts for Exploratory Data Analysis of Pacific Biosciences sequence data
  • GenomicConsensus: PacBio® variant and consensus caller
  • pbalign: pbalign maps PacBio reads to reference sequences and saves alignments to a BAM file
  • pbmm2: A minimap2 frontend for PacBio native data formats

Assembly genome

Assembly with short reads

  • w2rap-contigger: An Illumina PE genome contig assembler, can handle large (17Gbp) complex (hexaploid) genomes.
  • w2rap: WGS (Wheat) Robust Assembly Pipeline
  • GFA-spec: Graphical Fragment Assembly (GFA) Format Specification
  • HapCUT2: software tools for haplotype assembly from sequence data
  • masurca: MaSuRCA Genome Assembler Quick Start Guide
  • minia: Minia is a short-read assembler based on a de Bruijn graph
  • npScarf: Scaffold and Complete assemblies in real-time fashion
  • redundans: Redundans is a pipeline that assists an assembly of heterozygous/polymorphic genomes.
  • Scaff10X: Pipeline for scaffolding and breaking a genome assembly using 10x genomics linked-reads
  • SDA: Segmental Duplication Assembler (SDA)
  • shovill: Faster SPAdes assembly of Illumina reads
  • SOAPdenovo2

Assembly with long reads

  • FALCON-Phase: FALCON-Phase integrates PacBio long-read assemblies with Phase Genomics Hi-C data to create phased, diploid, chromosome-scale scaffolds
  • wtdbg2: A fuzzy Bruijn graph approach to long noisy reads assembly
  • DBG2OLC: The genome assembler that reduces the computational time of human genome assembly from 400,000 CPU hours to 2,000 CPU hours, utilizing long erroneous 3GS sequencing reads and short accurate NGS sequencing reads.
  • Flye: Fast and accurate de novo assembler for single molecule sequencing reads
  • PBcR (http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR)
  • SALSA: A tool to scaffold long read assemblies with Hi-C data
  • smartdenovo: Ultra-fast de novo assembler using long noisy reads
  • NovoGraph: Genome Graph of Long-read De Novo Assemblies

fill gap && polish

  • quickmerge: A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.
  • PBJelly: Gap-closing-with-PBJelly
  • GapCloser

Assembly transcriptome

  • Corset: Software for clustering de novo assembled transcripts and counting overlapping reads

Variants

Somatic variants

  • ascatNgs: Somatic copy number analysis using paired-end whole-genome sequencing
  • needlestack: Multi-sample somatic variant caller
  • seurat: Tumor-Normal Variant Caller
  • facets: Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.
  • Shimmer: a software package for the characterization of genetic differences between two very similar samples, e.g., a tumor sample and its matched normal tissue sample
  • neusomatic: Deep convolutional neural networks for accurate somatic mutation detection
  • Pisces: Somatic and germline variant caller for amplicon data.
  • deTiN: DeTiN is designed to measure tumor-in-normal contamination and improve somatic variant detection sensitivity when using a contaminated matched control.
  • DeepSVR: a machine learning model approach to somatic variant refinement
  • somaticseq: An ensemble approach to accurately detect somatic mutations using SomaticSeq
  • MuSiC2: identifying mutational significance in cancer genomes

Germline variants

  • benchmarking germline small-variant calls: Repository for the GA4GH Benchmarking Team work developing standardized benchmarking methods for germline small variant calls
  • vt: A tool set for short variant discovery in genetic sequence data
  • dna-seq-gatk-variant-calling: This Snakemake pipeline implements the GATK best-practices workflow
  • deepvariant: an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
  • speedseq: A flexible framework for rapid genome analysis and interpretation
  • vg: tools for working with genome variation graphs
  • GEMINI: integrative exploration of genetic variation and genome annotations

Haplotype & phase

Imputation

LD

  • PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format (VCF) files
  • emeraLD: tools to efficiently retrieve and calculate LD
  • ngsLD: Calculation of pairwise Linkage Disequilibrium (LD) under a probabilistic framework
  • LD Hub
  • LDSC and associated files
  • abstar: VDJ assignment and antibody sequence annotation. Scalable from a single sequence to billions of sequences.

Structural variants

SV Caller for third generation sequences

  • svim: Structural Variant Identification Method using Long Reads
  • SURVIVOR: Toolset for SV simulation, comparison and filtering
  • Sniffles: Structural variation caller using third generation sequencing
  • NanoSV: SV caller for nanopore data
  • smrtsv2: long read structural variant caller
  • pbsv: PacBio structural variant (SV) calling and analysis tools
  • Picky: Structural Variants Pipeline for Long Reads
  • NanoVar: Structural variant caller using low-depth Nanopore sequencing
  • SV-plaudit: Pipeline for structural variant image curation and analysis.
  • SVJedi: SV genotyping with long reads
  • cuteSV: Long read based human genomic structural variation detection
  • EnsembleSV: A workflow for SV inference allowing for multiple sequencing technologies and methods

SV with Illumina data

  • svtyper: Bayesian genotyper for structural variants
  • lumpy-sv: a general probabilistic framework for structural variant discovery
  • parliament2: Runs a combination of tools to generate structural variant calls on whole-genome sequencing data
  • delly: Structural variant discovery by integrated paired-end and split-read analysis
  • manta: Structural variant and indel caller for mapped sequencing data
  • SV2: Support Vector Structural Variation Genotyper
  • pindel: identify the breakpoints of structural variants from paired-end short reads
  • MetaSV: An accurate and integrative structural-variant caller for next generation sequencing
  • svaba: Structural variation and indel detection by local assembly
  • wham: Structural variant detection and association testing
  • gridss: Genomic Rearrangement IDentification Software Suite
  • breakdancer: SV detection from paired end reads mapping
  • SVenX: Pipeline for SV detection using 10X genomics data
  • paragraph: Graph realignment tools for structural variants
  • svtools: Tools for processing and analyzing structural variants
  • SnowmanSV: Structural variation and indel detection using rolling local string graph assembly
  • truvari: Structural variant comparison tool for VCFs

CNV

  • Control-FREEC: a tool for assessing copy number and allelic content using next generation sequencing data
  • canvas: Canvas Copy Number Variant Caller
  • CNVnator: a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads
  • cnv_facets: Somatic copy number variant (CNV) caller for next generation sequencing
  • CNV-Visualizer: Visualizing Copy Number Variations
  • facets: Algorithm to implement Fraction and Copy number Estimate from Tumor/normal Sequencing.
  • cnvkit: Copy number variant detection from targeted DNA sequencing
  • ADTEx: detect somatic copy number variations (CNVs)
  • NGSEPcore: an integrated framework for analysis of high throughput sequencing (HTS) reads. The main functionality of NGSEP is the variants detector, which performs integrated discovery and genotyping of single nucleotide variants (SNVs), insertions, deletions, and genomic regions with copy number variation (CNVs)
  • aCNViewer: Comprehensive genome-wide visualization of absolute copy number and copy neutral variations
  • cancerTitanCNA: Analysis of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH)

CNV workflow

General

  • svaba: Structural variation and indel detection by local assembly
  • truvari: Structural variant comparison tool for VCFs
  • smoove: structural variant calling and genotyping with existing tools, but, smoothly
  • sv-pipeline: Pipeline for structural variation detection in cohorts
  • svtools: Tools for processing and analyzing structural variants
  • samplot: Plot structural variant signals from many BAMs and CRAMs
  • svviz2: visual evaluation of read support for structural variation
  • FusorSV: an algorithm for optimally combining data from multiple structural variation detection methods
  • seeksv: A bioinformatics tool for SV detection and virus integration discovery
  • SViper: Swipe your Structural Variants called on long (ONT/PacBio) reads with short exact (Illumina) reads.

SV annotation

  • AnnotSV: Annotation and Ranking of Human Structural Variations
  • Nirvana: Nirvana provides clinical-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, and SVs, including CNVs)
  • StructuralVariantAnnotation: R package designed to simplify structural variant analysis

SV (CNV) annotation database

GWAS

QTL

  • QTLseqr: QTLseqr is an R package for QTL mapping using NGS Bulk Segregant Analysis

ATAC

ChIP

Methylation

  • Methylation QTL data for brain and blood
  • methylpy: WGBS/NOMe-seq Data Processing & Differential Methylation Analysis
  • ViewBS: a powerful toolkit for visualization of high-throughput bisulfite sequencing data
  • mCaller: A python program to call methylation (m6A in DNA) from nanopore signal data
  • DNA-methylation-analysis: notes on DNA methylation analysis (arrays and sequencing data)
  • bs3: BS-Seeker3: An Ultra-fast, Versatile Pipeline for Mapping Bisulfite-treated Reads
  • bsseq: Devel repository for bsseq

Hi-C

  • Hi-C data
  • tadtool: an interactive tool for the identification of meaningful parameters in TAD-calling algorithms for Hi-C data.
  • juicebox_scripts: A collection of scripts for working with Hi-C data, Juicebox, and other genomic file formats
  • ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data
  • genomedisco: Software for comparing contact maps from HiC, CaptureC and other 3D genome data
  • 3DChromatin_ReplicateQC: Software to compute reproducibility and quality scores for Hi-C data
  • hic_breakfinder

10X data

ngs

  • ngsDist: Estimation of pairwise distances under a probabilistic framework
  • NGS-pipe: next-generation sequencing pipelines for precision oncology
  • ngsPopGen: Population genetics analyses from NGS data
  • ngsTools: Programs to analyse NGS data for population genetics purposes
  • viral-ngs: Viral genomics analysis pipelines
  • NGSCheckMate: Software program for checking sample matching for NGS data
  • abtools: Analysis of antibody NGS data
  • alignment-and-variant-calling-tutorial: basic walk-throughs for alignment and variant calling from NGS sequencing data

Plotter

UMI

  • zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
  • umis: Tools for processing UMI RNA-tag data

TCR

  • tcR: Advanced Data Analysis of Immune Receptor Repertoires

Bioinformatics tutorial

database & Websites

Deal with vcf

  • cyvcf2: fast VCF and BCF processing
  • CyVCF document
  • CyVCF: A fast Python library for VCF files leveraging Cython for speed.
  • rtg-tools: Utilities for accurate VCF comparison and manipulation
  • spVCF: Sparse Project VCF: evolution of VCF to encode population genotype matrices efficiently
  • vcf2phylip: Convert SNPs in VCF format to PHYLIP, NEXUS, binary NEXUS, or FASTA alignments for phylogenetic analysis
  • vcflib: a simple C++ library for parsing and manipulating VCF files, + many command-line utilities
  • GTShark: Genotype compression in large projects

Websites for cancer data

Blogs

Labs

Tool resources

Python

Machine Learning
