🔬 Bioinformatics Notebook. Scripts for bioinformatics pipelines, with quick start guides for programs and video demonstrations.

BenAawf/bioinfo-notebook


layout: default
title: Home
nav_order: 1
description: Quick start guides for bioinformatics programs, with video demonstrations and scripts.
permalink: /

by Ronan Harrington

License: MIT

This project provides introductions to various bioinformatics tools with short guides, video demonstrations, and scripts that tie these tools together. The documents in this project can be read locally in a plain-text editor, or viewed online at https://rnnh.github.io/bioinfo-notebook/. If you are not familiar with using programs from the command line, begin with the page "Introduction to the command line". If you have any questions, or spot any mistakes, please submit an issue on GitHub.

Pipeline examples

These bioinformatics pipelines can be carried out using scripts and tools described in this project. Input files for some of these scripts can be specified in the command line; other scripts will need to be altered to fit the given input data.

SNP analysis

  • FASTQ reads from whole genome sequencing (WGS) can be assembled using SPAdes.
  • Sequencing reads can be aligned to this assembled genome using bowtie2.
  • The script snp_calling.sh aligns sequencing reads to an assembled genome and detects single nucleotide polymorphisms (SNPs). This will produce a Variant Call Format (VCF) file.
  • The proteins in the assembled reference genome (the genome to which the reads are aligned) can be annotated using genome_annotation_SwissProt_CDS.sh.
  • The genome annotation GFF file can be cross-referenced with the VCF file using annotating_snps.R. This will produce an annotated SNP format file.
  • Annotated SNP format files can be cross-referenced using annotated_snps_filter.R. For two annotated SNP files, this script will produce a file with annotated SNPs unique to the first file, and a file with annotated SNPs unique to the second file.
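The last step above is a two-way set difference. As a rough illustration of that idea only (not the logic of annotated_snps_filter.R itself), the sketch below uses `comm` on two toy files holding one SNP identifier per line. The file names and the `contig:position:allele` identifier format are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: illustrates "unique to each file" with plain-text IDs.
# annotated_snps_filter.R works on annotated SNP files; the toy files
# and the ID format used here are hypothetical.
set -euo pipefail

workdir="$(mktemp -d)"

# One made-up SNP identifier (contig:position:allele) per line.
printf '%s\n' 'chrI:100:A' 'chrI:250:T' 'chrII:42:G' > "$workdir/sample_1.txt"
printf '%s\n' 'chrI:250:T' 'chrII:42:G' 'chrIII:7:C' > "$workdir/sample_2.txt"

# comm requires sorted input.
sort "$workdir/sample_1.txt" > "$workdir/sample_1.sorted"
sort "$workdir/sample_2.txt" > "$workdir/sample_2.sorted"

# -23 keeps lines found only in the first file;
# -13 keeps lines found only in the second file.
comm -23 "$workdir/sample_1.sorted" "$workdir/sample_2.sorted" > "$workdir/unique_to_1.txt"
comm -13 "$workdir/sample_1.sorted" "$workdir/sample_2.sorted" > "$workdir/unique_to_2.txt"

echo "Unique to sample 1:"
cat "$workdir/unique_to_1.txt"
echo "Unique to sample 2:"
cat "$workdir/unique_to_2.txt"
```

SNPs shared by both toy files appear in neither output, which mirrors the two "unique to" files described above.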

RNA-seq analysis

Detecting orthologs between genomes

Contents

Installation instructions

After following these instructions, there will be a copy of the bioinfo-notebook GitHub repo on your system in the ~/bioinfo-notebook/ directory. This means there will be a copy of all the documents and scripts in this project on your computer. If you are using Linux and run the Linux setup script, the bioinfo-notebook virtual environment, which includes the majority of the command line programs covered in this project, will also be installed using conda.

1. This project is written to be used on a Unix-like operating system (Linux, or a Mac running macOS Mojave or later). If you are using a Windows operating system, begin with the pages on setting up Ubuntu (a Linux operating system), docs/wsl.md and docs/ubuntu_virtualbox.md.

Once you have an Ubuntu system set up, run the following command to update the lists of available software:

$ sudo apt-get update # Updates lists of software that can be installed

2. Run the following command in your home directory (~) to download this project:

$ git clone https://github.com/rnnh/bioinfo-notebook.git

3. If you are using Linux, run the Linux setup script with this command after downloading the project:

$ bash ~/bioinfo-notebook/scripts/linux_setup.sh
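Steps 2 and 3 can be wrapped in a small dry-run helper that prints the commands it would run instead of executing them. This is a minimal sketch, not part of the project: the function names `needs_clone`, `is_linux`, and `plan_install` are hypothetical, and checking for a `.git` directory is an assumed way of detecting an existing clone.

```shell
#!/usr/bin/env bash
# Sketch only: prints the installation commands that apply to this system.
# needs_clone, is_linux and plan_install are hypothetical helper names.

repo_dir="${HOME}/bioinfo-notebook"

# True if the target directory does not already hold a git clone.
needs_clone() { [ ! -d "$1/.git" ]; }

# True on Linux; the setup script is only intended for Linux systems.
is_linux() { [ "$(uname -s)" = "Linux" ]; }

plan_install() {
  local dir="$1"
  if needs_clone "$dir"; then
    echo "git clone https://github.com/rnnh/bioinfo-notebook.git $dir"
  fi
  if is_linux; then
    echo "bash $dir/scripts/linux_setup.sh"
  fi
}

plan_install "$repo_dir"
```

Printing the commands first makes it easy to confirm the target directory before anything is downloaded; the printed lines can then be run by hand.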

Video demonstration of installation

[asciicast: terminal recording of the installation steps]

Repository structure

bioinfo-notebook/
├── assets/
│   └── bioinfo-notebook_logo.svg
├── data/
│   ├── blastx_SwissProt_example_nucleotide_sequence.fasta.tsv
│   ├── blastx_SwissProt_S_cere.tsv
│   ├── design_table.csv
│   ├── example_genome_annotation.gtf
│   ├── example_nucleotide_sequence.fasta
│   └── featCounts_S_cere_20200331.csv
├── docs/
│   ├── annotated_snps_filter.md
│   ├── annotating_snps.md
│   ├── augustus.md
│   ├── blast.md
│   ├── bowtie2.md
│   ├── bowtie.md
│   ├── cl_intro.md
│   ├── cl_solutions.md
│   ├── combining_featCount_tables.md
│   ├── conda.md
│   ├── DE_analysis_edgeR_script.md
│   ├── DE_analysis_edgeR_script.pdf
│   ├── fasterq-dump.md
│   ├── fastq-dump.md
│   ├── fastq-dump_to_featureCounts.md
│   ├── featureCounts.md
│   ├── file_formats.md
│   ├── genome_annotation_SwissProt_CDS.md
│   ├── htseq-count.md
│   ├── linux_setup.md
│   ├── orthofinder.md
│   ├── part1.md    # Navigation page for website
│   ├── part2.md    # Navigation page for website
│   ├── part3.md    # Navigation page for website
│   ├── report_an_issue.md
│   ├── samtools.md
│   ├── sgRNAcas9.md
│   ├── snp_calling.md
│   ├── SPAdes.md
│   ├── ubuntu_virtualbox.md
│   ├── UniProt_downloader.md
│   └── wsl.md
├── envs/            # conda environment files
│   ├── augustus.yml            # environment for Augustus
│   ├── bioinfo-notebook.txt
│   ├── bioinfo-notebook.yml
│   ├── orthofinder.yml         # environment for OrthoFinder
│   └── sgRNAcas9.yml           # environment for sgRNAcas9
├── scripts/
│   ├── annotated_snps_filter.R
│   ├── annotating_snps.R
│   ├── combining_featCount_tables.py
│   ├── DE_analysis_edgeR_script.R
│   ├── fastq-dump_to_featureCounts.sh
│   ├── genome_annotation_SwissProt_CDS.sh
│   ├── linux_setup.sh
│   ├── snp_calling.sh
│   └── UniProt_downloader.sh
├── _config.yml     # Configures github.io project website
├── .gitignore
├── LICENSE
├── README.md
└── .travis.yml     # Configures Travis CI testing for GitHub repo
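
After cloning, a quick check along these lines can confirm that the top-level directories listed above are present. It is a sketch under the assumption that the clone lives in ~/bioinfo-notebook/; `missing_dirs` is a hypothetical helper, not a script shipped with the project.

```shell
#!/usr/bin/env bash
# Sketch only: report which expected top-level directories are missing
# from a local clone. missing_dirs is a hypothetical helper name.

missing_dirs() {
  local root="$1" d
  for d in assets data docs envs scripts; do
    # Print the directory name only when it is absent.
    [ -d "$root/$d" ] || echo "$d"
  done
}

missing="$(missing_dirs "${HOME}/bioinfo-notebook")"
if [ -z "$missing" ]; then
  echo "Repository layout looks complete."
else
  # Unquoted on purpose so the names print on one line.
  echo "Missing directories:" $missing
fi
```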
