Welcome to the LOng-read General (LOGen) GitHub repo!

This is a GitHub repo of (mostly independent) Python/R scripts that I developed to analyse data from long-read sequencing experiments. The scripts range from generating txt files for running community tools (example pipelines) and plotting SQANTI output to running differential expression analyses and more custom applications.

Processing ONT raw data

A pipeline for processing raw ONT reads from transcriptome cDNA sequencing, using community tools (Porechop, Minimap2, SQANTI3) and custom scripts.
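
The first two steps can be sketched as a small command generator. This is a minimal sketch rather than the pipeline in this repo: the helper name `build_ont_commands`, the file names and the output layout are hypothetical, and any steps between alignment and SQANTI3 are left out. It only builds the command strings (for example, to write into a txt file) and does not execute them.

```python
import shlex
from pathlib import PurePosixPath


def build_ont_commands(fastq, reference_fasta, out_dir, threads=4):
    """Return shell command strings for adapter trimming and spliced alignment.

    Hypothetical helper: file naming and output layout are assumptions.
    """
    out = PurePosixPath(out_dir)
    sample = PurePosixPath(fastq).name.split(".")[0]
    trimmed = out / f"{sample}.trimmed.fastq"
    sam = out / f"{sample}.sam"

    # Porechop: trim adapters from the raw reads (-i input, -o output).
    porechop = ["porechop", "-i", str(fastq), "-o", str(trimmed),
                "--threads", str(threads)]
    # Minimap2: spliced alignment preset (-ax splice), SAM written with -o.
    minimap2 = ["minimap2", "-ax", "splice", "-t", str(threads),
                "-o", str(sam), str(reference_fasta), str(trimmed)]
    return [shlex.join(cmd) for cmd in (porechop, minimap2)]


if __name__ == "__main__":
    for line in build_ont_commands("sampleA.fastq", "genome.fa", "results"):
        print(line)
```

Keeping the commands as strings makes it easy to review them, or submit them to a scheduler, before anything is run.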

Data exploration post-SQANTI

The following features can be explored in the <sample>_classification.txt file generated by SQANTI:

number of isoforms by structural category
correlate exon number, gene length with isoform number
identify long-non-coding RNA isoforms
plot and test the number of isoforms with/without certain features (e.g. within or beyond 50 bp of a CAGE peak/TSS/TTS)
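
As an illustration of the first two items, the sketch below counts isoforms per structural category and per gene using only the Python standard library. It is not one of the repo's functions: the helper name is hypothetical, the table is made-up toy data, and the column names (`isoform`, `structural_category`, `associated_gene`, `exons`) are assumed to match the SQANTI classification file being used.

```python
import csv
import io
from collections import Counter

# Tiny made-up classification table; real files have many more columns.
EXAMPLE = (
    "isoform\tstructural_category\tassociated_gene\texons\n"
    "PB.1.1\tfull-splice_match\tGeneA\t5\n"
    "PB.1.2\tnovel_in_catalog\tGeneA\t6\n"
    "PB.2.1\tfull-splice_match\tGeneB\t2\n"
)


def summarise_classification(handle):
    """Count isoforms per structural category and per associated gene."""
    by_category = Counter()
    by_gene = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        by_category[row["structural_category"]] += 1
        by_gene[row["associated_gene"]] += 1
    return by_category, by_gene


by_category, by_gene = summarise_classification(io.StringIO(EXAMPLE))
print(dict(by_category))
print(dict(by_gene))
```

For a real file, replace `io.StringIO(EXAMPLE)` with an open file handle, e.g. `open("<sample>_classification.txt")`.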

To run the functions, first read in the <sample>_classification.txt file using:

SQANTI_class_preparation(<sample>_classification.txt, standard) if expression columns are included in the file (after running --FL_count in SQANTI)
SQANTI_class_preparation(<sample>_classification.txt, nstandard) if expression is not included

Characterize merged datasets

subset_targetgenes_classfiles.py: Subset SQANTI classification file based on genes and reads
colour_transcripts_by_countandpotential.py: Colour bed file by abundance and coding potential
extract_fasta_bestorf.py: Create a fasta file based on best ORF defined from CPAT
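
The idea behind subsetting a classification file by genes and reads can be sketched as follows. This is not the code in subset_targetgenes_classfiles.py: the function name, its parameters and the toy table are hypothetical, and it assumes the read count sits in a column named `FL` and the gene in `associated_gene`.

```python
import csv
import io

# Made-up rows for illustration only.
EXAMPLE = (
    "isoform\tassociated_gene\tFL\n"
    "PB.1.1\tGeneA\t12\n"
    "PB.1.2\tGeneA\t1\n"
    "PB.2.1\tGeneB\t30\n"
)


def subset_by_gene_and_reads(handle, target_genes, min_reads=2,
                             gene_col="associated_gene", count_col="FL"):
    """Keep rows for target genes supported by at least min_reads reads."""
    targets = set(target_genes)
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row[gene_col] in targets and int(row[count_col]) >= min_reads:
            kept.append(row)
    return kept


kept = subset_by_gene_and_reads(io.StringIO(EXAMPLE), ["GeneA"])
print([row["isoform"] for row in kept])
```

The column names are passed as parameters so the same sketch can be pointed at files where the count column is named differently.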

Differential expression analysis

Currently a collection of scripts that still needs tidying. It includes scripts for reading in results after running tappAS, running linear regression, etc.

Miscellaneous

replace_filenames_with_csv.py: Replace multiple file names in a directory using reference csv file
search_fasta_by_sequence.py: Subset fasta based on sequence
subset_fasta_gtf.py: Subset gtf, fasta and bed files based on list of transcript IDs
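
Subsetting a fasta file by a list of transcript IDs could look roughly like the sketch below. It is not the implementation in subset_fasta_gtf.py: the function names and example records are hypothetical, and the record ID is assumed to be the first whitespace-delimited token of each header line.

```python
import io

# Made-up records for illustration only.
EXAMPLE_FASTA = (
    ">PB.1.1 gene=GeneA\n"
    "ACGTACGT\n"
    "ACGT\n"
    ">PB.2.1 gene=GeneB\n"
    "TTTTGGGG\n"
)


def record_id(header):
    """Return the first whitespace-delimited token of a header, or ''."""
    parts = header.split()
    return parts[0] if parts else ""


def subset_fasta(handle, keep_ids):
    """Yield (header, sequence) for records whose ID is in keep_ids."""
    keep = set(keep_ids)
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one if it was requested.
            if header is not None and record_id(header) in keep:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the final record, which has no following header line.
    if header is not None and record_id(header) in keep:
        yield header, "".join(chunks)


records = list(subset_fasta(io.StringIO(EXAMPLE_FASTA), ["PB.1.1"]))
print(records)
```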

Folders and files (93 commits)

0_conda
0_utils
aesthetics_basics_plots
alternative_splicing
assist_isoseq_processing
assist_ont_processing
compare_datasets
differential_analysis
expression_methylation_integration/MethReg
longread_QC
merge_characterise_dataset
miscellaneous
phasing
proteomics
run_tappAS
target_gene_annotation
transcriptome_stats
.gitignore
README.md
__init__.py

SziKayLeung/LOGen
