
elvirakinzina/GSH


GSH - Genomic Safe Harbors


Description

Pipeline for the identification of novel human Genomic Safe Harbor (GSH) sites. The following criteria were used to computationally predict novel GSHs:

  • At least 50 kb away from known genes
  • At least 300 kb away from known oncogenes
  • At least 300 kb away from microRNAs, centromeres, telomeres and genomic gaps
  • At least 150 kb away from lncRNAs and tRNAs
  • At least 20 kb away from enhancers
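The criteria above amount to interval arithmetic: pad every feature by its minimum distance, merge the padded windows, and keep whatever is left of the genome. The following is a minimal single-chromosome sketch of that idea in bash and awk, not the pipeline's actual implementation; the function name `safe_windows`, the four-column input format and the toy coordinates are hypothetical. Tools such as bedtools (`slop`, `merge`, `complement`) perform equivalent steps on real data; whether predict_gsh.sh uses them is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only (assumption): distance-based exclusion on ONE chromosome.
# stdin:  "chrom start end min_distance" per feature (0-based, half-open);
#         assumes at least one feature is given.
# $1:     chromosome length in bp.
# stdout: regions outside every padded feature window (BED, tab-separated).
set -euo pipefail

safe_windows() {
  local chrom_len=$1
  awk -v OFS='\t' -v len="$chrom_len" '{
        # Pad the feature by its minimum distance, clipped to [0, len].
        s = $2 - $4; if (s < 0) s = 0
        e = $3 + $4; if (e > len) e = len
        print $1, s, e
      }' |
    sort -t $'\t' -k2,2n |
    awk -v OFS='\t' '
      # Merge overlapping or touching padded windows.
      NR == 1 { c = $1; s = $2; e = $3; next }
      $2 <= e { if ($3 > e) e = $3; next }
      { print c, s, e; c = $1; s = $2; e = $3 }
      END { if (NR > 0) print c, s, e }' |
    awk -v OFS='\t' -v len="$chrom_len" '
      # Complement: report the gaps between merged windows.
      BEGIN { prev = 0 }
      { if ($2 > prev) print $1, prev, $2; prev = $3; chrom = $1 }
      END { if (NR > 0 && prev < len) print chrom, prev, len }'
}

# Toy input on a hypothetical 1 Mb chromosome "chrT":
# a "gene" (50 kb), an "oncogene" (300 kb) and an "enhancer" (20 kb).
printf '%s\n' \
  "chrT 100000 120000 50000" \
  "chrT 400000 401000 300000" \
  "chrT 900000 905000 20000" |
  safe_windows 1000000
# prints (tab-separated):
# chrT 0 50000
# chrT 701000 880000
# chrT 925000 1000000
```

In this sketch, disabling a filter (e.g. `-trnas false`) corresponds to leaving that feature type out of the input, whereas a distance of 0 still excludes the feature itself. Several chromosomes would have to be processed one at a time.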

Features

The genomic coordinates produced by the pipeline are computationally predicted GSH sites, selected using both previously established and newly introduced criteria. All criteria are independent of cell type and therefore apply to all cell types.

Prerequisites

Data

Usage

Usage:

  ./predict_gsh.sh [-genes true|false] [-oncogenes true|false] [-micrornas true|false] [-trnas true|false] [-lncrnas true|false] [-enhancers true|false] [-centromeres true|false] [-gaps true|false] [-dist_from_genes BP] [-dist_from_oncogenes BP] [-dist_from_micrornas BP] [-dist_from_trnas BP] [-dist_from_lncrnas BP] [-dist_from_enhancers BP] [-dist_from_centromeres BP] [-dist_from_gaps BP] [-h|--help]

Options:

	-genes: Whether to exclude regions with and around genes (default=true)
	-oncogenes: Whether to exclude regions with and around oncogenes (default=true)
	-micrornas: Whether to exclude regions with and around microRNAs (default=true)
	-trnas: Whether to exclude regions with and around tRNAs (default=true)
	-lncrnas: Whether to exclude regions with and around lncRNAs (default=true)
	-centromeres: Whether to exclude regions with and around centromeres (default=true)
	-gaps: Whether to exclude regions with and around gaps (default=true)
	-enhancers: Whether to exclude enhancer regions (default=true)

	-dist_from_genes: Minimum distance in bp from any safe harbor to any gene (default=50000)
	-dist_from_oncogenes: Minimum distance in bp from any safe harbor to any oncogene (default=300000)
	-dist_from_micrornas: Minimum distance in bp from any safe harbor to any microRNA (default=300000)
	-dist_from_trnas: Minimum distance in bp from any safe harbor to any tRNA (default=150000)
	-dist_from_lncrnas: Minimum distance in bp from any safe harbor to any long non-coding RNA (default=150000)
	-dist_from_enhancers: Minimum distance in bp from any safe harbor to any enhancer (default=20000)
	-dist_from_centromeres: Minimum distance in bp from any safe harbor to any centromere (default=300000)
	-dist_from_gaps: Minimum distance in bp from any safe harbor to any genomic gap (default=300000)
	-h, --help: Prints help

Running with the default parameters:

./predict_gsh.sh 

Output:

  Getting gene annotation from GENCODE
  Distance from genes = 50000 bp
  Distance from oncogenes = 300000 bp
  Distance from microRNAs = 300000 bp
  Distance from tRNAs = 150000 bp
  Distance from lncRNAs = 150000 bp
  Distance from enhancers = 20000 bp
  Distance from centromeres = 300000 bp
  Distance from gaps = 300000 bp
  Merging all genomic regions to avoid
  Obtaining genomic coordinates and sequences of safe harbors

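Running with modified parameters (tRNA filtering disabled, so no tRNA distance is reported; lncRNA distance reduced to 50 kb; enhancer, centromere and gap distances set to 0):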

./predict_gsh.sh -trnas false -dist_from_lncrnas 50000 -dist_from_enhancers 0 -dist_from_centromeres 0 -dist_from_gaps 0

Output:

  Getting gene annotation from GENCODE
  Distance from genes = 50000 bp
  Distance from oncogenes = 300000 bp
  Distance from microRNAs = 300000 bp
  Distance from lncRNAs = 50000 bp
  Distance from enhancers = 0 bp
  Distance from centromeres = 0 bp
  Distance from gaps = 0 bp
  Merging all genomic regions to avoid
  Obtaining genomic coordinates and sequences of safe harbors

The pipeline produces two files: Safe_harbors.bed, containing the genomic coordinates of all regions that potentially contain safe harbors, and Safe_harbors.fasta, containing the sequences of these regions.
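To sanity-check a run, the two output files can be summarized with standard shell tools. The sketch below assumes that Safe_harbors.bed uses standard BED coordinates (0-based, half-open) and that Safe_harbors.fasta holds exactly one record per BED region; the helper `summarize_gsh` and the toy file contents are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: summarize Safe_harbors.bed / Safe_harbors.fasta (assumes standard BED
# and one FASTA record per BED region; summarize_gsh is a hypothetical helper).
set -euo pipefail

summarize_gsh() {
  local bed=$1 fasta=$2 n_regions total_bp n_records
  # Count BED regions and sum their lengths (end - start), skipping header lines.
  read -r n_regions total_bp < <(
    awk '!/^(#|track|browser)/ && NF >= 3 { n++; bp += $3 - $2 }
         END { printf "%d %.0f\n", n, bp }' "$bed")
  # Count FASTA records, i.e. header lines starting with ">".
  n_records=$(grep -c '^>' "$fasta" || true)
  echo "regions=$n_regions total_bp=$total_bp fasta_records=$n_records"
  # Non-zero exit status if the two files disagree.
  [ "$n_regions" -eq "$n_records" ]
}

# Toy files with hypothetical content (sequences are placeholders).
tmp=$(mktemp -d)
printf 'chr1\t1000\t1500\nchr2\t200\t450\n' > "$tmp/Safe_harbors.bed"
printf '>chr1:1000-1500\nACGT\n>chr2:200-450\nTTGA\n' > "$tmp/Safe_harbors.fasta"
summarize_gsh "$tmp/Safe_harbors.bed" "$tmp/Safe_harbors.fasta"
# prints: regions=2 total_bp=750 fasta_records=2
rm -r "$tmp"
```

Returning a non-zero status on a mismatch lets the check be chained after a run, e.g. in a script with `set -e`.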

Reference

Aznauryan et al. (2022). Discovery and validation of novel human genomic safe harbor sites for gene and cell therapies. Cell Genomics.
