abelew/prfdb: Some scripts to search genomic data for significant secondary structures

This is a README file, there are many like it, but this is my own.

We start with an explanation of the database and its schemas.
Everything is keyed on the genome table: the queue, mfe_$species and boot_$species tables
refer back to its Id through their Genome_id columns.

genome:
Id Accession Species Genename Version Comment Mrna_seq Protein_seq Orf_start Orf_stop Lastupdate
Id is auto-incremented. Genename must be parsed out of the input database, so it is less
reliable than Comment. Mrna_seq is the full sequence as given; Protein_seq comes from the CDS
annotation. Orf_start and Orf_stop define the range examined in later steps.
Problems:
1) This table holds one full copy of the genomic sequence (Mrna_seq) for every ORF in that
sequence, which wastes a lot of space for viral genomes that encode several ORFs.
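To make the ORF columns concrete, here is a minimal Python sketch that cuts the ORF region out of a genome row. It assumes Orf_start and Orf_stop are 1-based, inclusive positions in Mrna_seq; the README does not state the coordinate convention, and the function name orf_region and the sample row are hypothetical.

```python
# Hypothetical sketch: extract the ORF region from a genome-table row.
# ASSUMPTION: Orf_start and Orf_stop are 1-based, inclusive positions in
# Mrna_seq; the README does not state the coordinate convention.


def orf_region(mrna_seq: str, orf_start: int, orf_stop: int) -> str:
    """Return the part of mrna_seq between orf_start and orf_stop."""
    if not 1 <= orf_start <= orf_stop <= len(mrna_seq):
        raise ValueError("ORF coordinates fall outside the sequence")
    # Convert 1-based inclusive coordinates to a 0-based Python slice.
    return mrna_seq[orf_start - 1:orf_stop]


# Invented example row.
row = {"mrna_seq": "GGAUGAAACCCUAAGG", "orf_start": 3, "orf_stop": 14}
print(orf_region(row["mrna_seq"], row["orf_start"], row["orf_stop"]))  # AUGAAACCCUAA
```

If the real columns turn out to be 0-based or half-open, only the slice line needs to change.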

gene_info:
Same as genome, except that it holds text information about each gene, indexed for fast searching.

queue:
Id Genome_id Public Params Out Done
Each Id is unique; Genome_id is used to pull the sequence from the genome table.
These queries should pull the parameters and genome for a random entry; the first picks
from public entries, the second from unfinished ones:
SELECT id, genome_id, params FROM queue WHERE public='1' ORDER BY rand() LIMIT 1
SELECT id, genome_id, params FROM queue WHERE done='0' ORDER BY rand() LIMIT 1
The first update below marks an entry as checked out; the second tells the database
that the sequence is finished:
UPDATE queue SET out='1' WHERE id='xxx'
UPDATE queue SET done='1' WHERE id='xxx'
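The queue workflow can be sketched in Python with the standard sqlite3 module. This is an illustration, not the daemon's actual code: SQLite spells MySQL's rand() as RANDOM(), the column types and sample rows are invented, and the helper names take_random_job and finish_job are hypothetical. Bound parameters (?) replace the literal 'xxx' ids.

```python
import sqlite3

# Sketch of the queue workflow on SQLite (the real database is MySQL).
# ASSUMPTIONS: column types, sample rows, and helper names are invented;
# SQLite's RANDOM() stands in for MySQL's rand().
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE queue (
           id        INTEGER PRIMARY KEY AUTOINCREMENT,
           genome_id INTEGER NOT NULL,
           public    INTEGER DEFAULT 0,
           params    TEXT,
           out       INTEGER DEFAULT 0,
           done      INTEGER DEFAULT 0)"""
)


def take_random_job(conn):
    """Pick a random unfinished entry and flag it as checked out."""
    row = conn.execute(
        "SELECT id, genome_id, params FROM queue "
        "WHERE done = 0 ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
    if row is not None:
        conn.execute("UPDATE queue SET out = 1 WHERE id = ?", (row[0],))
    return row


def finish_job(conn, job_id):
    """Record that the sequence for job_id is finished."""
    conn.execute("UPDATE queue SET done = 1 WHERE id = ?", (job_id,))


conn.executemany(
    "INSERT INTO queue (genome_id, public, params) VALUES (?, ?, ?)",
    [(1, 1, "default"), (2, 0, "default")],
)
job = take_random_job(conn)  # (id, genome_id, params)
finish_job(conn, job[0])
remaining = conn.execute("SELECT COUNT(*) FROM queue WHERE done = 0").fetchone()[0]
print(remaining)  # 1
```

Adding `AND out = 0` to the SELECT would stop two workers from grabbing the same entry; whether the real daemon does this is not stated here.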

mfe_$species:
Id Genome_id Species Accession Start Slipsite Seqlength Sequence Output Parsed Mfe Pairs Knotp Barcode Lastupdate
The same ideas apply. An Accession should appear more than once only when there are
a) multiple slippery sites per transcript, or
b) multiple ORFs per transcript.
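One way to check this rule for an mfe_$species table is a pair of GROUP BY queries, sketched here with Python's sqlite3. The table name mfe_example, the reduced column set, and the rows are hypothetical. The sketch assumes that a second slippery site shows up as a different Start and a second ORF as a different Genome_id (the genome table stores one row per ORF); an accession repeated with the same Genome_id and Start is then explained by neither a) nor b).

```python
import sqlite3

# Hypothetical audit of an mfe_$species table; "mfe_example" stands in for a
# real per-species table, and only a few of its columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mfe_example (id INTEGER PRIMARY KEY, genome_id INTEGER,"
    " accession TEXT, start INTEGER, slipsite TEXT, mfe REAL)"
)
conn.executemany(
    "INSERT INTO mfe_example (genome_id, accession, start, slipsite, mfe)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (1, "NM_A", 10, "UUUAAAC", -12.3),
        (1, "NM_A", 85, "AAAUUUA", -9.8),  # a) second slippery site
        (2, "NM_B", 40, "GGGUUUU", -7.1),
        (3, "NM_B", 40, "GGGUUUU", -7.4),  # b) second ORF (new genome row)
        (4, "NM_C", 22, "UUUAAAU", -5.0),
    ],
)


def duplicate_report(conn, table):
    """Per repeated accession: rows, distinct starts, distinct genome rows."""
    # Table names cannot be bound as parameters, so only pass trusted names.
    return conn.execute(
        "SELECT accession, COUNT(*), COUNT(DISTINCT start),"
        f" COUNT(DISTINCT genome_id) FROM {table}"
        " GROUP BY accession HAVING COUNT(*) > 1 ORDER BY accession"
    ).fetchall()


def unexplained_duplicates(conn, table):
    """Repeats sharing accession, genome_id, and start: neither a) nor b)."""
    return conn.execute(
        f"SELECT accession, genome_id, start, COUNT(*) FROM {table}"
        " GROUP BY accession, genome_id, start HAVING COUNT(*) > 1"
    ).fetchall()


print(duplicate_report(conn, "mfe_example"))        # [('NM_A', 2, 2, 1), ('NM_B', 2, 1, 2)]
print(unexplained_duplicates(conn, "mfe_example"))  # []
```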

boot_$species:
Id Genome_id Species Accession Start Iterations Rand_method Mfe_method Mfe_mean Mfe_sd Mfe_se Pairs_mean Pairs_sd Pairs_se Mfe_values Lastupdate
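The Mfe_mean, Mfe_sd and Mfe_se columns (and their Pairs_* counterparts) presumably summarize the values from Iterations randomized sequences. A minimal sketch, assuming sd is the sample standard deviation and se = sd / sqrt(Iterations); the README defines neither, and summarize_boot is a hypothetical name.

```python
import math
import statistics


def summarize_boot(values):
    """Mean, sample standard deviation, and standard error of bootstrap values.

    ASSUMPTION: sd is the sample (n - 1) standard deviation and
    se = sd / sqrt(n), where n is the number of iterations.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two randomized values")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {"iterations": n, "mean": mean, "sd": sd, "se": sd / math.sqrt(n)}


# Invented MFE values (kcal/mol) from three randomized sequences.
print(summarize_boot([-10.0, -12.0, -14.0]))
```

The same function applies unchanged to the per-iteration base-pair counts behind Pairs_mean, Pairs_sd and Pairs_se.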

landscape_$species:
These are the largest tables in the database; they contain the MFE data for entire sequences.
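A landscape can be thought of as one MFE value per window slid along the whole sequence. The sketch below only enumerates the windows; the folding program is passed in as a function, since this README does not say which folder, window size, or step the landscape_$species tables use. The gc_count placeholder is a stand-in so the example runs, not an energy model.

```python
from typing import Callable, Iterator, List, Tuple


def windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, subsequence) for each full-length window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for i in range(0, len(seq) - window + 1, step):
        yield i + 1, seq[i:i + window]


def landscape(seq: str, fold: Callable[[str], float],
              window: int, step: int) -> List[Tuple[int, float]]:
    """Apply `fold` (e.g. a wrapper around an MFE predictor) to every window."""
    return [(start, fold(sub)) for start, sub in windows(seq, window, step)]


def gc_count(sub: str) -> float:
    """Placeholder score so the example runs; NOT a free-energy model."""
    return float(sub.count("G") + sub.count("C"))


print(landscape("GGGAAACCC", gc_count, window=3, step=3))  # [(1, 3.0), (4, 0.0), (7, 3.0)]
```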

overlap:
Looks for +1 and -1 frameshift products.
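As a rough illustration of what a +1 or -1 frameshift product is, this sketch reads codons in the 0 frame from the ORF start, slips by one nucleotide at a chosen position, and continues in the new frame until a stop codon. It is a simplified model (0-based indices, a single slip on a 0-frame codon boundary) and not the logic the overlap table actually uses; frameshift_codons is a hypothetical name.

```python
from typing import List

STOP_CODONS = {"UAA", "UAG", "UGA"}


def frameshift_codons(mrna: str, orf_start: int, shift_at: int, shift: int) -> List[str]:
    """Codons read from orf_start, slipping by `shift` nt at index shift_at.

    Simplified model (ASSUMPTION): 0-based indices, a single slip on a
    0-frame codon boundary, and translation ends at the first stop codon
    or when fewer than three nucleotides remain.
    """
    if shift not in (-1, 1):
        raise ValueError("shift must be -1 or +1")
    if shift_at <= orf_start or (shift_at - orf_start) % 3 != 0:
        raise ValueError("shift_at must be a codon boundary after orf_start")
    codons, pos, shifted = [], orf_start, False
    while pos + 3 <= len(mrna):
        if not shifted and pos == shift_at:
            pos += shift  # the ribosome moves into the -1 or +1 frame
            shifted = True
            continue
        codon = mrna[pos:pos + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
        pos += 3
    return codons


mrna = "AUGUUUAAAUAAGCUGA"  # invented; the 0 frame stops at UAA
print(frameshift_codons(mrna, 0, 9, -1))  # ['AUG', 'UUU', 'AAA', 'AUA', 'AGC', 'UGA']
print(frameshift_codons(mrna, 0, 9, +1))  # ['AUG', 'UUU', 'AAA', 'AAG', 'CUG']
```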

agree:
Generates statistics on how well nupack, pknots, and hotknots agree for each sequence.
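The README does not say which statistic the agree table records. As one hypothetical measure, the sketch below parses each program's structure from dot-bracket notation (with [], {} and <> for pseudoknotted pairs) and reports the pairwise Jaccard similarity of the base-pair sets. The dot-bracket input format and the Jaccard score are both assumptions.

```python
from itertools import combinations

OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def base_pairs(structure: str) -> set:
    """Parse dot-bracket notation into a set of 0-based (i, j) pairs."""
    stacks = {opener: [] for opener in OPENERS}
    pairs = set()
    for j, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(j)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if any(stacks.values()):
        raise ValueError("unbalanced structure")
    return pairs


def agreement(structures: dict) -> dict:
    """Pairwise Jaccard similarity of base-pair sets (hypothetical metric)."""
    pair_sets = {name: base_pairs(s) for name, s in structures.items()}
    scores = {}
    for a, b in combinations(sorted(pair_sets), 2):
        union = pair_sets[a] | pair_sets[b]
        shared = pair_sets[a] & pair_sets[b]
        # Two structures with no pairs at all count as full agreement.
        scores[(a, b)] = len(shared) / len(union) if union else 1.0
    return scores


# Invented predictions for one 11-nt sequence.
print(agreement({
    "nupack": "((((...))))",
    "pknots": "((((...))))",
    "hotknots": "(((.....)))",
}))  # {('hotknots', 'nupack'): 0.75, ('hotknots', 'pknots'): 0.75, ('nupack', 'pknots'): 1.0}
```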

index_stats:
Generates statistics used for the web site.