This repository, ideally, contains the code and data necessary to re-create the results of Hockenberry et al. 2017 (see: http://rsob.royalsocietypublishing.org/content/7/1/160239.long). As of the last commit this process is about 90% complete, but there are no promises yet on the second half of the Shine_Dalgarno_analysis notebook. The code in this repository is primarily in the form of Jupyter notebooks, with a few supporting .py libraries that the notebooks call.
The pipeline starts by calculating translation efficiency measurements from ribosome profiling and RNA-seq wiggle (.wig) data (provided in Data/). Please note that we did not perform any sequence mapping on this data; we instead relied on the original authors' mappings and .wig files, since our intention was to compare as closely as possible to their results. Ideally this mapping should be re-run, especially if users hope to test their own data, because newer mapping and analysis protocols are better able to handle the specifics of ribosome profiling data. Users who want to see how we processed the ribosome profiling and RNA-seq data should first run through the `calculate_trans_eff.ipynb` notebook. The main purpose of this notebook, however, is to create .json files of translation efficiency measurements and save them to ../Data/. We currently provide these files so that users do not have to re-run this notebook at all should they choose not to.
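As a rough illustration of this step, the sketch below parses wiggle files and computes one common definition of translation efficiency: library-normalized ribosome profiling signal over a coding region divided by library-normalized RNA-seq signal over the same region. This is not the code from `calculate_trans_eff.ipynb`; the gene coordinate format (`name -> (chrom, start, end)`, 1-based inclusive), the `min_rna` cutoff, and the assumption of single-strand, span=1 wiggle tracks are all hypothetical.

```python
import json
from collections import defaultdict


def read_wig(lines):
    """Parse wiggle lines into {chrom: {position: value}} with 1-based positions.
    Assumes span=1 (span= settings are ignored); track/browser/comment lines are skipped."""
    data = defaultdict(dict)
    mode = chrom = None
    pos = step = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            fields = line.split()
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            chrom = opts["chrom"]
            if mode == "fixedStep":
                pos = int(opts["start"])
                step = int(opts.get("step", "1"))
            continue
        if mode == "variableStep":
            p, v = line.split()
            data[chrom][int(p)] = float(v)
        elif mode == "fixedStep":
            data[chrom][pos] = float(line)
            pos += step
    return data


def region_sum(wig, chrom, start, end):
    """Sum the signal over positions start..end (inclusive)."""
    track = wig.get(chrom, {})
    return sum(track.get(p, 0.0) for p in range(start, end + 1))


def translation_efficiency(ribo, rna, genes, min_rna=1.0):
    """TE = (ribo CDS signal / ribo library total) / (RNA CDS signal / RNA library total).
    `genes` maps name -> (chrom, start, end); this coordinate format is hypothetical."""
    ribo_total = sum(sum(t.values()) for t in ribo.values())
    rna_total = sum(sum(t.values()) for t in rna.values())
    if ribo_total == 0 or rna_total == 0:
        raise ValueError("empty ribosome profiling or RNA-seq track")
    te = {}
    for name, (chrom, start, end) in genes.items():
        r = region_sum(ribo, chrom, start, end)
        m = region_sum(rna, chrom, start, end)
        if m < min_rna:
            continue  # too little mRNA signal for a stable ratio
        te[name] = (r / ribo_total) / (m / rna_total)
    return te


def save_te(te, path):
    """Write {gene: TE} to a .json file; choosing the path (e.g. under ../Data/) is up to the user."""
    with open(path, "w") as fh:
        json.dump(te, fh, indent=2, sort_keys=True)
```

Because the coding region length appears in both the numerator and the denominator, summed signal gives the same ratio as per-nucleotide density.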
Next, the paper's primary findings come from using this translation efficiency data to investigate the effect of anti-Shine-Dalgarno (aSD) sequence binding. A variety of files included in Data/ quantify aspects of cis mRNA folding and trans aSD binding strength. Users can now run through the `Shine_Dalgarno_analysis.ipynb` notebook; at the time of this writing most of the code should run without problems, particularly our scanning of different aSD sequences and spacings.
All of the code contained in this repo should run with a basic Anaconda install of Python 3.x, or a similar install that includes the basic scientific libraries. Users can check the top of the .py libraries or .ipynb Jupyter notebooks for individual requirements, but basic scipy, numpy, pandas, etc. should do the trick. The one exception is that Biopython must also be installed, which I leave as an exercise to the user. Once I know that everything runs directly from this repository, I'll try to include a dump of my exact Python distribution.
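A quick way to confirm an environment before opening the notebooks is to check that each package can be found. The sketch below is not part of the repository, and its package list is an assumption drawn from the libraries named above; note that Biopython is imported as `Bio`. If it is missing, `pip install biopython` or `conda install -c conda-forge biopython` are the usual routes.

```python
import importlib.util

# Import names for the libraries mentioned in this README (assumed list;
# the top of each notebook/library remains the authoritative source).
REQUIRED = ["numpy", "scipy", "pandas", "Bio"]  # "Bio" is Biopython's import name


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages found.")
```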
Specific questions, comments, or concerns should be addressed to Adam Hockenberry via adam [dot] hockenberry [at] northwestern [dot] edu. I'll try to get back to you in a timely manner.