Home

Welcome to our Design Document!

This project was completed by Alec Loftus, Mohammed Alsawi, Erin Maley, and Rohan Sethi as a part of Dr. Wheeler's COMP383/483 class.

Overview:

Context:

The shape of a protein is vital to its function. The pH of the surrounding environment also affects a protein's ability to fold: the polarity of each amino acid contributes to folding and is itself affected by pH. Even so, some 30% of the proteins that are coded for never fold at all. A protein's propensity to fold, both intrinsically and at different pH values, can generally be predicted from its sequence alone. This project tests different machine learning models and assesses how well they predict protein folding under these conditions.

Goals and Non-Goals:

Primary Goals:

Create scripts for training and testing a couple of ML models on synthesized protein datasets
Have documented accuracy reports for various models
Consider other relevant information that may be nice to have in test data (more of a scientific question/goal than programming)

Non-Goals:

Testing deep learning models on the datasets (would be nice to include for comparison but not essential for success)
Determining anything more specific about protein folding than whether a protein is 'folded' according to the set threshold. We won't label degree or category of folding; we will only report each ML model's accuracy on that binary label.

Proposed Solution:

This project is less about developing a full pipeline that runs a number of scripts/tools and produces usable data. Instead, it serves as a means of testing machine learning models and gauging their aptitude for predicting protein folding propensity at different pH values. Data will be initialized and prepared for each model as needed and as specified for testing. Each model will be trained on subsets of the data that are distinct from the data used to test it.
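As a rough illustration of keeping training and testing data separate, the sketch below shuffles a list of records and splits it into two disjoint parts. It uses only the Python standard library so the logic is visible; scikit-learn's `train_test_split` offers the same kind of hold-out split. The function name `split_train_test`, the record layout (sequence, pH, folded label), and the placeholder values are assumptions made for this example, not the project's actual data format.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle records and return (train, test) lists that share no items."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records: (sequence id, pH, folded label). Not real data.
records = [("SEQ%d" % i, 7.0, i % 2) for i in range(10)]

train, test = split_train_test(records, test_fraction=0.2, seed=42)
print(len(train), "training records;", len(test), "test records")
```

Fixing the seed means the same split can be reproduced when comparing models, so differences in score reflect the model rather than the partition.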

Scikit-learn will be the primary ML tool, used both to run the models on the data and to assess prediction accuracy. F1 scores will be computed with the tool's metric functions. Repeatedly testing the models on different parts of the data will help us understand the weight of each feature in predicting protein folding. Keeping a record of every prediction made by each model, along with its F1 score for each data subset and any feature alterations, will make the results verifiable and help determine which model(s) work best at predicting protein folding under the conditions outlined by the lab providing the project.
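To make the scoring concrete, here is a minimal sketch of how an F1 score follows from precision and recall on a binary folded/not-folded label, and how each score might be logged alongside the feature subset that produced it. It is written in plain Python for clarity; for binary labels, scikit-learn's `sklearn.metrics.f1_score` computes the same quantity. The function name, the label vectors, the model name, and the layout of the `results` log are all hypothetical.

```python
def f1_from_predictions(y_true, y_pred, positive=1):
    """Compute the F1 score for the positive class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only: 1 = folded, 0 = not folded.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = f1_from_predictions(y_true, y_pred)

# One log row per (model, feature subset) run keeps results verifiable.
results = []
results.append({
    "model": "example_model",        # hypothetical model name
    "features": ("sequence", "pH"),  # hypothetical feature subset
    "predictions": list(y_pred),
    "f1": score,
})
print(results[0]["model"], results[0]["features"], round(score, 3))
```

Storing the raw predictions next to each score lets a run be rechecked later without retraining the model.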

Milestones:

Week	Alec Loftus	Mohammed Alsawi	Erin Maley	Rohan Sethi	Deadlines/Events
Week 1 (3/13-3/17)	Read literature; Work on Design Doc	Begin slide work	Prepare questions for PKH	Read through code files	First group meeting!
Week 2 (3/20-3/24)	Weekly Milestones table/slides; Make Design Document	Implementation Plan slides; Assist with Proposed Solution	Introduction Slides; Prepare questions for PKH	Implementation Plan slides; Assist with Goals & Non-Goals	Meeting w/ PKH Tuesday; Initial Presentation
Week 3 (3/27-3/31)	Keep Wiki/Milestones updated; Work on script to initiate model; Make sure everything's pushed by EOD Friday for Repo Check	Work on script to assess accuracy	Comment through .ipynb code; Prepare presentation; Work on script to train model	Work on script to train model	5-min Presentation; Repo Check #1
Week 4 (4/3-4/7)	Test Mohammed's accuracy code	Test Rohan's training code	Test Alec's initializing code	Continue work on training code; Update README to include information on the ML models used	EASTER BREAK
Week 5 (4/10-4/14)	Prepare presentation; All pushes by EOD Friday for Repo Check #2	Check all code thus far has been commented and pushed	Update wiki/weekly milestones	Train using deep learning models	5-min Presentation; Repo Check #2
Week 6 (4/17-4/21)	Update README and comment code; Pulls for Repo Check #3	Test all code for training	Continue work on deep learning models code; Begin work on Final Presentation	Comment code; Check in with PKH	Cross-team hacking; Repo Check #3
Week 7 (4/24-4/28)	Test Erin's deep learning code final time; Work on Presentation	Work on Final Presentation	Check all code; Overview presentation work	Work on Final Presentation	Final Presentation
Week 8 (5/1-5/5)	Ensure all pulls go through; Update the Wiki final time	Work on App Note; Update README	Work on App Note	Make sure all code has been commented	Final App Note; Final Project Code turned in
