Alec Loftus edited this page Mar 24, 2023 · 74 revisions

Welcome to our Design Document!

This project was completed by Alec Loftus, Mohammed Alsawi, Erin Maley, and Rohan Sethi as a part of Dr. Wheeler's COMP383/483 class.

Overview:

Context:

The shape of a protein is vital to its function. The pH of the surrounding environment also affects a protein's ability to fold: the polarity of each amino acid contributes to folding and is itself affected by pH. On top of this, some 30% of the proteins that are coded for never fold at all. A protein's propensity to fold, both natively and at different pH values, can generally be predicted from its sequence alone. This project tests different machine learning models and assesses their ability to predict protein folding under these conditions.

Goals and Non-Goals:

  • Primary Goals:
    • Create scripts for training and testing a couple of ML models on synthesized protein datasets
    • Have documented accuracy reports for various models
    • Consider other relevant information that may be nice to have in test data (more of a scientific question/goal than programming)
  • Non-Goals:
    • Testing deep learning models on the datasets (would be nice to include for comparison but not essential for success)
    • We will not determine anything more specific about protein folding than whether a protein is 'folded' according to the set threshold (i.e. we won't label the degree or category of folding), along with each ML model's accuracy on that binary label.
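
The binary 'folded' label described in the last non-goal could be derived roughly as in the minimal sketch below. The score values, the threshold of 0.5, and the direction of the comparison are all placeholders assumed for illustration; the real cutoff is whatever threshold is set for the project's data.

```python
# Hypothetical cutoff: the project's actual threshold is not stated in this document.
FOLD_THRESHOLD = 0.5


def label_folded(score: float, threshold: float = FOLD_THRESHOLD) -> int:
    """Return 1 ('folded') if the score meets the threshold, else 0 ('not folded')."""
    return int(score >= threshold)


# Made-up scores, for illustration only.
example_scores = [0.12, 0.50, 0.87]
labels = [label_folded(s) for s in example_scores]
print(labels)
```

Keeping the threshold in one named constant makes it easy to rerun every model if the cutoff changes.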

Proposed Solution:

This project is less about developing a full pipeline that runs a number of scripts/tools and produces usable data. Instead, it serves as a means of testing machine learning models and gauging their aptitude for predicting protein folding propensity at different pH values. Data will be initialized and prepared for each model as needed and as specified for testing. Training subsets will be created that differ from, and/or are subsets of, the data we use to test each model.
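
One common way to keep training data separate from testing data is a held-out split, sketched below with scikit-learn's `train_test_split`. This is only an illustration: the feature matrix and labels are random stand-ins rather than the project's protein data, and the 75/25 ratio and stratification are assumptions, not choices stated in this document.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in data (NOT the project's dataset): 200 samples, 5 features.
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 5))
y = rng.integers(0, 2, size=n_samples)  # stand-in binary 'folded' labels

# Hold out 25% for testing; stratify so both splits keep a similar class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible, so every model can be compared on the same held-out samples.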

Scikit-learn will be the primary ML tool, used both to run the data and to assess the accuracy of the predictions. F1 scores will be generated with the tool's scoring functions. Repeatedly testing the models on different parts of the data will help us understand the weight of each feature in predicting protein folding. Keeping a record of every prediction made by each tool, along with its F1 score for each subset of data and each feature alteration, will make the results verifiable and help determine which tool(s) work best at predicting protein folding under the circumstances outlined for us by the lab providing the project.
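
The evaluation loop described above might look something like the following sketch. Everything specific in it is an assumption made for illustration: the data is synthetic, the feature names are hypothetical, and logistic regression and a random forest are stand-ins, not necessarily the models this project uses. Only `f1_score`, `train_test_split`, `clone`, and the two estimators come from scikit-learn.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the label depends mostly on the first column.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(int)

# Hypothetical feature subsets (columns 0, 1, 2 stand for feat_a, feat_b, feat_c).
feature_subsets = {"all": [0, 1, 2], "without_feat_a": [1, 2]}

# Stand-in models; swap in whichever models are under test.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One record per (model, feature subset), keeping predictions for later verification.
records = []
for model_name, model in models.items():
    for subset_name, cols in feature_subsets.items():
        fitted = clone(model).fit(X_train[:, cols], y_train)
        y_pred = fitted.predict(X_test[:, cols])
        records.append({
            "model": model_name,
            "features": subset_name,
            "f1": f1_score(y_test, y_pred),
            "predictions": y_pred.tolist(),
        })

for r in records:
    print(f"{r['model']:>20} {r['features']:>15} F1={r['f1']:.3f}")
```

Comparing the F1 score of a model with and without a given feature gives a rough sense of how much that feature contributes, and the stored predictions let anyone recompute the scores.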

Milestones:

Week | Alec Loftus | Mohammed Alsawi | Erin Maley | Rohan Sethi | Deadlines/Events
Week 1
3/13-3/17
Read literature
Work on Design Doc
Begin slide work Prepare questions for PKH Read through code files First group meeting!
Week 2
3/20-3/24
Weekly Milestones table/slides
Make Design Document
Implementation Plan slides
Assist with Proposed Solution
Introduction Slides
Prepare questions for PKH
Implementation Plan slides
Assist with Goals & Non-Goals
Meeting w/ PKH Tuesday
Initial Presentation
Week 3
3/27-3/31
Keep Wiki/Milestones updated
Work on script to initiate model
Make sure everything's pushed by EOD Friday for Repo check
Work on script to assess accuracy Comment through .ipynb code
Prepare presentation
Work on script to train model
Work on script to train model 5-min Presentation
Repo Check #1
Week 4
4/3-4/7
Test Mohammed's accuracy code Test Rohan's training code Test Alec's initializing code Continue work on training code
Update README to include information on the ML models used
EASTER BREAK
Week 5
4/10-4/14
Prepare presentation
All pushes by EOD Friday for Repo Check #2
Check all code thus far has been commented and pushed Update wiki/weekly milestones Train using deep learning models 5-min Presentation
Repo Check #2
Week 6
4/17-4/21
Update README and comment code
Pulls for Repo Check #3
Test all code for training Continue work on deep learning models code
Begin work on Final Presentation
Comment code
Check in with PKH
Cross-team hacking
Repo Check #3
Week 7
4/24-4/28
Test Erin's deep learning code final time
Work on Presentation
Work on Final Presentation Check all code
Overview presentation work
Work on Final Presentation Final Presentation
Week 8
5/1-5/5
Ensure all pulls go through
Update the Wiki final time
Work on App Note
Update README
Work on App Note Make sure all code has been commented Final App Note
Final Project Code turned in