Skip to content

This project utilizes biological sequences to determine if the sequence is associated with a disease.

Notifications You must be signed in to change notification settings

Kshithijm1/DiseasePredictor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

LSTM with Attention for Sequence Analysis

This project leverages a Long Short-Term Memory (LSTM) network enhanced with an attention mechanism to analyze biological sequences for disease prediction. It is implemented using PyTorch, and includes data visualization through a Dash web application.

Prerequisites

  • Python 3.6+
  • PyTorch
  • NumPy
  • BioPython
  • scikit-learn
  • Dash and Plotly for visualization
  • Dash Bootstrap Components

You can install the necessary libraries using pip:

pip install torch numpy biopython scikit-learn dash dash-bootstrap-components plotly

Components

  • AttentionLayer: A custom neural network layer that applies a softmax function to LSTM outputs to focus on important features.
  • LSTMWithAttention: The main model that uses an LSTM layer and the custom AttentionLayer to predict disease risk from sequence data.

Data Handling

  • load_sequences_and_labels: Loads sequences from a FASTA file and extracts labels and metadata.
  • encode_sequences: Converts nucleotide sequences into numerical format for model input.

Model Training and Validation

  • train_model: Handles the training process, including validation loss calculation and optimizer adjustments.

Visualization with Dash

  • setup_dash_app: Sets up a Dash application to visualize sequence data and display predictions using an interactive slider and Plotly graphs.

Usage

  1. Load and encode sequence data from a FASTA file.
  2. Initialize the LSTM with Attention model and configure its parameters.
  3. Train the model using the encoded data.
  4. Launch the Dash application to visualize and interact with the sequence data and predictions.

Example Usage

To run the application, ensure you have the example.fasta file in your working directory, then execute:

if __name__ == "__main__":
    main()

This will start a server for the Dash application, allowing you to interact with the sequence data and view predictions in real-time.

Debugging and Error Handling

Logging and error handling are essential for monitoring the application's health and performance. Ensure you have proper logging configurations to capture and diagnose issues.


About

This project utilizes biological sequences to determine if the sequence is associated with a disease.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages