Parkinson's disease (PD) is a neurodegenerative disorder that affects movement control. This project leverages machine learning techniques to predict the likelihood of an individual having Parkinson's disease based on their medical features. The model is trained on a dataset containing medical records, and a Streamlit app provides a user-friendly interface to interact with the prediction model. 🧬✨
🔗 Parkinson's Disease Prediction
This application uses a machine learning model to predict whether an individual has Parkinson's disease based on various vocal feature inputs. 🎤🔊 The app allows users to interactively input medical data and receive a prediction.
This repository contains:
- A trained machine learning model for predicting Parkinson's disease.
- A Streamlit app that provides an interactive interface for predictions.
The project is organized into different components, including model training, data processing, and the Streamlit web app. Here’s the directory structure:
├── model_files/ # Folder containing model files and other relevant files
│ ├── parkinson_model.pkl # Saved trained model in pickle format
│ ├── parkinson_model.sav # Another format of the trained model
│ ├── pca.pkl # Principal Component Analysis (PCA) model
│ └── scaler.pkl # StandardScaler model used during training
│
├── data/ # Folder for dataset and related files
│ └── parkinsons.data # Dataset used for model training
│
├── requirements.txt # Python dependencies required to run the project
├── oldrequirements1.txt # Old version of requirements file (if needed)
│
├── app8.py # Streamlit app for interactive prediction
├── dv_cp_.py # Helper functions and data preprocessing code
│
│ etc...
The following Machine Learning models were trained and evaluated:
1️⃣ Logistic Regression
2️⃣ Random Forest Classifier
3️⃣ Decision Tree Classifier
4️⃣ Support Vector Machine Classifier
5️⃣ Naive Bayes Classifier
6️⃣ K Nearest Neighbor Classifier
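A comparison like this is often run with cross-validation. The sketch below is an illustration under stated assumptions, not the project's actual training script: it uses synthetic data from `make_classification` as a stand-in for the real dataset, illustrative hyperparameters, and 5-fold cross-validated accuracy as the metric. The helper name `compare_models` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption): 22 numeric features.
X, y = make_classification(
    n_samples=200, n_features=22, n_informative=8, random_state=0
)

# The six model families listed above; hyperparameters are assumptions.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Naive Bayes": GaussianNB(),
    "K Nearest Neighbor": KNeighborsClassifier(),
}


def compare_models(X, y, models, cv=5):
    """Return {name: mean cross-validated accuracy}.

    Each model is wrapped in a pipeline so the scaler is fitted on the
    training part of every fold only.
    """
    scores = {}
    for name, model in models.items():
        pipeline = make_pipeline(StandardScaler(), model)
        fold_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
        scores[name] = fold_scores.mean()
    return scores


results = compare_models(X, y, candidates)
for name, accuracy in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

The printed numbers describe the synthetic data only and say nothing about how these models perform on the Parkinson's dataset.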
Property | Value |
---|---|
Data Set Characteristics | Multivariate |
Number of Instances | 197 |
Area | Life |
Attribute Characteristics | Real |
Number of Attributes | 23 |
Date Donated | 2008-06-26 |
Associated Task | Classification |
Missing Values? | N/A |
Attribute | Meaning |
---|---|
name | ASCII subject name and recording number |
MDVP:Fo(Hz) | Average vocal fundamental frequency |
MDVP:Fhi(Hz) | Maximum vocal fundamental frequency |
MDVP:Flo(Hz) | Minimum vocal fundamental frequency |
MDVP:Jitter(%) | Measure of variation in fundamental frequency |
MDVP:Jitter(Abs) | Measure of variation in fundamental frequency |
MDVP:RAP | Measure of variation in fundamental frequency |
MDVP:PPQ | Measure of variation in fundamental frequency |
Jitter:DDP | Measure of variation in fundamental frequency |
MDVP:Shimmer | Measure of variation in amplitude |
MDVP:Shimmer(dB) | Measure of variation in amplitude |
Shimmer:APQ3 | Measure of variation in amplitude |
Shimmer:APQ5 | Measure of variation in amplitude |
MDVP:APQ | Measure of variation in amplitude |
Shimmer:DDA | Measure of variation in amplitude |
NHR | Measure of ratio of noise to tonal components in the voice |
HNR | Measure of ratio of noise to tonal components in the voice |
status (target variable) | Health status of the subject: 1 = Parkinson's, 0 = healthy |
RPDE | Non-linear dynamical complexity measure |
D2 | Non-linear dynamical complexity measure |
DFA | Signal fractal scaling exponent |
spread1 | Non-linear measure of fundamental frequency variation |
spread2 | Non-linear measure of fundamental frequency variation |
PPE | Non-linear measure of fundamental frequency variation |
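Given the attributes above, a first step is usually to separate the identifier column `name` and the target `status` from the numeric features. The sketch below shows one way to do that with pandas. The helper name `split_features_target` is hypothetical, and the small frame uses made-up placeholder values rather than real records.

```python
import pandas as pd


def split_features_target(df: pd.DataFrame):
    """Split the table into numeric features X and the binary target y."""
    y = df["status"].astype(int)              # 1 = Parkinson's, 0 = healthy
    X = df.drop(columns=["name", "status"])   # "name" is an identifier, not a feature
    return X, y


# Placeholder rows (made-up values) with only two of the feature columns.
sample = pd.DataFrame(
    {
        "name": ["subject_a", "subject_b", "subject_c"],
        "MDVP:Fo(Hz)": [120.0, 150.0, 200.0],
        "HNR": [21.0, 19.5, 25.0],
        "status": [1, 1, 0],
    }
)

X, y = split_features_target(sample)
print(X.columns.tolist())
print(y.value_counts().to_dict())
```

With the real file, the same function could be applied to `pd.read_csv("data/parkinsons.data")`, assuming the file is comma-separated with a header row that matches the attribute names in the table.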
git clone https://github.com/your-username/parkinson-disease-prediction.git
cd parkinson-disease-prediction
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt
The model has been trained using a dataset of medical features of individuals, which can be found in the file `parkinsons.data`. The training process involves the following steps:
- Preprocessing: Data cleaning, handling missing values, and scaling features.
- Model Selection: An appropriate machine learning model is trained on the data.
- PCA Transformation: Principal Component Analysis (PCA) is applied to reduce dimensionality.
- Model Saving: The final trained model and other relevant artifacts, such as the scaler and PCA, are saved as pickle files (`.pkl`, `.sav`).
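The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's training code: the data is synthetic, `RandomForestClassifier` stands in for whichever model was selected, `n_components=10` is an arbitrary choice, and the function name `train_and_save` is hypothetical. PCA is applied before the classifier is fitted, which is the usual order; the original script may differ. Files are written to a temporary directory using the artifact names from `model_files/`.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def train_and_save(X, y, out_dir):
    """Scale, reduce with PCA, fit a classifier, and pickle all three objects."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # Fit the scaler and PCA on the training split only to avoid leakage.
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=10).fit(scaler.transform(X_train))

    model = RandomForestClassifier(random_state=42)
    model.fit(pca.transform(scaler.transform(X_train)), y_train)
    accuracy = model.score(pca.transform(scaler.transform(X_test)), y_test)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    artifacts = [
        ("scaler.pkl", scaler),
        ("pca.pkl", pca),
        ("parkinson_model.pkl", model),
    ]
    for filename, obj in artifacts:
        with open(out_dir / filename, "wb") as fh:
            pickle.dump(obj, fh)
    return accuracy


# Synthetic stand-in data (assumption), not the Parkinson's dataset.
X, y = make_classification(
    n_samples=200, n_features=22, n_informative=8, random_state=0
)
with tempfile.TemporaryDirectory() as tmp:
    acc = train_and_save(X, y, tmp)
    saved = sorted(p.name for p in Path(tmp).iterdir())

print(f"held-out accuracy on synthetic data: {acc:.3f}")
print(saved)
```

Saving the scaler and PCA alongside the model matters because new inputs must pass through exactly the same fitted transformations before prediction.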
You can use these pre-trained models to predict Parkinson's disease on new data by using the Streamlit app.
To interact with the trained model and make predictions, we have built a simple Streamlit web app.
- Navigate to the project directory.
- Run the following command: `streamlit run app8.py`
- A web browser will automatically open the app, or you can access it at http://localhost:8501 in your browser.
- Input medical features related to the patient.
- Predict if the individual is likely to have Parkinson's disease.
- Visualize the prediction result along with feature importance.
- `parkinson_model.pkl`: The trained machine learning model saved using `pickle`.
- `parkinson_model.sav`: An alternate format of the trained model.
- `pca.pkl`: Principal Component Analysis model for dimensionality reduction.
- `scaler.pkl`: Scaler model used to normalize input data before feeding it into the model.
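At prediction time these artifacts would typically be applied in the same order as during training: scaler, then PCA, then the classifier. The sketch below assumes the pickles hold scikit-learn objects exposing `transform` and `predict`. The function names `load_artifacts` and `predict_status` are hypothetical, and the demo fits stand-in artifacts on random numbers in a temporary directory rather than reading the real files.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def load_artifacts(model_dir):
    """Unpickle the scaler, PCA, and classifier from model_dir."""
    model_dir = Path(model_dir)
    loaded = []
    for filename in ("scaler.pkl", "pca.pkl", "parkinson_model.pkl"):
        with open(model_dir / filename, "rb") as fh:
            loaded.append(pickle.load(fh))
    return tuple(loaded)


def predict_status(features, scaler, pca, model):
    """Return 1 (Parkinson's) or 0 (healthy) for one row of feature values."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    return int(model.predict(pca.transform(scaler.transform(row)))[0])


# Stand-in artifacts fitted on random numbers (assumption, for the demo only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 22))
y_demo = (X_demo[:, 0] > 0).astype(int)
scaler = StandardScaler().fit(X_demo)
pca = PCA(n_components=5).fit(scaler.transform(X_demo))
model = LogisticRegression().fit(pca.transform(scaler.transform(X_demo)), y_demo)

with tempfile.TemporaryDirectory() as tmp:
    for filename, obj in [
        ("scaler.pkl", scaler),
        ("pca.pkl", pca),
        ("parkinson_model.pkl", model),
    ]:
        with open(Path(tmp) / filename, "wb") as fh:
            pickle.dump(obj, fh)
    loaded_scaler, loaded_pca, loaded_model = load_artifacts(tmp)

prediction = predict_status(X_demo[0], loaded_scaler, loaded_pca, loaded_model)
print("predicted status:", prediction)
```

Pointing `load_artifacts` at `model_files/` would load the repository's own pickles, provided the installed scikit-learn version is compatible with the one used to create them. Pickle files can execute arbitrary code when loaded, so only unpickle files from a trusted source.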
We welcome contributions to improve the project! If you'd like to contribute, feel free to:
- Fork the repository.
- Create a new branch for your changes.
- Submit a pull request.
- Dataset: Parkinson's Disease Dataset
- Libraries used: `scikit-learn`, `streamlit`, `pandas`, and `matplotlib`.
- Special thanks to Streamlit Cloud
This project was developed by a group of 4 students from VIT Pune, under the CSAI-B branch.
Roll Number | Official Name |
---|---|
33 | Shrey Santosh Rupnavar |
37 | Salitri Atharva Akhil |
60 | Tanishq Sudhir Thuse |
61 | Tripti Prakash Mirani |