This research project analyzes the relationship between environmental factors and diarrheal disease incidence across four major divisions in Bangladesh. Using machine learning models and time series analysis, the study explores how environmental variables influence disease patterns and develops predictive models for public health applications.
- Multi-regional data analysis covering Rajshahi, Khulna, Dhaka, and Chattogram divisions
- Interactive visualization of disease patterns and environmental correlations
- Time series decomposition and seasonal trend analysis
- Predictive modeling using various machine learning algorithms
- Interactive web dashboard for data exploration and model predictions
- Data Analysis: Python, Pandas, NumPy
- Machine Learning: Scikit-learn, Random Forest, Gradient Boosting
- Visualization: Plotly, Seaborn, Matplotlib
- Web Dashboard: Streamlit
- Time Series Analysis: Statsmodels
The study uses data from four divisions of Bangladesh, comprising:
- Daily diarrheal disease cases
- Environmental parameters:
  - Maximum temperature
  - Minimum temperature
  - Humidity
  - Precipitation
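A long-format table with one row per division per day is a convenient way to hold these variables for all four divisions. The sketch below assumes hypothetical column names (`date`, `division`, `cases`, `max_temp`, `min_temp`, `humidity`, `precipitation`), and the real headers in `datasets/data.csv` may differ. It fills the table with random placeholder values rather than study data. It then checks the schema before any analysis runs.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real headers in datasets/data.csv may differ.
COLUMNS = ["date", "division", "cases", "max_temp", "min_temp",
           "humidity", "precipitation"]
DIVISIONS = ["Rajshahi", "Khulna", "Dhaka", "Chattogram"]


def make_example_frame(n_days=60, seed=0):
    """Long-format frame of random placeholder values: one row per division per day."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2020-01-01", periods=n_days, freq="D")
    parts = [
        pd.DataFrame({
            "date": dates,
            "division": division,  # scalar broadcast to every row
            "cases": rng.poisson(20, n_days),
            "max_temp": rng.normal(32, 3, n_days),
            "min_temp": rng.normal(22, 3, n_days),
            "humidity": rng.uniform(50, 95, n_days),
            "precipitation": rng.gamma(1.0, 5.0, n_days),
        })
        for division in DIVISIONS
    ]
    return pd.concat(parts, ignore_index=True)[COLUMNS]


def check_schema(df):
    """Raise ValueError if columns are missing or a (date, division) pair repeats."""
    missing = set(COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if df.duplicated(subset=["date", "division"]).any():
        raise ValueError("duplicate (date, division) rows")


df = make_example_frame()
check_schema(df)
print(df.groupby("division")["cases"].mean().round(1))
```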

Methodology:

- Data Preprocessing
  - Outlier detection and handling
  - Missing value treatment
  - Feature engineering
- Exploratory Data Analysis
  - Regional comparison of disease patterns
  - Seasonal trend analysis
  - Environmental factor correlations
- Predictive Modeling
  - Implementation of multiple ML algorithms
  - Model performance comparison
  - Feature importance analysis
- Interactive Dashboard
  - Real-time data visualization
  - Model prediction interface
  - Time series analysis tools
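The README does not specify which outlier, imputation, or feature-engineering methods the notebook uses. The sketch below shows one plausible approach. It assumes the hypothetical column names `cases`, `max_temp`, `min_temp`, `humidity`, and `precipitation`. Each division's daily series is interpolated over time, and outliers are capped with the interquartile-range (IQR) rule. The series is then extended with lagged and 7-day rolling environmental features plus a month column. The IQR capping, the lags of 1 and 7 days, and the helper names are illustrative assumptions. The example data is random placeholder data.

```python
import numpy as np
import pandas as pd

ENV_COLS = ["max_temp", "min_temp", "humidity", "precipitation"]  # hypothetical names


def cap_outliers_iqr(s, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] instead of dropping rows."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def preprocess_division(g, lags=(1, 7)):
    """Clean and featurize one division's daily series."""
    g = g.sort_values("date").set_index("date")
    cols = ["cases"] + ENV_COLS
    # Missing values: time-weighted linear interpolation, then fill the edges.
    g[cols] = g[cols].interpolate(method="time").ffill().bfill()
    # Outliers: cap each column using thresholds from this division only.
    for c in cols:
        g[c] = cap_outliers_iqr(g[c])
    # Features: lagged and 7-day rolling environmental values, plus calendar month.
    for c in ENV_COLS:
        for lag in lags:
            g[f"{c}_lag{lag}"] = g[c].shift(lag)
        g[f"{c}_roll7"] = g[c].rolling(7).mean()
    g["month"] = g.index.month
    return g.dropna().reset_index()  # drop warm-up rows without full lags


rng = np.random.default_rng(0)
n = 30
dates = pd.date_range("2021-01-01", periods=n, freq="D")
raw = pd.DataFrame({
    "date": np.tile(dates, 2),
    "division": np.repeat(["Dhaka", "Khulna"], n),
    "cases": rng.poisson(20, 2 * n).astype(float),
    "max_temp": rng.normal(32, 2, 2 * n),
    "min_temp": rng.normal(22, 2, 2 * n),
    "humidity": rng.uniform(60, 90, 2 * n),
    "precipitation": rng.gamma(1.0, 5.0, 2 * n),
})
raw.loc[5, "humidity"] = np.nan  # a missing reading
raw.loc[10, "cases"] = 10_000.0  # an implausible spike

processed = pd.concat(
    [preprocess_division(g) for _, g in raw.groupby("division")],
    ignore_index=True,
)
```

Processing each division separately keeps outlier thresholds and lag windows from mixing values across regions.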
Requires Python 3.8+. Install the dependencies with `pip install -r requirements.txt`, then launch the dashboard with `streamlit run app.py`.
Project structure:

```text
diarrhea-env-analysis/
│
├── app.py                 # Streamlit dashboard
├── requirements.txt       # Project dependencies
├── notebooks/
│   └── analysis.ipynb     # Research analysis notebook
├── datasets/              # Data directory
│   └── data.csv           # Data documentation
└── README.md              # Project documentation
```

Key findings:

- Found little or no correlation between the environmental factors and disease incidence
- Developed predictive models with R² scores between 0.2 and 0.5, i.e., explaining roughly 20% to 50% of the variance in case counts
- Found few clear seasonal patterns in disease occurrence
- Quantified the relative importance of the different environmental factors
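The R² range and the feature-importance ranking reported above come from the project's own models and data. The sketch below shows how numbers of this kind are typically produced with scikit-learn. The function fits `RandomForestRegressor` and `GradientBoostingRegressor` on the earlier part of a time-ordered feature table and reports R² on the most recent part. It also returns each model's impurity-based `feature_importances_`. The function name `compare_models`, the 80/20 chronological split, and the hyperparameters are assumptions. The example data is random with an artificial humidity effect, so its scores say nothing about the study.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score


def compare_models(X, y, test_frac=0.2, seed=0):
    """Chronological hold-out comparison of two tree ensembles (rows assumed time-ordered)."""
    split = int(len(X) * (1 - test_frac))  # no shuffling: train on past, test on future
    X_tr, X_te = X.iloc[:split], X.iloc[split:]
    y_tr, y_te = y.iloc[:split], y.iloc[split:]
    models = {
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }
    scores, importances = {}, {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        scores[name] = r2_score(y_te, model.predict(X_te))
        importances[name] = (
            pd.Series(model.feature_importances_, index=X.columns)
            .sort_values(ascending=False)
        )
    return scores, importances


# Random placeholder features; only humidity carries signal here.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame(rng.normal(size=(n, 4)),
                 columns=["max_temp", "min_temp", "humidity", "precipitation"])
y = pd.Series(2.0 * X["humidity"] + rng.normal(size=n), name="cases")

scores, importances = compare_models(X, y)
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.2f}; top feature = {importances[name].index[0]}")
```

Impurity-based importances can favor continuous, high-cardinality features. Running `sklearn.inspection.permutation_importance` on the held-out rows is a common cross-check.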

Future work:

- Integration of additional environmental parameters
- Extension of analysis to other regions
- Implementation of advanced time series models
- Development of early warning systems
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
For any queries regarding this research, please open an issue in this repository.

Acknowledgments:

- Research collaborators and advisors