This project explores the classification of flight delays using classic machine learning models, aiming to predict whether a flight will be delayed based on various factors such as departure time, airline, and airport data.
- Developed as the final project for the course ELCE 455: Machine Learning with Python (Fall 2024), taught by Professor Dr. Amin Zollanvari
- Dataset: US domestic flights (January 2019)
- Best-performing model: Random Forest (92% accuracy, 0.77 F1-score for delayed flights)
- Implemented models: Logistic Regression, Decision Trees, Random Forest, KNN, AdaBoost
- Data Preprocessing: Handling missing values, encoding categorical variables, and feature scaling.
- Feature Engineering: Adding temporal features and spatial-temporal interactions for improved predictive power.
- Model Training & Selection: Evaluated multiple ML models using cross-validation and grid search.
- Evaluation Metrics: Accuracy, Precision, Recall, F1-score, and ROC-AUC.
- Random Forest outperformed other models, effectively handling non-linear feature interactions.
- AdaBoost performed well but had slightly lower recall for delayed flights.
- Feature engineering, including departure delay indicators and time-based binning, significantly improved model performance.
- Future improvements: Incorporating weather data and deep learning models like LSTMs.
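
The preprocessing, feature engineering, model selection, and evaluation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it runs on synthetic data, and the column names (`dep_hour`, `distance`, `carrier`, `origin`, `delayed`), the derived features (`dep_block`, `origin_block`), and the hyperparameter grid are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the flight table; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "dep_hour": rng.integers(0, 24, n),
    "distance": rng.uniform(100, 2500, n),
    "carrier": rng.choice(["AA", "DL", "UA"], n),
    "origin": rng.choice(["JFK", "LAX", "ORD", "ATL"], n),
})
# Toy label: evening departures are delayed more often (assumption for the demo).
df["delayed"] = ((df["dep_hour"] >= 17) & (rng.random(n) < 0.7)).astype(int)
# Inject a few missing values so the imputation step has something to do.
df.loc[df.sample(frac=0.05, random_state=0).index, "distance"] = np.nan

# Feature engineering: time-based binning and a spatial-temporal interaction.
df["dep_block"] = pd.cut(
    df["dep_hour"], bins=[0, 6, 12, 18, 24], right=False,
    labels=["night", "morning", "afternoon", "evening"],
).astype(str)
df["origin_block"] = df["origin"] + "_" + df["dep_block"]

numeric = ["dep_hour", "distance"]
categorical = ["carrier", "origin", "dep_block", "origin_block"]

# Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("prep", preprocess),
                  ("clf", RandomForestClassifier(random_state=0))])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["delayed"],
    test_size=0.25, stratify=df["delayed"], random_state=0,
)

# Model selection: grid search with 3-fold cross-validation, scored on F1.
param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [5, None]}
search = GridSearchCV(model, param_grid, cv=3, scoring="f1")
search.fit(X_train, y_train)

# Evaluation on the held-out split with the metrics listed above.
y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),
}
print(search.best_params_)
print(metrics)
```

Wrapping the preprocessing inside the pipeline means the imputer, scaler, and encoder are refit on each cross-validation fold, which avoids leaking statistics from the validation folds into training. The same pipeline could be reused for the other listed models by swapping the `clf` step.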
- Python 3.9+
- Required libraries:
pip install pandas numpy scikit-learn matplotlib seaborn
- Clone the repository:
git clone https://github.com/tvran/Forte-stt.git
cd Forte-stt
- Run the preprocessing and model training:
python main.py
📁 Forte-stt/
│── Predicting_the_Delay_of_Flights_Turan.ipynb # Project notebook
│── README.md # Project documentation
│── dictionaries.py # Loaded dataset
📌 Institution: Nazarbayev University, School of Engineering and Digital Sciences, Department of Electrical and Computer Engineering
📅 Date: November 24, 2024