Welcome to the Falcon 9 Landing Prediction repository, part of the IBM Data Science Professional Certificate. This project leverages machine learning to predict the reusability of the Falcon 9 rocket's first stage, aiming to reduce the cost of space launches by forecasting landing success.
- Project Overview
- Motivation
- Data Sources
- Technologies Used
- Setup Instructions
- Modeling Process
- Exploratory Data Analysis
- Results
- Dashboard Implementation
- Future Work
- Appendix
- Contributors
- License
This project simulates a scenario where Space Y, a new competitor to SpaceX, uses machine learning techniques to analyze the reusability of the Falcon 9 first stage. By predicting successful landings, Space Y aims to optimize launch costs and increase the sustainability of its space program.
The reusability of rockets is crucial in reducing the costs of space missions. By understanding when the Falcon 9 first stage successfully lands, this project can provide insights that help optimize operational decisions and cost structures for future launches. Machine learning allows us to analyze historical launch data and forecast the landing outcomes.
The dataset combines public data on SpaceX launches, obtained from:
- SpaceX API: For fetching launch data and outcomes.
- Wikipedia: Web scraping with `BeautifulSoup` to collect historical data on Falcon 9 and Falcon Heavy launches.
- Additional Data: Launch dates, payload mass, orbital parameters, launch sites, and landing outcomes covering launches from 2010 to 2020.
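As a rough illustration of the collection step, the sketch below flattens nested launch records into flat rows with a binary landing label. The sample payload, its field names (`flight_number`, `date_utc`, `cores`, `landing_attempt`, `landing_success`), and the output column names are assumptions made for this example. They are not taken from the project code and may not match the live API schema.

```python
import json

# Hypothetical, trimmed-down launch records. Field names and values are
# assumptions for illustration and may not match the real API response.
SAMPLE_RESPONSE = """
[
  {"flight_number": 1, "date_utc": "2020-01-01T12:00:00.000Z",
   "cores": [{"landing_attempt": false, "landing_success": null}]},
  {"flight_number": 2, "date_utc": "2020-02-01T12:00:00.000Z",
   "cores": [{"landing_attempt": true, "landing_success": true}]}
]
"""


def flatten_launch(record):
    """Reduce one nested launch record to a flat row."""
    core = record["cores"][0] if record.get("cores") else {}
    attempted = bool(core.get("landing_attempt"))
    succeeded = bool(core.get("landing_success"))
    return {
        "FlightNumber": record["flight_number"],
        "Date": record["date_utc"][:10],  # keep only the YYYY-MM-DD part
        # Class is 1 only when a landing was attempted and succeeded.
        "Class": int(attempted and succeeded),
    }


rows = [flatten_launch(r) for r in json.loads(SAMPLE_RESPONSE)]
for row in rows:
    print(row)
```

In a real run, the JSON string would come from an HTTP request to the API rather than from an inline sample.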
This project utilizes the following technologies and libraries:
- Python: Main programming language.
- Pandas: Data analysis and manipulation.
- NumPy: Numerical computations.
- Scikit-learn: Machine learning library for model development.
- Matplotlib & Seaborn: For visualizing data trends and patterns.
- Plotly Dash: Used to build interactive dashboards.
- Folium: To create interactive maps for visualizing launch sites.
- Jupyter Notebook: For interactive analysis and development.
- SQLite & SQLAlchemy: For SQL queries and data management.
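- Clone the repository: `git clone https://github.com/yourusername/falcon9-landing-prediction-ibm-capstone.git`
- Enter the project directory: `cd falcon9-landing-prediction-ibm-capstone`
- Install the dependencies: `pip install -r requirements.txt`
- Open the notebooks: `jupyter notebook`
- Run the dashboard: `python spacex_dash_app.py`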
- Data Cleaning: Missing values were imputed or removed to maintain dataset integrity.
- Feature Engineering: Used `get_dummies()` for one-hot encoding of categorical variables like `Orbit`, `LaunchSite`, and `LandingPad`.
- Scaling: All numeric columns were standardized using `StandardScaler()` to optimize model performance.
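The encoding and scaling steps above can be sketched as follows. This is a minimal example on made-up rows, not the project's notebook code. The `PayloadMass` column name and all values are assumptions, while `Orbit`, `LaunchSite`, and `LandingPad` follow the variables named above.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical rows for illustration only.
df = pd.DataFrame({
    "PayloadMass": [500.0, 3000.0, 5500.0, 8000.0],
    "Orbit": ["LEO", "GTO", "ISS", "LEO"],
    "LaunchSite": ["Site A", "Site B", "Site A", "Site C"],
    "LandingPad": ["Pad 1", "Pad 2", "Pad 1", "Pad 1"],
})

# One-hot encode the categorical columns; PayloadMass passes through unchanged.
features = pd.get_dummies(
    df, columns=["Orbit", "LaunchSite", "LandingPad"], dtype=float
)

# Standardize every column to zero mean and unit variance.
X = StandardScaler().fit_transform(features)

print(features.columns.tolist())
print(X.shape)
```

When a train/test split is involved, fitting the scaler on the training portion only and reusing it on the test portion avoids leaking test statistics into training.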
Several machine learning models were trained, tuned, and evaluated:
- Logistic Regression: Basic classifier used as a baseline.
- Support Vector Machine (SVM): Tested with various kernels for boundary classification.
- Decision Trees: Used for rule-based classification of landing outcomes.
- K-Nearest Neighbors (KNN): Classified each launch according to the outcomes of its nearest neighbors in feature space.
- GridSearchCV: Cross-validated grid search used to tune the hyperparameters of each model.
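A tuning loop over the four classifiers might look like the sketch below. It runs on synthetic data from `make_classification`, and the parameter grids, split size, and fold count are assumptions chosen for the example. The grids actually used in the project are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch features and landing labels.
X, y = make_classification(n_samples=90, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical search spaces, one per model family.
candidates = {
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0]}),
    "SVM": (SVC(), {"kernel": ["linear", "rbf"], "C": [0.1, 1.0, 10.0]}),
    "Decision Tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [2, 4, 6]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
}

test_scores = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated search; the best estimator is refit on all training data.
    search = GridSearchCV(estimator, grid, cv=5)
    search.fit(X_train, y_train)
    # score() reports accuracy of the refit best estimator on the held-out split.
    test_scores[name] = search.score(X_test, y_test)
    print(f"{name}: best params {search.best_params_}, test accuracy {test_scores[name]:.3f}")
```

Keeping the test split out of the grid search means the reported accuracy is not inflated by the tuning itself.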
Key visualizations produced during the EDA phase included:
- Flight Number vs. Launch Site: Analyzed how launch success varied across different sites.
- Payload vs. Orbit: Identified the relationship between payload mass and orbital success.
- Launch Success Yearly Trend: Visualized trends in landing success over time.
- Success Rate by Orbit Type: Compared landing success rates across different orbits.
- Total Payload by NASA: Showcased the total payload mass for NASA-sponsored launches.
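The success-rate views above reduce to grouped means of a binary outcome column. The sketch below shows that aggregation on made-up rows. The column names `Date`, `Orbit`, and `Class` (1 for a successful landing, 0 otherwise) are assumptions for this example.

```python
import pandas as pd

# Hypothetical launches for illustration only.
launches = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2015-01-10", "2015-06-02", "2016-03-15", "2016-09-30", "2016-11-05"]
    ),
    "Orbit": ["LEO", "GTO", "LEO", "GTO", "LEO"],
    "Class": [0, 0, 1, 0, 1],
})

# The mean of a 0/1 column is the fraction of successes in each group.
success_by_orbit = launches.groupby("Orbit")["Class"].mean()
success_by_year = launches.groupby(launches["Date"].dt.year)["Class"].mean()

print(success_by_orbit)
print(success_by_year)
```

Either series can then be handed to a bar or line plot to reproduce the corresponding chart.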
The following accuracies were achieved on the test data:
- Logistic Regression: 83.3%
- SVM: 83.3%
- Decision Trees: 88.9% (Best Performing)
- K-Nearest Neighbors (KNN): 83.3%
The Decision Tree model achieved the highest test accuracy at 88.9%. Its confusion matrix showed a relatively high true positive rate, with a small share of false positives.
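For reference, accuracy, true positive rate, and false positive rate all derive from the four confusion-matrix counts. The helper below is a small sketch of that arithmetic. The counts passed to it are hypothetical and are not the project's confusion matrix.

```python
def classification_rates(tp, fp, tn, fn):
    """Compute summary rates from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of actual landings that were predicted as landings.
        "true_positive_rate": tp / (tp + fn),
        # Share of actual failures that were predicted as landings.
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
rates = classification_rates(tp=40, fp=5, tn=45, fn=10)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

A high true positive rate alongside a nonzero false positive rate means the model rarely misses a real landing but sometimes predicts a landing that fails.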
- Launch Site Selection: Users can filter by launch site to view success/failure ratios.
- Payload Range Slider: Interactive slider to filter launches by payload mass.
- Pie Charts: Visual representation of success/failure ratios per launch site.
- Scatter Plots: Payload vs. Launch Outcome scatter plots color-coded by booster version.
- Success Pie Charts: Showed the success rate for all launch sites.
- Payload vs. Outcome Scatter Plot: Helped identify trends in payload mass and landing success.
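The data logic behind the site selector and payload slider can be sketched independently of the dashboard framework. The example below is an assumption about how such filtering could work, not the code in `spacex_dash_app.py`. The column names, the `"ALL"` sentinel, and the sample rows are all hypothetical.

```python
import pandas as pd

# Hypothetical launches for illustration only.
launches = pd.DataFrame({
    "LaunchSite": ["Site A", "Site A", "Site B", "Site B", "Site B"],
    "PayloadMass": [500.0, 4000.0, 2500.0, 6000.0, 9000.0],
    "BoosterVersion": ["v1", "v2", "v1", "v2", "v2"],
    "Class": [0, 1, 1, 1, 0],
})


def filter_launches(df, site, payload_range):
    """Apply the dropdown (site) and slider (payload range) selections."""
    low, high = payload_range
    mask = df["PayloadMass"].between(low, high)  # inclusive on both ends
    if site != "ALL":
        mask &= df["LaunchSite"] == site
    return df[mask]


def outcome_counts(df):
    """Count successes and failures, the values a pie chart would display."""
    counts = df["Class"].value_counts()
    return {"success": int(counts.get(1, 0)), "failure": int(counts.get(0, 0))}


selected = filter_launches(launches, "Site B", (2000, 7000))
print(outcome_counts(selected))
print(outcome_counts(filter_launches(launches, "ALL", (0, 10000))))
```

In a dashboard callback, the filtered frame would feed both the pie chart and the payload scatter plot, so the two views always reflect the same selection.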
- Incorporate Weather Data: Introduce weather data to refine the landing prediction model.
- Advanced Modeling: Test advanced models like Random Forest and XGBoost to improve classification accuracy.
- Deployment: Deploy the prediction model using Flask or FastAPI for real-time predictions.
Relevant assets for this project include:
- SQL Queries: Used to filter and aggregate data (e.g., payload mass, landing outcomes).
- Code Snippets: Python functions for feature engineering and model evaluation.
- Charts: Visualizations produced during EDA (e.g., bar charts, scatter plots).
- Notebook Outputs: Key results from Jupyter notebooks used in the project.
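As an example of the kind of SQL aggregation mentioned above, the sketch below runs two queries against an in-memory SQLite database. The table name, column names, and rows are hypothetical and do not reproduce the project's schema or data.

```python
import sqlite3

# In-memory database with a hypothetical launches table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (customer TEXT, payload_mass_kg REAL, landing_outcome TEXT)"
)
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?)",
    [
        ("NASA", 2000.0, "Success"),
        ("NASA", 3000.0, "Failure"),
        ("Other", 5000.0, "Success"),
    ],
)

# Total payload mass for one customer, using a bound parameter.
total_nasa_payload = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM launches WHERE customer = ?", ("NASA",)
).fetchone()[0]

# Number of launches per landing outcome.
outcome_totals = dict(
    conn.execute(
        "SELECT landing_outcome, COUNT(*) FROM launches GROUP BY landing_outcome"
    ).fetchall()
)
conn.close()

print(total_nasa_payload)
print(outcome_totals)
```

Bound parameters (`?`) keep filter values out of the SQL string, which avoids quoting mistakes when the value comes from user input.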
- Manuel Luján Vilchez - Data Scientist and Project Lead
- IBM Data Science Capstone Project Team - Instructors and resources for guidance.
This project is licensed under the MIT License. See the LICENSE file for more information.