Forecasting Gold Prices with Gradient Boosting Machines (XGBoost), Decision Trees, Random Forests, and ARIMA
Data Preparation:
- Data Collection: Gather historical gold prices and relevant external factors.
- Preprocessing: Handle missing values, convert dates, and engineer features (e.g., lags, rolling statistics).
- EDA: Visualize trends, seasonality, and correlations. Test for stationarity.
- Data Split: Divide into train-test sets, ensuring temporal order.
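The preparation steps above can be sketched as follows. This is a minimal example on a synthetic random-walk series standing in for gold prices (the data source and all column names are assumptions); it builds lag and rolling features and performs a strictly temporal train-test split.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk series as a stand-in for historical gold prices;
# in practice this would be loaded from a CSV or market-data API.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=500, freq="D")
df = pd.DataFrame({"price": 1200 + np.cumsum(rng.normal(0, 5, 500))}, index=dates)

# Feature engineering: lagged prices and rolling statistics, all shifted
# so that each row only uses information available before that day.
for lag in (1, 7, 30):
    df[f"lag_{lag}"] = df["price"].shift(lag)
df["roll_mean_7"] = df["price"].rolling(7).mean().shift(1)
df["roll_std_7"] = df["price"].rolling(7).std().shift(1)
df = df.dropna()

# Temporal split: no shuffling, the test period strictly follows training.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]
X_train, y_train = train.drop(columns="price"), train["price"]
X_test, y_test = test.drop(columns="price"), test["price"]
print(f"train: {len(train)} rows, test: {len(test)} rows")
```

Shifting every feature by at least one step is the key detail: it prevents the model from seeing the same-day price it is asked to predict.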
Decision Trees:
- Train a Decision Tree Regressor.
- Tune parameters (e.g., depth).
- Evaluate using MAE, RMSE, feature importance.
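A minimal sketch of these steps with scikit-learn, using synthetic features in place of the engineered lag/rolling columns (the data here is illustrative, not real gold prices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic features standing in for the engineered lag/rolling columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)
X_train, X_test, y_train, y_test = X[:160], X[160:], y[:160], y[160:]

# max_depth is the main complexity knob; tune it on a temporal holdout
# rather than a shuffled validation set.
tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
pred = tree.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = mean_squared_error(y_test, pred) ** 0.5
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")
# Feature importances sum to 1 and show which inputs drive the splits.
print(tree.feature_importances_.round(2))
```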
Random Forests:
- Train a Random Forest Regressor.
- Optimize parameters (e.g., number of trees).
- Assess performance and feature importance.
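The same workflow with a Random Forest, again on illustrative synthetic data; `n_estimators` is the parameter being optimized, trading accuracy against fit time:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)
X_train, X_test, y_train, y_test = X[:160], X[160:], y[:160], y[160:]

# Averaging many bootstrapped trees reduces the variance of a single
# deep decision tree at the cost of some interpretability.
rf = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=0)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = mean_squared_error(y_test, pred) ** 0.5
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")
print(rf.feature_importances_.round(2))
```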
XGBoost:
- Feature Engineering: Enhance features with interaction terms and advanced rolling statistics.
- Model Training:
- Train XGBoost with hyperparameter tuning (e.g., learning rate, depth).
- Use cross-validation.
- Evaluate using MAE, RMSE, and SHAP values for feature importance.
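A sketch of gradient boosting with temporally ordered cross-validation. To keep the example dependency-free it uses scikit-learn's `GradientBoostingRegressor` as a stand-in; `xgboost.XGBRegressor` exposes the same `fit`/`predict` interface and accepts the same kind of parameter grid, and SHAP values can then be computed with the separate `shap` package:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + rng.normal(0, 0.1, 300)

# TimeSeriesSplit keeps every validation fold after its training fold,
# so the cross-validation respects temporal order.
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"learning_rate": [0.05, 0.1], "max_depth": [2, 3]},
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_absolute_error",
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print(f"best CV MAE: {-grid.best_score_:.3f}")
```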
ARIMA:
- Preprocessing: Ensure stationarity, possibly by differencing.
- Model Training:
- Fit ARIMA using identified lags (p, d, q).
- Diagnose residuals for model fit.
- Evaluate with MAE, RMSE, AIC/BIC.
Comparison and Selection:
- Compare Models: Rank based on MAE, RMSE, and complexity.
- Ensemble Option: Consider combining models for improved accuracy.
- Final Selection: Choose the best model for deployment based on performance and practicality.
- Reporting: Summarize results with visualizations and key findings.
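The comparison and ensemble steps can be sketched as below. The predictions here are hypothetical stand-ins for the fitted models' test-set outputs; in the real pipeline they would come from the models trained above. The ensemble shown is a simple prediction average, one of the simplest combining strategies:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical test-set predictions standing in for the fitted models'
# outputs (noise levels are arbitrary, chosen only for illustration).
rng = np.random.default_rng(3)
y_test = rng.normal(1300, 20, 50)
preds = {
    "decision_tree": y_test + rng.normal(0, 8, 50),
    "random_forest": y_test + rng.normal(0, 5, 50),
    "xgboost":       y_test + rng.normal(0, 4, 50),
    "arima":         y_test + rng.normal(0, 6, 50),
}
# Simple averaging ensemble of all candidate models.
preds["ensemble"] = np.mean(list(preds.values()), axis=0)

# Rank models by MAE, reporting RMSE alongside for the summary table.
ranking = sorted(
    ((name, mean_absolute_error(y_test, p), mean_squared_error(y_test, p) ** 0.5)
     for name, p in preds.items()),
    key=lambda t: t[1],
)
for name, mae, rmse in ranking:
    print(f"{name:13s} MAE={mae:6.2f}  RMSE={rmse:6.2f}")
```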