Rolling LSTM modelling framework for stock data prediction using candlestick data, technical indicators, and a macroeconomic indicator.
- Install Python >= 3.9.* and the latest pip, preferably using Miniconda.
- Install the required packages using pip: pip install -r requirements.txt
- Download data and place it in data/raw. Otherwise, sample data is already included in the repository.
- Run main.py
- Example financial time series data: NDX OHLCV candlesticks [Yahoo Finance]
- Initial Claims (ICSA) time series data [Initial Claims - Federal Reserve Bank of St. Louis]
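For reference, the example datasets above could be fetched with yfinance and pandas_datareader; neither library is part of this repository, and the date range and output file names below are purely illustrative:

```python
# Hypothetical download sketch: neither yfinance nor pandas_datareader is
# required by the repository, and the file names are illustrative.
import yfinance as yf
import pandas_datareader.data as web

# NDX OHLCV candlesticks from Yahoo Finance
ndx = yf.download("^NDX", start="2010-01-01", end="2021-01-01")
ndx.to_csv("data/raw/ndx_ohlcv.csv")

# Initial Claims (ICSA) from the St. Louis Fed (FRED)
icsa = web.DataReader("ICSA", "fred", start="2010-01-01", end="2021-01-01")
icsa.to_csv("data/raw/icsa.csv")
```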
1. src.data.preprocessing
Input: raw datasets (csv): config[raw] -> data/raw
This module preprocesses and joins the OHLCV, Initial Claims (ICSA), and technical indicator datasets, and transforms the target variable.
Output: joined dataset (pkl, csv): config[prep][JoinedDfPkl] -> data/input/joined.pkl
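A minimal sketch of what the join step does, assuming pandas; the column names, the technical indicator, and the target transformation below are illustrative, not the repository's exact choices:

```python
# Illustrative join sketch (column names, the indicator, and the target
# transformation are assumptions, not the repository's exact choices).
import numpy as np
import pandas as pd

ohlcv = pd.read_csv("data/raw/ndx_ohlcv.csv", index_col=0, parse_dates=True)
icsa = pd.read_csv("data/raw/icsa.csv", index_col=0, parse_dates=True)

# Example technical indicator: 14-day simple moving average of the close
ohlcv["sma_14"] = ohlcv["Close"].rolling(14).mean()

# Align the weekly ICSA series with the daily candles by forward-filling
joined = ohlcv.join(icsa.reindex(ohlcv.index, method="ffill"))

# Example target transformation: next-day log return of the close
joined["target"] = np.log(joined["Close"]).diff().shift(-1)

joined = joined.dropna()
joined.to_pickle("data/input/joined.pkl")
```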
2. src.data.windowSplit
Input: joined dataset (pkl): config[prep][JoinedDfPkl] -> data/input/joined.pkl
This module splits the dataset into train and test windows.
There are three parameters to consider: the look-back period, the train window, and the test window.
First, the data is divided into train-test windows such that the training period of the next window rolls over
the test period of the previous window (see diagram #1 below).
Moreover, each train and test period is itself handled with a rolling window approach (see diagram #2 below).
This approach uses a look-back period, which allows the model to train on small batches of recent data.
With the default config, the code generates arrays with the following dimensions:
Train window (features, targets): (N, train, look_back, n_feat), (N, train, look_forward, n_targets)
Test window (features, targets): (N, test, look_back, n_feat), (N, test, look_forward, n_targets)
where:
- N = resulting number of train-test windows
- look_back = look-back period for feature matrix in each train window
- look_forward = how many days ahead the model should predict the target (default = 1); this is the target period in the diagram above
- n_feat, n_targets = number of features / targets in joined dataset
- train, test = lengths of the train and test periods
Default settings example:
Train window dimensions (features, targets): (70, 504, 63, 19), (70, 504, 1, 1)
Test window dimensions (features, targets): (70, 126, 63, 19), (70, 126, 1, 1)
Output: window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl
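A simplified sketch of the rolling split, using the default period lengths from the example above; the actual implementation lives in src.data.windowSplit and its exact edge handling may differ:

```python
# Simplified sketch of the rolling train-test split; variable names and edge
# handling are illustrative, not the repository's exact implementation.
import numpy as np

def rolling_split(features, targets, train=504, test=126, look_back=63, look_forward=1):
    """Yield (X_train, y_train, X_test, y_test) for consecutive train-test windows."""

    def to_sequences(start, end):
        # One sample per time step: a look_back slice of features and a
        # look_forward slice of targets.
        xs = [features[t - look_back:t] for t in range(start, end)]
        ys = [targets[t:t + look_forward] for t in range(start, end)]
        return np.array(xs), np.array(ys)

    start = look_back
    while start + train + test + look_forward - 1 <= len(features):
        yield (*to_sequences(start, start + train),
               *to_sequences(start + train, start + train + test))
        # The next training period rolls over the previous test period.
        start += test
```

Stacking the yielded windows along a new leading axis reproduces the (N, train, look_back, n_feat) and (N, train, look_forward, n_targets) shapes listed above.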
3. src.model.modelFitPredict
Input: window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl
The module uses the Keras Sequential API: it builds the model, trains it on the window data, and generates predictions using hyperparameters from config.ini.
Output:
- numpy array of predictions (pkl): config[prep][PredictionsArray] -> data/output/latest_preds.pkl
- data-to-evaluate (csv, pkl): data/output/model_eval_data_<timestamp>.pkl
- model configuration (json): reports/model_config_<timestamp>.json
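For illustration, a minimal Keras Sequential LSTM of the kind the module builds; the layer sizes, epochs, and batch size below are placeholders for values read from config.ini:

```python
# Minimal illustration of a Keras Sequential LSTM for a single train window;
# layer sizes, epochs, and batch size are placeholders for config.ini values.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

look_back, n_feat = 63, 19

model = Sequential([
    LSTM(32, input_shape=(look_back, n_feat)),
    Dense(1),  # one-step-ahead prediction of the single target
])
model.compile(optimizer="adam", loss="mse")

# Dummy data with the shapes of one train window: features (train, look_back, n_feat),
# targets reshaped from (train, look_forward, n_targets) to (train, 1).
X_train = np.random.rand(504, look_back, n_feat)
y_train = np.random.rand(504, 1)
model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)

preds = model.predict(X_train)  # shape: (504, 1)
```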
3.1. src.model.performanceMetrics
Input: data-to-evaluate (pkl): data/output/model_eval_data_<timestamp>.pkl
Calculates Equity Line and performance metrics:
- Annualized Return Ratio,
- Annualized Standard Deviation,
- Information Ratio,
- Maximum Loss Duration
Output:
- Performance metrics dictionary (json): reports/performance_metrics_<timestamp>.json
- Equity Line array (pkl): data/output/eq_line_<timestamp>.pkl
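A sketch of the equity line and metric calculations using common textbook definitions (252 trading days per year, simple per-period returns); the repository's exact conventions may differ:

```python
# Common definitions of the metrics; the repository's conventions may differ.
import numpy as np

def equity_line(returns, initial=1.0):
    """Cumulative equity from a series of simple per-period strategy returns."""
    return initial * np.cumprod(1.0 + returns)

def annualized_return(returns, periods_per_year=252):
    total_growth = np.prod(1.0 + returns)
    return total_growth ** (periods_per_year / len(returns)) - 1.0

def annualized_std(returns, periods_per_year=252):
    return np.std(returns, ddof=1) * np.sqrt(periods_per_year)

def information_ratio(returns, benchmark_returns, periods_per_year=252):
    active = returns - benchmark_returns
    return np.mean(active) / np.std(active, ddof=1) * np.sqrt(periods_per_year)

def max_loss_duration(equity):
    """Longest number of consecutive periods spent below the previous equity peak."""
    peak, duration, longest = -np.inf, 0, 0
    for value in equity:
        if value >= peak:
            peak, duration = value, 0
        else:
            duration += 1
            longest = max(longest, duration)
    return longest
```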
4. src.visualization.plotResults
Input:
- data-to-evaluate (pkl): data/output/model_eval_data_<timestamp>.pkl
- window split dictionary (pkl): config[prep][WindowSplitDict] -> data/input/window_split.pkl
- model configuration (json): reports/model_config_<timestamp>.json
- Performance metrics dictionary (json): reports/performance_metrics_<timestamp>.json
- Equity Line array (pkl): data/output/eq_line_<timestamp>.pkl
Visualizes the results, including the model configuration, a comparison of real vs. predicted data, and the performance metrics.
Output:
- Equity Line plot (png): reports/figures/equity_line_<timestamp>.png
- Predictions histogram (png): reports/figures/predictions_histogram_<timestamp>.png
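An illustrative matplotlib sketch of the two figures, using synthetic stand-in data; the actual plots are produced by src.visualization.plotResults:

```python
# Illustrative matplotlib sketch of the two figures, with synthetic stand-ins
# for the equity line and the predictions array.
import numpy as np
import matplotlib.pyplot as plt

eq_line = np.cumprod(1 + np.random.normal(0.0005, 0.01, 500))
preds = np.random.normal(0.0, 0.01, 500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(eq_line)
ax1.set_title("Equity Line")
ax1.set_xlabel("Trading day")
ax1.set_ylabel("Equity")

ax2.hist(preds, bins=50)
ax2.set_title("Predictions histogram")
ax2.set_xlabel("Predicted value")
ax2.set_ylabel("Count")

fig.tight_layout()
plt.show()
```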
Further improvements to be included:
- Averaging results across multiple runs (a random seed cannot currently be set due to the large number of stochastic processes)
- Hyperparameter tuning between windows
- Real-time approach
MIT License | Copyright (c) 2021 Jan Androsiuk