An autoencoder (AE) and MLP approach to predicting real-time financial market data and selecting the right trades to execute.
Jane Street hosted a code competition on Kaggle, running from February to August 2021, on predicting the stock market from past high-frequency trading data. The task is to predict, given the input, whether a trade will be profitable. The training data contains 500 days of high-frequency trading data, 2.4 million rows in total. The public leaderboard data contains one year of high-frequency trading data ending in August 2020. The private leaderboard data starts at a random time in summer 2020 and runs up to August 2021. Additional information about the competition can be found on the Kaggle Competition page.
The dataset is provided by Jane Street and contains an anonymized set of features, feature_{0...129}, representing real stock market data. Each row in the dataset represents a trading opportunity, for which you predict an action value (1 to make the trade, 0 to pass on it). Each trade has an associated weight and resp, which together represent a return on the trade. The date column is an integer that represents the day of the trade, while ts_id represents a time ordering. In addition to the anonymized feature values, metadata about the features is provided in features.csv. Additional information about the datasets can be found on the Kaggle Data Description page.
The solution is based on an autoencoder and a multilayer perceptron (MLP). The autoencoder learns a compact representation (encoding) of the input, typically for dimensionality reduction, by training the network to reconstruct its input while ignoring insignificant variation, which reduces noise. The MLP predicts which trades are profitable.
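To make the two components concrete, here is a minimal NumPy sketch of a forward pass through an autoencoder and an MLP head. It is an illustration, not the code in this repository: the layer widths (N_ENCODED, N_HIDDEN), the choice to feed the encoding into the MLP, the 0.5 decision threshold, and the random input data are all assumptions, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 130  # feature_0 ... feature_129, as in the dataset description
N_ENCODED = 16    # hypothetical bottleneck width
N_HIDDEN = 32     # hypothetical MLP hidden width


def init_layer(n_in, n_out):
    """Return (weights, bias) for a dense layer with small random weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Untrained parameters; a real model would fit these by gradient descent.
enc = init_layer(N_FEATURES, N_ENCODED)
dec = init_layer(N_ENCODED, N_FEATURES)
hid = init_layer(N_ENCODED, N_HIDDEN)
out = init_layer(N_HIDDEN, 1)


def forward(x):
    """Return (reconstruction, probability that each trade is profitable)."""
    code = relu(x @ enc[0] + enc[1])            # encoder: 130 -> 16
    recon = code @ dec[0] + dec[1]              # decoder: 16 -> 130
    h = relu(code @ hid[0] + hid[1])            # MLP hidden layer on the encoding
    prob = sigmoid(h @ out[0] + out[1])[:, 0]   # one probability per row
    return recon, prob


x = rng.normal(size=(8, N_FEATURES))            # 8 synthetic rows, not market data
recon, prob = forward(x)
recon_loss = float(np.mean((recon - x) ** 2))   # autoencoder objective (MSE)
action = (prob > 0.5).astype(int)               # 1 = make the trade, 0 = pass
```

In training, the reconstruction loss would typically be minimized together with a classification loss on the MLP output, so that the encoding keeps the information useful for predicting profitable trades.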
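This competition is evaluated on a utility score. Each row in the test set represents a trading opportunity for which you predict an action value, 1 to make the trade and 0 to pass on it. Each trade j has an associated weight and resp, which represents a return. For each date i, we define:

$$p_i = \sum_j \left( \mathrm{weight}_{ij} \cdot \mathrm{resp}_{ij} \cdot \mathrm{action}_{ij} \right)$$

$$t = \frac{\sum_i p_i}{\sqrt{\sum_i p_i^2}} \cdot \sqrt{\frac{250}{|i|}}$$

where |i| is the number of unique dates in the test set. The utility is then defined as:

$$u = \min(\max(t, 0), 6) \sum_i p_i$$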
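The utility computation can be sketched in plain Python as follows. This assumes the definition published on the competition's evaluation page: the daily return p_i is the sum of weight * resp * action over the trades of date i, t is the sum of the p_i divided by the square root of the sum of their squares, scaled by sqrt(250 / |i|), and the utility is t clipped to [0, 6] times the sum of the p_i. The function name utility_score and the toy inputs are illustrative only.

```python
import math
from collections import defaultdict


def utility_score(dates, weights, resps, actions):
    """Compute the utility from per-trade dates, weights, resps and 0/1 actions."""
    # p[d] accumulates the realized return of all trades taken on date d.
    p = defaultdict(float)
    for d, w, r, a in zip(dates, weights, resps, actions):
        p[d] += w * r * a

    sum_p = sum(p.values())
    sum_p2 = sum(v * v for v in p.values())
    if sum_p2 == 0:
        return 0.0  # no trades taken (or all returns zero): utility is zero

    # Annualized Sharpe-like ratio over the |i| unique dates.
    t = sum_p / math.sqrt(sum_p2) * math.sqrt(250 / len(p))
    return min(max(t, 0.0), 6.0) * sum_p


# Toy example: two dates, two trades each.
score = utility_score(
    dates=[0, 0, 1, 1],
    weights=[1.0, 2.0, 1.0, 1.0],
    resps=[0.5, -0.1, 0.2, 0.3],
    actions=[1, 0, 1, 1],
)
```

Because t is clipped at 0, a strategy whose total return is negative scores zero rather than a negative utility, and the cap at 6 limits the reward for an unusually high ratio.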
GitHub issues and pull requests are welcome. Your feedback is much appreciated!
August 2021, Abdelghani Belgaid