Jane Street Market Prediction - AE & MLP

An autoencoder (AE) and multilayer perceptron (MLP) approach to predicting real-time financial market data and selecting the right trades to execute.

About

Jane Street hosted a code competition on Kaggle in which participants predict the stock market from February to August 2021 using past high-frequency trading data. The task is to predict whether a trade will be profitable given its input features. The training data contain 500 days of high-frequency trading data, about 2.4 million rows in total. The public leaderboard data contain one year of high-frequency trading data, starting some time before August 2020 and running up to it. The private leaderboard data range from a random point in summer 2020 up to August 2021. Additional information about the competition can be found on the Kaggle Competition page.

Datasets

The dataset is provided by Jane Street and contains an anonymized set of features, feature_{0...129}, representing real stock market data. Each row in the dataset represents a trading opportunity, for which you predict an action value (1 to make the trade, 0 to pass on it). Each trade has an associated weight and resp, which together represent a return on the trade. The date column is an integer that represents the day of the trade, while ts_id represents a time ordering. In addition to the anonymized feature values, metadata about the features is provided in features.csv. Additional information about the datasets can be found on the Kaggle Data Description page.
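The column layout described above can be illustrated with a small sketch. It parses a tiny, made-up sample (only two of the 130 feature columns, all values invented) and derives a binary action label. The labeling rule `resp > 0` and the helper names `load_rows` and `add_action_label` are assumptions for illustration, not necessarily what the notebook does.

```python
import csv
import io

# Tiny made-up sample in the column layout described above.
# Only 2 of the 130 feature columns are shown; all values are invented.
SAMPLE_CSV = """date,weight,resp,feature_0,feature_1,ts_id
0,0.0,0.006,1,-1.87,0
0,16.67,-0.009,-1,-1.35,1
0,2.5,0.023,1,0.81,2
1,0.14,-0.002,-1,1.17,3
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with numeric values (hypothetical helper)."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {key: float(value) for key, value in raw.items()}
        row["date"] = int(row["date"])    # day of the trade
        row["ts_id"] = int(row["ts_id"])  # time ordering
        rows.append(row)
    return rows


def add_action_label(rows):
    """Assumed labeling rule: take the trade (1) when resp > 0, otherwise pass (0)."""
    for row in rows:
        row["action"] = 1 if row["resp"] > 0 else 0
    return rows


rows = add_action_label(load_rows(SAMPLE_CSV))
actions = [row["action"] for row in rows]
print(actions)
```

A row with weight 0 (the first row here) still receives a label, but it contributes nothing to a weighted return of the form weight × resp × action.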

Model Overview

The solution is based on an autoencoder and a multilayer perceptron (MLP). The autoencoder learns a representation (encoding) of the data, typically for dimensionality reduction, by training the network to ignore insignificant data and thereby reduce noise. The MLP then predicts which trades are profitable.
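To make the data flow concrete, here is a minimal NumPy sketch of a forward pass through an autoencoder followed by an MLP head. It is not the notebook's implementation: the weights are random and untrained, the layer sizes `N_LATENT` and `N_HIDDEN` are hypothetical, and feeding the encoding (rather than the raw features) into the MLP is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 130  # feature_0 ... feature_129
N_LATENT = 64     # hypothetical encoding size
N_HIDDEN = 32     # hypothetical MLP hidden size


def dense(n_in, n_out):
    """Randomly initialised weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


enc_w, enc_b = dense(N_FEATURES, N_LATENT)  # encoder
dec_w, dec_b = dense(N_LATENT, N_FEATURES)  # decoder
hid_w, hid_b = dense(N_LATENT, N_HIDDEN)    # MLP hidden layer
out_w, out_b = dense(N_HIDDEN, 1)           # MLP output layer


def forward(x):
    """Return (reconstruction, probability that the trade is profitable)."""
    code = relu(x @ enc_w + enc_b)       # compressed representation
    recon = code @ dec_w + dec_b         # autoencoder output
    hidden = relu(code @ hid_w + hid_b)  # MLP on the encoding (assumption)
    prob = sigmoid(hidden @ out_w + out_b).ravel()
    return recon, prob


x = rng.normal(size=(8, N_FEATURES))  # 8 made-up trading opportunities
recon, prob = forward(x)
recon_loss = float(np.mean((recon - x) ** 2))  # reconstruction objective
actions = (prob > 0.5).astype(int)             # 1 = trade, 0 = pass
```

In training, the reconstruction loss would shape the encoder while a classification loss on `prob` would shape the MLP; the 0.5 threshold used to turn probabilities into actions is likewise an illustrative choice.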

Metrics

This competition is evaluated on a utility score. Each row in the test set represents a trading opportunity for which you predict an action value: 1 to make the trade and 0 to pass on it. Each trade $j$ has an associated weight $w_{ij}$ and resp $r_{ij}$, which together represent a return, and a predicted action $a_{ij}$. For each date $i$, we define:

$\displaystyle p_i = \sum_{j} w_{ij} r_{ij} a_{ij},$

$\displaystyle t = \frac{\sum_i p_i}{\sqrt{\sum_i p_i^2}} \cdot \sqrt{\frac{250}{|i|}},$

where $|i|$ is the number of unique dates in the test set. The utility is then defined as:

$\displaystyle u = \min(\max(t,0), 6) \sum_i p_i.$
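The formulas above translate directly into a few lines of Python. The sketch below uses only the standard library; the function name `utility_score` and the sample numbers are made up for illustration, and returning 0 when every daily return is zero is an assumption made to avoid dividing by zero.

```python
import math
from collections import defaultdict


def utility_score(dates, weights, resps, actions):
    """Compute u = min(max(t, 0), 6) * sum_i p_i from per-trade values."""
    daily = defaultdict(float)
    for d, w, r, a in zip(dates, weights, resps, actions):
        daily[d] += w * r * a  # p_i: weighted return of the trades taken on date i

    p = list(daily.values())
    sum_p = sum(p)
    sum_p_sq = sum(value * value for value in p)
    if sum_p_sq == 0:
        return 0.0  # assumption: no trades taken or all daily returns zero

    # len(p) is |i|, the number of unique dates
    t = sum_p / math.sqrt(sum_p_sq) * math.sqrt(250 / len(p))
    return min(max(t, 0), 6) * sum_p


# Made-up example: two dates, four trading opportunities.
dates = [0, 0, 1, 1]
weights = [1.0, 2.0, 1.0, 1.0]
resps = [0.01, -0.02, 0.03, 0.01]
actions = [1, 0, 1, 1]

u = utility_score(dates, weights, resps, actions)
print(u)
```

With these numbers, $t$ exceeds the cap of 6, so the score reduces to 6 times the summed daily returns. Taking only losing trades would make $t$ negative and clip the utility to zero.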

Contributing

GitHub issues and pull requests are welcome. Your feedback is much appreciated!

August 2021, Abdelghani Belgaid
