In machine learning, there are many algorithms to choose from when building a model. The choice can be narrowed by analysing the problem and the data, but after this filtering we are often still left with several candidate algorithms. In that case, if feasible, we should train and tune multiple algorithms and keep the best one. In this report, we dive into the practice of using and tuning different models. The report will go through each step involved in solving a machine learning problem:
- We will define a problem.
- We will tweak our data to fit the problem.
- We will define and tune the models.
- We will compare and evaluate the models.
The three models to be used in the report are:
- XGBoost
- Logistic Regression
- Neural network
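As a rough sketch of how such a model lineup might be set up, the snippet below instantiates stand-ins for the three candidates using scikit-learn's `LogisticRegression` and `MLPClassifier`; the names, hyperparameters, and the `models` dictionary are illustrative assumptions, not the settings used in the report. XGBoost's `XGBClassifier` exposes the same `fit`/`predict` interface, so it would slot into the same dictionary.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
# from xgboost import XGBClassifier  # same fit/predict interface as below

# Hypothetical candidate models; hyperparameters here are placeholders,
# not the tuned values from the report.
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "mlp": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500),
    # "xgb": XGBClassifier(n_estimators=200),  # uncomment if xgboost is installed
}
```

Keeping all candidates behind a common dictionary makes it easy to loop over them for training, tuning, and evaluation later.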
The problem presented in the report is from the Jane Street Market Prediction competition. The competition involves predicting whether a trade will be profitable or not given the input.
The data used for training, validation, and testing contains a date, a weight, four resp (return) columns, and 130 feature columns. Each row represents a trade, and each resp column represents a different return. For simplicity, we will ignore the weights and all but one of the resp columns, and use that single resp value to decide whether the trade should be taken. If resp is below 0, we label the trade 0; if resp is above 0, we label it 1. The goal of our model will be to predict whether a given trade should be 0 or 1. In other words, this is a binary classification problem.
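The labeling rule above can be sketched in a few lines of pandas. The toy frame here is a hypothetical stand-in for the competition data (which has 130 feature columns and four resp columns), and the column name `action` for the target is an assumption for illustration.

```python
import pandas as pd

# Toy stand-in for the competition data: the real set has a date,
# a weight, four resp columns, and 130 feature columns per trade.
df = pd.DataFrame({
    "resp": [0.02, -0.01, 0.005, -0.03],
    "feature_0": [0.1, 0.4, -0.2, 0.9],
})

# Binary target: 1 if the chosen resp (return) is positive, 0 otherwise.
df["action"] = (df["resp"] > 0).astype(int)
```

The resulting `action` column is what the models are trained to predict from the feature columns.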
In the end, we will compare how well each model predicts profitable trades while minimizing losing trades. The model that does this best will be judged the better model.
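One simple way to score "predicting profitable trades while minimizing losing trades" is the precision of the positive class: of the trades a model chooses to take, what fraction were actually profitable. The snippet below is a hedged sketch of such a comparison; the model names and predictions are made-up toy values, and precision is only one of several metrics the report could use.

```python
from sklearn.metrics import precision_score

# Hypothetical ground-truth labels and predictions from two toy models.
y_true = [1, 0, 1, 1, 0, 1]
preds = {
    "model_a": [1, 0, 0, 1, 1, 1],
    "model_b": [1, 0, 1, 1, 0, 0],
}

# Precision of class 1 rewards taking profitable trades and penalises
# taking trades that lose money (false positives).
scores = {name: precision_score(y_true, p) for name, p in preds.items()}
best = max(scores, key=scores.get)
```

Note that precision alone ignores missed opportunities (profitable trades the model skipped), so in practice it would be read alongside recall or a utility score.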
For the full detailed report on the experiment, please see report.pdf or the Jupyter notebook.
Thank you for reading!