The code is designed to place bets on horse races on Betfair based on a trained neural network. Betting works as follows: every bet has two sides, back or lay. Each horse race has multiple horses, usually between 2 and 10.
- Backing a horse means that you are betting that the horse will win.
- Laying a horse means that you are betting that the horse will not win.
The payoffs are as follows:
- correct back: (stake * price - stake) * (1 - fees)
- incorrect back: -stake
- correct lay: stake * (1 - fees)
- incorrect lay: -(stake * price - stake)
Assuming a stake of $1 and ignoring fees, the payoffs are:
- back correctly: price - 1
- back incorrectly: -1
- lay correctly: +1
- lay incorrectly: -price + 1
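To see why the price matters more than picking the winner, the payoffs above (with fees, stake 1) can be combined into an expected payoff. Here p is the price, f the fees and q the true probability that the horse wins; q is an unknown introduced for this derivation.

$$
\begin{aligned}
\mathbb{E}[\text{back}] &= q\,(p-1)(1-f) - (1-q) \\
\mathbb{E}[\text{lay}]  &= (1-q)(1-f) - q\,(p-1)
\end{aligned}
$$

Without fees (f = 0) and with q = 1/p, both expectations are zero:

$$
\tfrac{1}{p}(p-1) - \left(1 - \tfrac{1}{p}\right) = 0, \qquad \left(1 - \tfrac{1}{p}\right) - \tfrac{1}{p}(p-1) = 0
$$

Backing therefore only has a positive expected payoff when q(p-1)(1-f) > 1-q, i.e. when q > 1 / (1 + (p-1)(1-f)).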
If everybody is absolutely certain that a horse will win, its price will be 1.0: backing it then pays nothing, and neither does laying it. If everybody is quite certain that a horse will not win the race, the price is usually around 1000, the highest price on Betfair.
The probability that a horse will win can be inferred from the price, and vice versa:
- winning probability = 1 / price
- price = 1 / winning probability
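The conversion can be written as a pair of helper functions. This is a minimal sketch; the function names are hypothetical and not part of the project. The extra normalisation step is a common convention (the raw implied probabilities of a race rarely sum to exactly 1), not something the original code is shown to do.

```python
def implied_probability(price):
    """Winning probability implied by a decimal price."""
    return 1.0 / price


def price_from_probability(probability):
    """Decimal price that corresponds to a winning probability."""
    return 1.0 / probability


def normalised_probabilities(prices):
    """Implied probabilities of all runners in one race, rescaled to sum to 1.

    Dividing by the total is one simple way to turn the raw implied
    probabilities into a proper probability distribution.
    """
    raw = [implied_probability(p) for p in prices]
    total = sum(raw)
    return [p / total for p in raw]


print(implied_probability(4.0))      # 0.25
print(price_from_probability(0.25))  # 4.0
```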
In the code, the exact payoff is calculated from the LTP (last traded price), the actual stake, and the fees. The snippet below is from historic_data_processing.py:
```python
import numpy as np

# stake and fees are defined elsewhere; self.df needs 'winner' and 'LTP' columns
payoff_back = np.where(self.df['winner'].values,
                       (stake * self.df['LTP'].values - stake) * (1 - fees),  # back winner
                       -np.ones((self.df['LTP'].values.shape)) * stake)        # back loser
payoff_lay = np.where(self.df['winner'].values,
                      -(stake * self.df['LTP'].values - stake),                       # lay winner
                      stake * np.ones((self.df['LTP'].values.shape)) * (1 - fees))  # lay loser
```
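The same logic can be packaged as a standalone function to check it against the sample data. This is a sketch: `payoffs` is a hypothetical name, and `fees=0.05` is an assumption inferred from the sample table, where a losing lay at stake 1 pays 0.95. The third return value reproduces the `lay_risk` column of the sample rows, i.e. the liability of a lay bet.

```python
import numpy as np


def payoffs(ltp, winner, stake=1.0, fees=0.05):
    """Return (payoff_back, payoff_lay, lay_risk) per runner.

    fees=0.05 is an assumption inferred from the sample data, not a documented value.
    """
    ltp = np.asarray(ltp, dtype=float)
    winner = np.asarray(winner, dtype=bool)
    payoff_back = np.where(winner,
                           (stake * ltp - stake) * (1 - fees),  # back winner: net winnings minus fees
                           -stake)                              # back loser: the stake is lost
    payoff_lay = np.where(winner,
                          -(stake * ltp - stake),               # lay winner: pay the backer's net winnings
                          stake * (1 - fees))                   # lay loser: keep the stake minus fees
    lay_risk = -(stake * ltp - stake)                           # maximum loss of a lay bet
    return payoff_back, payoff_lay, lay_risk


# Three runners from the sample table: LTP t-0 of 50.0, 5.2 (the winner) and 4.4
back, lay, risk = payoffs([50.0, 5.2, 4.4], [False, True, False])
print(back, lay, risk)
```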
Prices of each horse are downloaded once per minute, starting 60 minutes before the race and continuing until the race ends. During the race, prices are collected roughly every 10 seconds. All prices are last traded prices.
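The collection schedule described above boils down to a simple timing rule. The function below is a hypothetical sketch of that rule only; `poll_interval` is not part of the project, and how the collector actually schedules itself is not shown here.

```python
from datetime import datetime, timedelta


def poll_interval(now, market_start, race_finished):
    """Seconds until the next price download, or None if nothing should be collected.

    Hypothetical helper illustrating the schedule: once per minute from
    60 minutes before the start, roughly every 10 seconds once the race is running.
    """
    if race_finished:
        return None                                 # collection stops when the race ends
    if now < market_start - timedelta(minutes=60):
        return None                                 # collection has not started yet
    if now < market_start:
        return 60                                   # pre-race: once per minute
    return 10                                       # in-race: about every 10 seconds


start = datetime(2018, 1, 24, 14, 40)
print(poll_interval(start - timedelta(minutes=30), start, False))  # 60
```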
After the race a separate process collects the winner and writes the results to the same database.
In another separate process, the data is then transformed so that it can be fed into a neural network. The transformed data contains the following columns:
'LTP t-0', 'LTP t-7', 'average', 'average_2d', 'back', 'countrycode', 'kurtosis', 'kurtosis_2d', 'lay', 'lay_risk', 'marketid', 'marketstarttime', 'max_2d', 'maximum', 'median', 'median_2d', 'min_2d', 'minimum', 'overrun', 'participants', 'selection_id', 'skew', 'skew_2d', 'starting_price', 'std', 'std_2d', 'winner'
A sample of the transformed data:
LTP t-0 | LTP t-7 | average | average_2d | back | countrycode | kurtosis | kurtosis_2d | lay | lay_risk | marketid | marketstarttime | max_2d | maximum | median | median_2d | min_2d | minimum | overrun | participants | selection_id | skew | skew_2d | starting_price | std | std_2d | winner |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
50.0 | 60.0 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -49.0 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 3893687 | 3.279079367849355 | 4.385357131797931 | 68.255507883 | 237.41174229340018 | 116.51357884846351 | False |
840.0 | 670.0 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -839.0 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 7262342 | 3.279079367849355 | 4.385357131797931 | 590.474923745 | 237.41174229340018 | 116.51357884846351 | False |
14.5 | 16.0 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -13.5 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 7380168 | 3.279079367849355 | 4.385357131797931 | 18.664683298746816 | 237.41174229340018 | 116.51357884846351 | False |
11.0 | 12.5 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -10.0 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 8421851 | 3.279079367849355 | 4.385357131797931 | 9.339514387501655 | 237.41174229340018 | 116.51357884846351 | False |
29.0 | 19.5 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -28.0 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 8492784 | 3.279079367849355 | 4.385357131797931 | 32.0 | 237.41174229340018 | 116.51357884846351 | False |
5.2 | 6.0 | 98.54166666666667 | 53.78402777777784 | 3.9899999999999998 | GB | 10.982342637974964 | 22.51974284698778 | -4.2 | -4.2 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 8869367 | 3.279079367849355 | 4.385357131797931 | 4.4 | 237.41174229340018 | 116.51357884846351 | True |
4.4 | 4.8 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -3.4000000000000004 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 9229409 | 3.279079367849355 | 4.385357131797931 | 4.6 | 237.41174229340018 | 116.51357884846351 | False |
32.0 | 32.0 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -31.0 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 10839655 | 3.279079367849355 | 4.385357131797931 | 40.0 | 237.41174229340018 | 116.51357884846351 | False |
22.0 | 19.0 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -21.0 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 11321256 | 3.279079367849355 | 4.385357131797931 | 20.893840768481603 | 237.41174229340018 | 116.51357884846351 | False |
8.4 | 6.4 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -7.4 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 11688035 | 3.279079367849355 | 4.385357131797931 | 7.974782176370879 | 237.41174229340018 | 116.51357884846351 | False |
6.0 | 6.0 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -5.0 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 12232392 | 3.279079367849355 | 4.385357131797931 | 6.6 | 237.41174229340018 | 116.51357884846351 | False |
160.0 | 250.0 | 98.54166666666667 | 53.78402777777784 | -1.0 | GB | 10.982342637974964 | 22.51974284698778 | 0.95 | -159.0 | 1.139233105 | 2018-01-24 14:40:00 | 990.0 | 840.0 | 18.25 | 16.0 | 4.4 | 4.4 | 1.0037970937108869 | 12 | 14838121 | 3.279079367849355 | 4.385357131797931 | 202.39535985152256 | 237.41174229340018 | 116.51357884846351 | False |
9.4 | 12.0 | 8.35 | 7.634861111111109 | -1.0 | GB | 3.8745365787714015 | 1.3322044077460662 | 0.95 | -8.4 | 1.139268720 | 2018-01-25 21:00:00 | 23.0 | 21.0 | 5.8 | 5.3 | 3.55 | 4.0 | 1.0081529125718112 | 6 | 8575987 | 1.9405721670660565 | 1.4417920045836903 | 7.0 | 6.54209446584196 | 4.486640644218091 | False |
4.0 | 4.1 | 8.35 | 7.634861111111109 | -1.0 | GB | 3.8745365787714015 | 1.3322044077460662 | 0.95 | -3.0 | 1.139268720 | 2018-01-25 21:00:00 | 23.0 | 21.0 | 5.8 | 5.3 | 3.55 | 4.0 | 1.0081529125718112 | 6 | 8706065 | 1.9405721670660565 | 1.4417920045836903 | 4.3 | 6.54209446584196 | 4.486640644218091 | False |
7.0 | 5.4 | 8.35 | 7.634861111111109 | -1.0 | GB | 3.8745365787714015 | 1.3322044077460662 | 0.95 | -6.0 | 1.139268720 | 2018-01-25 21:00:00 | 23.0 | 21.0 | 5.8 | 5.3 | 3.55 | 4.0 | 1.0081529125718112 | 6 | 10509488 | 1.9405721670660565 | 1.4417920045836903 | 7.4 | 6.54209446584196 | 4.486640644218091 | False |
4.1 | 4.3 | 8.35 | 7.634861111111109 | 2.9449999999999994 | GB | 3.8745365787714015 | 1.3322044077460662 | -3.0999999999999996 | -3.0999999999999996 | 1.139268720 | 2018-01-25 21:00:00 | 23.0 | 21.0 | 5.8 | 5.3 | 3.55 | 4.0 | 1.0081529125718112 | 6 | 11024653 | 1.9405721670660565 | 1.4417920045836903 | 4.280494073 | 6.54209446584196 | 4.486640644218091 | True |
4.6 | 4.7 | 8.35 | 7.634861111111109 | -1.0 | GB | 3.8745365787714015 | 1.3322044077460662 | 0.95 | -3.5999999999999996 | 1.139268720 | 2018-01-25 21:00:00 | 23.0 | 21.0 | 5.8 | 5.3 | 3.55 | 4.0 | 1.0081529125718112 | 6 | 11180317 | 1.9405721670660565 | 1.4417920045836903 | 4.562548612717727 | 6.54209446584196 | 4.486640644218091 | False |
21.0 | 21.0 | 8.35 | 7.634861111111109 | -1.0 | GB | 3.8745365787714015 | 1.3322044077460662 | 0.95 | -20.0 | 1.139268720 | 2018-01-25 21:00:00 | 23.0 | 21.0 | 5.8 | 5.3 | 3.55 | 4.0 | 1.0081529125718112 | 6 | 12653024 | 1.9405721670660565 | 1.4417920045836903 | 22.0 | 6.54209446584196 | 4.486640644218091 | False |
26.0 | 34.0 | 24.12 | 24.321888888888875 | -1.0 | IE | 11.299591151509034 | 3.423915460281055 | 0.95 | -25.0 | 1.139296062 | 2018-01-25 15:00:00 | 110.0 | 110.0 | 17.5 | 15.5 | 5.3 | 5.3 | 1.004864866781696 | 15 | 781222 | 3.2000719722913247 | 1.8176997877961063 | 25.292604111 | 25.129271491913286 | 19.541971566679 | False |
16.5 | 8.6 | 24.12 | 24.321888888888875 | -1.0 | IE | 11.299591151509034 | 3.423915460281055 | 0.95 | -15.5 | 1.139296062 | 2018-01-25 15:00:00 | 110.0 | 110.0 | 17.5 | 15.5 | 5.3 | 5.3 | 1.004864866781696 | 15 | 5660962 | 3.2000719722913247 | 1.8176997877961063 | 16.687364853692706 | 25.129271491913286 | 19.541971566679 | False |
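In the sample rows, `average`, `median`, `std`, `minimum`, `maximum` and `overrun` take the same value for every runner of a market. For the two GB races, whose runners are all listed, these values can be reproduced from the `LTP t-0` prices of the race, with `overrun` being the sum of the implied probabilities. The sketch below checks this; the mapping is inferred from the sample rows rather than taken from the transformation code, and `race_level_features` is a hypothetical name.

```python
import statistics


def race_level_features(prices):
    """Recompute race-level columns from the entry prices (LTP t-0) of all runners.

    Inferred from the sample rows, not from the project's transformation code.
    """
    return {
        'average': statistics.mean(prices),
        'median': statistics.median(prices),
        'std': statistics.stdev(prices),          # sample standard deviation (n - 1)
        'minimum': min(prices),
        'maximum': max(prices),
        'overrun': sum(1.0 / p for p in prices),  # sum of implied probabilities
    }


# LTP t-0 of all runners in market 1.139268720 (6 participants)
market_2 = [9.4, 4.0, 7.0, 4.1, 4.6, 21.0]
print(race_level_features(market_2))
```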
The neural network input and output can then be defined as follows, where strategy is either 'lay' or 'back' and selects the payoff of the respective strategy. In other words, we are trying to predict the payoff of taking a bet on a horse given the values in self.X: the LTP t-0, which is the price at which we entered the bet, and metrics describing the distribution of how the price developed during the hour before (minimum, maximum, median, standard deviation, skew, etc.).
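```python
# inputs: entry price (LTP t-0) plus the distribution metrics described above
self.X = df[['LTP t-0', 'average', 'minimum', 'maximum', 'median', 'std', 'participants', 'skew', 'kurtosis', 'overrun']].values
# targets: 'act' and the payoff column of the chosen strategy ('back' or 'lay')
self.Y = df[['act', strategy]].values
```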
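Because the target holds the realised payoff of the strategy rather than a win/lose label, a loss can be expressed directly in terms of money. The function below is a minimal sketch under stated assumptions: it treats the model output as the fraction of the stake to bet (a value in [0, 1]) and returns the negative mean payoff. `payoff_loss` is a hypothetical name; the project's actual loss function and framework are not shown here.

```python
import numpy as np


def payoff_loss(bet_signal, payoff):
    """Negative mean payoff of the bets the model decides to take.

    bet_signal: model output in [0, 1], read as the fraction of the stake to bet (assumption).
    payoff: realised payoff per unit stake of the chosen strategy ('back' or 'lay' column).
    """
    bet_signal = np.asarray(bet_signal, dtype=float)
    payoff = np.asarray(payoff, dtype=float)
    # Minimising the negative payoff is equivalent to maximising the profit of the selected bets.
    return -np.mean(bet_signal * payoff)


# 'back' payoffs of three sample runners: a loser, the winner and another loser
back_payoff = [-1.0, 3.99, -1.0]
print(payoff_loss([0, 1, 0], back_payoff))  # betting only on the winner
print(payoff_loss([1, 1, 1], back_payoff))  # betting on every runner
```

Betting only on the winner gives a lower (better) loss than betting on every runner, which is the behaviour a payoff-based loss is meant to reward.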
It is important to note that we are not trying to predict which horse wins or which one does not; rather, we are looking for inefficiencies in how the respective probabilities are priced. We already know that the horse with the lowest price (highest implied probability) is the one most likely to win. But simply backing that horse will not yield any profit (or will it?). This is what we are trying to find out by creating a loss function that directly calculates the payoff based on the price at which we enter the bet.
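The "just back the favourite" baseline can be checked directly against the `back` column. This is a sketch: `back_favourite_pnl` is a hypothetical name, and it assumes each row carries `marketid`, `LTP t-0` and `back` as in the sample table.

```python
from collections import defaultdict


def back_favourite_pnl(rows):
    """Total payoff of backing the lowest-priced runner (by LTP t-0) in every race.

    Each row is a dict with 'marketid', 'LTP t-0' and 'back' keys, where 'back'
    already holds the fee-adjusted payoff at a stake of 1.
    """
    races = defaultdict(list)
    for row in rows:
        races[row['marketid']].append(row)
    total = 0.0
    for runners in races.values():
        favourite = min(runners, key=lambda r: r['LTP t-0'])
        total += favourite['back']
    return total


# The favourite and the winner of the two GB races, copied from the sample table
sample_rows = [
    {'marketid': '1.139233105', 'LTP t-0': 5.2, 'back': 3.99},
    {'marketid': '1.139233105', 'LTP t-0': 4.4, 'back': -1.0},
    {'marketid': '1.139268720', 'LTP t-0': 4.1, 'back': 2.945},
    {'marketid': '1.139268720', 'LTP t-0': 4.0, 'back': -1.0},
]
print(back_favourite_pnl(sample_rows))  # -2.0
```

In both sample races the favourite lost; two races are far too few to answer the question, which is why the model is trained on several years of data.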
The following command starts training the neural network on a betting strategy:
```
app.py ts train STRATEGY CLASS COUNTRYCODES [--batchsize=<>] [--from_year=<>] [--to_year=<>] [--localhost]
```
For example:
```
app.py ts train back FlyingSpider GB,IE --from_year 2015 --to_year 2019 --localhost --batchsize=100
```
The code consists of the following packages:
- Collecting data from Betfair and Matchbook (the cronjobs listed below need to run)
- Analyzing the pricing data of horse racing through neural networks and creating a model to make betting recommendations for backing or laying
- Executing bets according to the trained model
- Flask web interface that shows logging activity, graphical pnl overview and statistical analysis of past bets
You can create a conda environment from the environment.yml file as follows:
- Download Anaconda (64-bit, Python 3)
- Run conda env create -f environment.yml -n horse_racing (or, on Windows, simply run update_venv.bat)
- The environment will be created in anaconda/envs/horse_racing
The application is controlled via app.py. The PnL overview web server can be launched via webserver.py, but it is better to use webserver.wsgi.
Usage:
```
app.py ts train STRATEGY CLASS COUNTRYCODES [--batchsize=<>] [--from_year=<>] [--to_year=<>] [--localhost]
app.py ts backtest STRATEGY CLASS COUNTRYCODES [MODEL_PATH] [--from_year=<>] [--to_year=<>] [--localhost]
app.py collect_prices
app.py bet [--armed] [--sandbox_key] [--config=<>]
app.py update_unfilled_orders [--armed] [--config=<>]
app.py collect_results
app.py evaluate_pnl [--overwrite_calculated_pnls] [--config=<>]
app.py email_summary
app.py upload_tarball FILE [DESTINATION]
app.py map_reduce SOURCE DESTINATION CLASS [--localhost] [--use_archive]
app.py propagate_race_results_to_price_scrape
```
Examples:
```
app.py ts backtest lay FlyingSpiderBookie GB,IE --from_year 2018 --to_year 2018
app.py ts train lay FlyingSpider GB,IE,US,NZ --localhost --from_year 2015 --to_year 2017
app.py ts backtest lay FlyingSpider GB,IE,US,NZ --localhost --from_year 2016 --to_year 2016
app.py map_reduce price_scrape price_scrape_enriched DEBookies --localhost
```
app.py propagate_race_results_to_price_scrape adds a winner and a losers column to each price in price_scrape.
Cronjobs that need to be set up:
```
* * * * * sh app.sh collect_prices
* * * * * sh app.sh bet
* * * * * sh app.sh update_unfilled_orders --armed
5 * * * * sh app.sh collect_results
6 * * * * sh app.sh evaluate_pnl
7 22 * * * sh app.sh email_summary
9 4 * * * sh app.sh propagate_race_results_to_price_scrape
5 5 * * * sh app.sh map_reduce price_scrape price_scrape_enriched_bookies DEBookies
```
All data is collected on a MongoDB server, which can be configured in config.ini.