
A fully functional automated horse racing trading system for Betfair in Python. It collects data, uses a neural network to analyse the data, makes trading recommendations and places the bets before races start.


dickreuter/betfair-horse-racing


Betfair horse racing

The code is designed to bet on horse races on Betfair based on a trained neural network. Every bet has two sides: back or lay. Each horse race has multiple runners, usually between 2 and 10 horses.

Overview

  • Backing a horse means that you are betting that the horse will win.
  • Laying a horse means that you are betting that a horse doesn't win.

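The payoffs are as follows, where price is the decimal price (odds) and fees is the commission charged on net winnings:

  • correct back (horse wins): (stake * price - stake) * (1 - fees)
  • incorrect back (horse loses): -stake
  • correct lay (horse loses): stake * (1 - fees)
  • incorrect lay (horse wins): -(stake * price - stake)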

Assuming a stake of $1 and ignoring fees, the payoffs simplify to:

  • back correctly: price - 1
  • back incorrectly: -1
  • lay correctly: +1
  • lay incorrectly: -price + 1
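Combining these payoffs with a probability estimate shows when a bet has positive expected value. The sketch below is not from the repository: it assumes q is your own estimate of the horse's true winning probability and that fees are charged on net winnings, as in the formulas above. It computes the expected payoff per unit stake and the break-even probability for each side.

```python
def expected_back(q: float, price: float, fees: float = 0.0) -> float:
    """Expected payoff of backing 1 unit at `price` if the true win probability is q."""
    return q * (price - 1) * (1 - fees) - (1 - q)


def expected_lay(q: float, price: float, fees: float = 0.0) -> float:
    """Expected payoff of laying 1 unit (liability price - 1) if the true win probability is q."""
    return (1 - q) * (1 - fees) - q * (price - 1)


def break_even_back(price: float, fees: float = 0.0) -> float:
    """Backing has positive expected value only if q exceeds this probability."""
    return 1 / (1 + (price - 1) * (1 - fees))


def break_even_lay(price: float, fees: float = 0.0) -> float:
    """Laying has positive expected value only if q is below this probability."""
    return (1 - fees) / (price - fees)


if __name__ == "__main__":
    # Without fees both thresholds equal the implied probability 1 / price.
    print(break_even_back(5.0), break_even_lay(5.0))
    # With 5% fees, backing needs a higher and laying a lower probability than 1 / price.
    print(break_even_back(5.0, 0.05), break_even_lay(5.0, 0.05))
```

The fees open a gap around 1 / price in which neither side is profitable, so a strategy can only make money where its probability estimate differs from the market price by more than that gap.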

If everybody is absolutely certain that a horse will win, its price approaches 1.0 (decimal odds), so backing it yields no payoff and neither does laying it. If everybody is quite certain that a horse will not win the race, the price is usually around 1000, the maximum price on Betfair.

The probability that a horse will win can be inferred from the price, and vice versa: winning probability = 1 / price (and price = 1 / winning probability).
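A minimal sketch of this conversion applied to a whole race; the function names are illustrative and not taken from the repository. Summing 1 / price over all runners gives the market's overround. For the two races shown in full in the sample data further down, this sum appears to match the 'overrun' column (about 1.00815 for the six-runner market 1.139268720).

```python
from __future__ import annotations


def implied_probability(price: float) -> float:
    """Win probability implied by a decimal price (ignores the market margin)."""
    return 1.0 / price


def overround(prices: list[float]) -> float:
    """Sum of implied probabilities across all runners of one race; > 1 means an over-round book."""
    return sum(implied_probability(p) for p in prices)


def normalised_probabilities(prices: list[float]) -> list[float]:
    """Rescale implied probabilities so that they sum to exactly 1."""
    total = overround(prices)
    return [implied_probability(p) / total for p in prices]


if __name__ == "__main__":
    # 'LTP t-0' of the six runners in market 1.139268720 (sample data)
    race = [9.4, 4.0, 7.0, 4.1, 4.6, 21.0]
    print(round(overround(race), 6))  # 1.008153
    print(round(sum(normalised_probabilities(race)), 6))  # 1.0
```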

In the code, the exact payoff is calculated from the LTP (last traded price), the actual stake and the fees. The snippet below is from historic_data_processing.py:

import numpy as np

# winner: boolean per horse; LTP: last traded price at entry; stake and fees are scalars
payoff_back = np.where(self.df['winner'].values,
                       (stake * self.df['LTP'].values - stake) * (1 - fees),  # back winner
                       -np.ones((self.df['LTP'].values.shape)) * stake)  # back loser

payoff_lay = np.where(self.df['winner'].values,
                      -(stake * self.df['LTP'].values - stake),  # lay winner
                      stake * np.ones((self.df['LTP'].values.shape)) * (1 - fees))  # lay loser
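The snippet above depends on its surrounding class (self.df, stake, fees). A self-contained version of the same formulas, written as a plain function for experimentation, might look like the sketch below. The function name, the extra lay_risk output and the default fees=0.05 are assumptions: the 'back', 'lay' and 'lay_risk' columns in the sample data appear consistent with a stake of 1 and 5% fees (e.g. back = (5.2 - 1) * 0.95 = 3.99 for the winner of market 1.139233105, and lay_risk = -(price - 1)).

```python
import numpy as np


def bet_payoffs(winner, ltp, stake=1.0, fees=0.05):
    """Return (payoff_back, payoff_lay, lay_risk) per horse using the formulas above."""
    winner = np.asarray(winner, dtype=bool)
    ltp = np.asarray(ltp, dtype=float)
    net_odds = stake * ltp - stake  # profit of a winning back bet before fees

    # Back: net odds minus fees if the horse wins, lose the stake otherwise.
    payoff_back = np.where(winner, net_odds * (1 - fees), -stake)
    # Lay: pay out the net odds if the horse wins, keep the stake minus fees otherwise.
    payoff_lay = np.where(winner, -net_odds, stake * (1 - fees))
    # Worst-case loss of the lay bet, independent of the outcome.
    lay_risk = -net_odds
    return payoff_back, payoff_lay, lay_risk


if __name__ == "__main__":
    back, lay, risk = bet_payoffs([True, False], [5.2, 50.0])
    print(back)  # roughly [3.99, -1.0]
    print(lay)   # roughly [-4.2, 0.95]
    print(risk)  # roughly [-4.2, -49.0]
```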

The Data

The data is collected by downloading the price of each horse once per minute, starting 60 minutes before the race starts and continuing until the race ends. During the race, prices are collected roughly every 10 seconds. All prices are last traded prices (LTP).
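The sampling schedule described above can be expressed as a small helper. This is an illustrative sketch, not code from the repository; the function name, its arguments and the choice to return None outside the collection window are assumptions.

```python
from typing import Optional

PRE_RACE_WINDOW_S = 60 * 60  # collection starts 60 minutes before the race
PRE_RACE_INTERVAL_S = 60     # one snapshot per minute before the race
IN_PLAY_INTERVAL_S = 10      # roughly every 10 seconds while the race runs


def polling_interval(seconds_to_start: float, race_finished: bool = False) -> Optional[int]:
    """Seconds until the next price snapshot, or None if no prices should be collected.

    seconds_to_start > 0 means the race has not started yet; <= 0 means it is in play.
    """
    if race_finished:
        return None  # collection stops once the race has ended
    if seconds_to_start > PRE_RACE_WINDOW_S:
        return None  # too early: outside the 60-minute window
    if seconds_to_start > 0:
        return PRE_RACE_INTERVAL_S
    return IN_PLAY_INTERVAL_S
```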

After the race a separate process collects the winner and writes the results to the same database.

In a separate process, the data is then transformed, so it can be fed into a neural network. The transformed data will contain the following columns:

'LTP t-0', 'LTP t-7', 'average', 'average_2d', 'back', 'countrycode', 'kurtosis', 'kurtosis_2d', 'lay', 'lay_risk', 'marketid', 'marketstarttime', 'max_2d', 'maximum', 'median', 'median_2d', 'min_2d', 'minimum', 'overrun', 'participants', 'selection_id', 'skew', 'skew_2d', 'starting_price', 'std', 'std_2d', 'winner'

Sample rows of the transformed data:

LTP t-0 LTP t-7 average average_2d back countrycode kurtosis kurtosis_2d lay lay_risk marketid marketstarttime max_2d maximum median median_2d min_2d minimum overrun participants selection_id skew skew_2d starting_price std std_2d winner
50.0 60.0 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -49.0 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 3893687 3.279079367849355 4.385357131797931 68.255507883 237.41174229340018 116.51357884846351 False
840.0 670.0 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -839.0 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 7262342 3.279079367849355 4.385357131797931 590.474923745 237.41174229340018 116.51357884846351 False
14.5 16.0 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -13.5 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 7380168 3.279079367849355 4.385357131797931 18.664683298746816 237.41174229340018 116.51357884846351 False
11.0 12.5 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -10.0 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 8421851 3.279079367849355 4.385357131797931 9.339514387501655 237.41174229340018 116.51357884846351 False
29.0 19.5 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -28.0 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 8492784 3.279079367849355 4.385357131797931 32.0 237.41174229340018 116.51357884846351 False
5.2 6.0 98.54166666666667 53.78402777777784 3.9899999999999998 GB 10.982342637974964 22.51974284698778 -4.2 -4.2 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 8869367 3.279079367849355 4.385357131797931 4.4 237.41174229340018 116.51357884846351 True
4.4 4.8 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -3.4000000000000004 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 9229409 3.279079367849355 4.385357131797931 4.6 237.41174229340018 116.51357884846351 False
32.0 32.0 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -31.0 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 10839655 3.279079367849355 4.385357131797931 40.0 237.41174229340018 116.51357884846351 False
22.0 19.0 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -21.0 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 11321256 3.279079367849355 4.385357131797931 20.893840768481603 237.41174229340018 116.51357884846351 False
8.4 6.4 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -7.4 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 11688035 3.279079367849355 4.385357131797931 7.974782176370879 237.41174229340018 116.51357884846351 False
6.0 6.0 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -5.0 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 12232392 3.279079367849355 4.385357131797931 6.6 237.41174229340018 116.51357884846351 False
160.0 250.0 98.54166666666667 53.78402777777784 -1.0 GB 10.982342637974964 22.51974284698778 0.95 -159.0 1.139233105 2018-01-24 14:40:00 990.0 840.0 18.25 16.0 4.4 4.4 1.0037970937108869 12 14838121 3.279079367849355 4.385357131797931 202.39535985152256 237.41174229340018 116.51357884846351 False
9.4 12.0 8.35 7.634861111111109 -1.0 GB 3.8745365787714015 1.3322044077460662 0.95 -8.4 1.139268720 2018-01-25 21:00:00 23.0 21.0 5.8 5.3 3.55 4.0 1.0081529125718112 6 8575987 1.9405721670660565 1.4417920045836903 7.0 6.54209446584196 4.486640644218091 False
4.0 4.1 8.35 7.634861111111109 -1.0 GB 3.8745365787714015 1.3322044077460662 0.95 -3.0 1.139268720 2018-01-25 21:00:00 23.0 21.0 5.8 5.3 3.55 4.0 1.0081529125718112 6 8706065 1.9405721670660565 1.4417920045836903 4.3 6.54209446584196 4.486640644218091 False
7.0 5.4 8.35 7.634861111111109 -1.0 GB 3.8745365787714015 1.3322044077460662 0.95 -6.0 1.139268720 2018-01-25 21:00:00 23.0 21.0 5.8 5.3 3.55 4.0 1.0081529125718112 6 10509488 1.9405721670660565 1.4417920045836903 7.4 6.54209446584196 4.486640644218091 False
4.1 4.3 8.35 7.634861111111109 2.9449999999999994 GB 3.8745365787714015 1.3322044077460662 -3.0999999999999996 -3.0999999999999996 1.139268720 2018-01-25 21:00:00 23.0 21.0 5.8 5.3 3.55 4.0 1.0081529125718112 6 11024653 1.9405721670660565 1.4417920045836903 4.280494073 6.54209446584196 4.486640644218091 True
4.6 4.7 8.35 7.634861111111109 -1.0 GB 3.8745365787714015 1.3322044077460662 0.95 -3.5999999999999996 1.139268720 2018-01-25 21:00:00 23.0 21.0 5.8 5.3 3.55 4.0 1.0081529125718112 6 11180317 1.9405721670660565 1.4417920045836903 4.562548612717727 6.54209446584196 4.486640644218091 False
21.0 21.0 8.35 7.634861111111109 -1.0 GB 3.8745365787714015 1.3322044077460662 0.95 -20.0 1.139268720 2018-01-25 21:00:00 23.0 21.0 5.8 5.3 3.55 4.0 1.0081529125718112 6 12653024 1.9405721670660565 1.4417920045836903 22.0 6.54209446584196 4.486640644218091 False
26.0 34.0 24.12 24.321888888888875 -1.0 IE 11.299591151509034 3.423915460281055 0.95 -25.0 1.139296062 2018-01-25 15:00:00 110.0 110.0 17.5 15.5 5.3 5.3 1.004864866781696 15 781222 3.2000719722913247 1.8176997877961063 25.292604111 25.129271491913286 19.541971566679 False
16.5 8.6 24.12 24.321888888888875 -1.0 IE 11.299591151509034 3.423915460281055 0.95 -15.5 1.139296062 2018-01-25 15:00:00 110.0 110.0 17.5 15.5 5.3 5.3 1.004864866781696 15 5660962 3.2000719722913247 1.8176997877961063 16.687364853692706 25.129271491913286 19.541971566679 False
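The README does not show how these columns are computed. At least in the two races shown in full above, 'participants', 'average', 'median', 'minimum', 'maximum', 'std', 'skew', 'kurtosis' and 'overrun' appear to coincide with statistics of 'LTP t-0' taken across all runners of the same market (for market 1.139268720: 6 runners, mean 8.35, median 5.8, min 4.0, max 21.0). The sketch below reproduces that reading with pandas groupby/transform. It is an inference from the sample, not the repository's code, and it does not attempt the '_2d' columns, whose definition is not given.

```python
import pandas as pd

# Race-level statistics computed with built-in groupby reductions.
RACE_STATS = {
    "participants": "count",
    "average": "mean",
    "median": "median",
    "minimum": "min",
    "maximum": "max",
    "std": "std",  # sample standard deviation (ddof=1)
}


def add_race_features(df: pd.DataFrame, price_col: str = "LTP t-0") -> pd.DataFrame:
    """Broadcast per-market statistics of `price_col` back onto every runner row."""
    out = df.copy()
    prices = out.groupby("marketid")[price_col]
    for column, func in RACE_STATS.items():
        out[column] = prices.transform(func)
    # pandas' bias-corrected skewness and excess kurtosis
    out["skew"] = prices.transform(lambda s: s.skew())
    out["kurtosis"] = prices.transform(lambda s: s.kurt())
    # Sum of implied probabilities (overround) of the market
    out["overrun"] = prices.transform(lambda s: (1.0 / s).sum())
    return out
```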

The neural network input and output can then be defined as follows, where strategy is either 'lay' or 'back' and selects the payoff column of the respective strategy. In other words, we are trying to predict the payoff of taking a bet on a horse, given the values in self.X: first the LTP t-0, which is the price at which we entered the bet, followed by metrics of the price distribution in the hour before (minimum, maximum, median, std, skew, etc.).

self.X = df[['LTP t-0', 'average', 'minimum', 'maximum', 'median', 'std', 'participants', 'skew', 'kurtosis','overrun']].values
self.Y = df[['act', strategy]].values

The loss function

It's important to note that we are not trying to predict which horse wins, but rather to find inefficiencies in how the market prices the respective probabilities. We already know that the horse with the lowest price (highest implied probability) is the one most likely to win. But simply backing that horse will not yield any profit (or will it?). This is what we're trying to find out by building a loss function that directly calculates the payoff based on the price at which we enter the bet.
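The README does not include the loss itself. One way to express this idea, shown here as a hypothetical NumPy sketch rather than the repository's implementation, is to let the network output a bet fraction between 0 and 1 for each horse and minimise the negative average realised payoff. The model is then rewarded only for bets that actually pay off at the entry price, rather than for guessing winners.

```python
import numpy as np


def payoff_loss(bet_fraction, payoff) -> float:
    """Negative mean payoff when staking `bet_fraction` (0..1) of one unit on each horse.

    payoff holds the realised payoff of a 1-unit bet for the chosen strategy
    (e.g. the 'back' or 'lay' column), so minimising this loss maximises profit.
    """
    bet_fraction = np.clip(np.asarray(bet_fraction, dtype=float), 0.0, 1.0)
    return float(-np.mean(bet_fraction * np.asarray(payoff, dtype=float)))


def payoff_loss_grad(bet_fraction, payoff) -> np.ndarray:
    """Gradient of payoff_loss with respect to bet_fraction (inside the 0..1 range)."""
    payoff = np.asarray(payoff, dtype=float)
    return -payoff / payoff.size


if __name__ == "__main__":
    # 'back' column of market 1.139268720: only the winner pays out
    back = np.array([-1.0, -1.0, -1.0, 2.945, -1.0, -1.0])
    print(payoff_loss(np.ones(6), back))            # backing every horse loses money
    print(payoff_loss([0, 0, 0, 1, 0, 0], back))    # backing only the winner is rewarded
```

Because this loss is linear in the bet fraction, the gradient for each horse is simply its negative payoff, which pushes fractions towards 0 for losing bets and towards 1 for profitable ones.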

Starting the training

The following command starts training the neural network on a betting strategy:

app.py ts train STRATEGY CLASS COUNTRYCODES [--batchsize=<>] [--from_year=<>] [--to_year=<>] [--localhost]
app.py ts train back FlyingSpider GB,IE --from_year 2015 --to_year 2019 --localhost --batchsize=100

Structure of the code

The code consists of the following packages:

  • Collecting data from Betfair and Matchbook (the cronjobs listed below need to be running)
  • Analyzing the pricing data of horse racing through neural networks and creating a model to make betting recommendations for backing or laying
  • Executing bets according to the trained model
  • Flask web interface that shows logging activity, a graphical PnL overview and statistical analysis of past bets

Building the venv

You can create a conda environment from the environment.yml file as follows:

  • Download and install Anaconda (64-bit, Python 3)
  • Run conda env create -f environment.yml -n horse_racing (or simply run update_venv.bat on Windows)
  • The env will be in anaconda/envs/horse_racing

Usage:

The application is controlled via app.py. The PnL overview web server is launched via webserver.py (preferably through webserver.wsgi).

Usage:
  app.py ts train STRATEGY CLASS COUNTRYCODES [--batchsize=<>] [--from_year=<>] [--to_year=<>] [--localhost]
  app.py ts backtest STRATEGY CLASS COUNTRYCODES [MODEL_PATH] [--from_year=<>] [--to_year=<>] [--localhost]
  app.py collect_prices
  app.py bet [--armed] [--sandbox_key] [--config=<>]
  app.py update_unfilled_orders [--armed] [--config=<>]
  app.py collect_results
  app.py evaluate_pnl [--overwrite_calculated_pnls] [--config=<>]
  app.py email_summary
  app.py upload_tarball FILE [DESTINATION]
  app.py map_reduce SOURCE DESTINATION CLASS [--localhost] [--use_archive]
  app.py propagate_race_results_to_price_scrape

Example
  app.py ts backtest lay FlyingSpiderBookie GB,IE --from_year 2018 --to_year 2018
  app.py ts train lay FlyingSpider GB,IE,US,NZ --localhost --from_year 2015 --to_year 2017
  app.py ts backtest lay FlyingSpider GB,IE,US,NZ --localhost --from_year 2016 --to_year 2016
  app.py propagate_race_results_to_price_scrape   adds winner and loser columns to each price in price_scrape
  app.py map_reduce price_scrape price_scrape_enriched DEBookies --localhost
Cronjobs that need to be set up:
    * * * * * sh app.sh collect_prices
    * * * * * sh app.sh bet
    * * * * * sh app.sh update_unfilled_orders --armed
    5 * * * * sh app.sh collect_results
    6 * * * * sh app.sh evaluate_pnl
    7 22 * * * sh app.sh email_summary
    9 4 * * * sh app.sh propagate_race_results_to_price_scrape
    5 5 * * * sh app.sh map_reduce price_scrape price_scrape_enriched_bookies DEBookies

Database

All data is stored on a MongoDB server, which is configured in config.ini.

Example plots of neural network training (images in the repository):

  • doc/chart_cumulative_year.png
  • doc/chart_cumulative.png
  • doc/chart.png
  • horse_racing/neural_networks/plots/backtesting-20180323-222855.png
  • horse_racing/neural_networks/plots/training-20180318-164301.png
  • horse_racing/neural_networks/plots/training-20180318-184839.png
  • horse_racing/neural_networks/plots/training-20180321-003043.png
  • horse_racing/neural_networks/plots/training-20180426-150117.png
  • horse_racing/neural_networks/plots/training-20180508-224228.png
  • horse_racing/neural_networks/plots/training-20180508-224741.png
