diff --git a/notebooks/3-computing-and-storing-features.ipynb b/notebooks/3-computing-and-storing-features.ipynb
index 60d136c..a4e9a33 100644
--- a/notebooks/3-computing-and-storing-features.ipynb
+++ b/notebooks/3-computing-and-storing-features.ipynb
@@ -4,168 +4,964 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Features\n",
- "This notebook generates features and labels (goal/no goal) for all shots and stores them in a HDF file. This is a good practice to save computational time if you want to experiment with multiple pipelines."
+ "# Features and labels\n",
+ "This notebook generates features (shot distance, angle, etc.) and labels (goal/no goal) for all shots and stores them in an HDF file. Storing intermediate results saves computation time when you want to experiment with multiple pipelines."
]
},
{
"cell_type": "code",
"execution_count": 1,
- "metadata": {},
+ "metadata": {
+ "tags": []
+ },
"outputs": [],
"source": [
+ "from pathlib import Path\n",
+ "\n",
+ "import numpy as np\n",
"import pandas as pd\n",
"pd.set_option('display.max_columns', None)\n",
- "import numpy as np\n",
- "import itertools"
+ "\n",
+ "from socceraction import spadl\n",
+ "from socceraction import vaep"
]
},
{
"cell_type": "code",
"execution_count": 2,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/cw/dtaijupiter/NoCsBack/dtai/pieterr/Projects/soccer_xg/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+ " from .autonotebook import tqdm as notebook_tqdm\n"
+ ]
+ }
+ ],
+ "source": [
+ "%load_ext autoreload\n",
+ "%autoreload 2\n",
+ " \n",
+ "from soccer_xg.data import HDFDataset\n",
+ "import soccer_xg.attributes as fs\n",
+ "import soccer_xg.xg as xg"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Configuration"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# see 1-load-and-convert-statsbomb-data\n",
+ "data_fp = Path(\"../data\")\n",
+ "dataset = HDFDataset(data_fp / \"spadl-statsbomb-bigfive-1516.h5\", mode='a')"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Feature and label generators\n",
+ "\n",
+ "By default, all features defined in `soccer_xg.attributes.default_features` are computed, but you can also compute a subset of these features or add your own feature generators. Each feature generator is a function that expects one of three inputs: a DataFrame of SPADL actions, a list of DataFrames of consecutive SPADL actions (i.e., game states), or the raw provider-specific events. Let's load the data of a single game and look at some of these feature generators."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "game = dataset.games().loc[3890561]\n",
+ "actions = spadl.utils.add_names(dataset.actions(game_id=3890561))\n",
+ "events = dataset.events(game_id=3890561)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Action-based features\n",
+ "\n",
+ "These feature generators compute features from the shot and all preceding actions. Their input is a pandas DataFrame of actions in SPADL format and a boolean mask that selects the shots for which features should be computed."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " dist_shot | \n",
+ "
\n",
+ " \n",
+ " action_id | \n",
+ " | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 151 | \n",
+ " 12.881039 | \n",
+ "
\n",
+ " \n",
+ " 207 | \n",
+ " 8.294462 | \n",
+ "
\n",
+ " \n",
+ " 240 | \n",
+ " 9.495718 | \n",
+ "
\n",
+ " \n",
+ " 359 | \n",
+ " 19.156990 | \n",
+ "
\n",
+ " \n",
+ " 430 | \n",
+ " 14.870452 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " dist_shot\n",
+ "action_id \n",
+ "151 12.881039\n",
+ "207 8.294462\n",
+ "240 9.495718\n",
+ "359 19.156990\n",
+ "430 14.870452"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# convert actions to left-to-right orientation\n",
+ "ltr_actions = spadl.utils.play_left_to_right(actions, game.home_team_id)\n",
+ "# get actions corresponding to shots\n",
+ "shot_mask = (\n",
+ " actions.type_name.isin([\"shot\", \"shot_penalty\", \"shot_freekick\"]) \n",
+ " & ~actions.result_name.isin([\"owngoal\", \"offside\"])\n",
+ ")\n",
+ "# compute feature\n",
+ "fs.shot_dist(ltr_actions, shot_mask).head()"
+ ]
+ },
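+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For intuition, the cell below computes a distance like `dist_shot` by hand on made-up shot locations: the Euclidean distance from the shot to the centre of the goal. This is a minimal sketch, not the implementation in `soccer_xg.attributes`. It assumes the standard 105 x 68 m SPADL pitch with the shooting team playing left to right, so that the goal centre lies at (105, 34)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "# made-up shot locations in left-to-right SPADL coordinates (metres)\n",
+ "toy_shots = pd.DataFrame({'start_x': [94.5, 105.0], 'start_y': [34.0, 30.0]})\n",
+ "\n",
+ "# assumed centre of the goal on a 105 x 68 m SPADL pitch\n",
+ "goal_x, goal_y = 105.0, 34.0\n",
+ "\n",
+ "# Euclidean distance from each shot location to the goal centre\n",
+ "np.hypot(goal_x - toy_shots['start_x'], goal_y - toy_shots['start_y'])"
+ ]
+ },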
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Gamestate-based features\n",
+ "\n",
+ "These feature generators compute features from the shot and its N previous actions (i.e., the shot context). Their input is the game states of the shots, represented as a list of SPADL action DataFrames `[a_0, a_1, ...]`: `a_0` contains the shots themselves, and each row of `a_i` contains the action that preceded the action in the same row of `a_{i-1}`."
+ ]
+ },
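+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The toy example below mimics this structure with `DataFrame.shift`, so that row *k* of `a_i` holds the action that happened *i* steps before action *k*. It is a simplified sketch, not the code of `vaep.features.gamestates`. It ignores period boundaries and pads actions without a predecessor with the first action. That padding rule is an assumption made for illustration."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# made-up sequence of consecutive actions within one period\n",
+ "toy_actions = pd.DataFrame({\n",
+ "    'type_name': ['pass', 'dribble', 'cross', 'shot'],\n",
+ "    'time_seconds': [10.0, 12.5, 14.0, 15.2],\n",
+ "})\n",
+ "\n",
+ "nb_prev_actions = 3\n",
+ "# a_0 holds the actions themselves\n",
+ "toy_states = [toy_actions]\n",
+ "for i in range(1, nb_prev_actions):\n",
+ "    # shifting down by i rows puts the action from i steps earlier in each row\n",
+ "    prev = toy_actions.shift(i)\n",
+ "    # simplifying assumption: pad rows without a predecessor with the first action\n",
+ "    prev = prev.fillna(toy_actions.iloc[0])\n",
+ "    toy_states.append(prev)\n",
+ "\n",
+ "# the shot (last row) in a_0, followed by its two previous actions in a_1 and a_2\n",
+ "[states.iloc[-1]['type_name'] for states in toy_states]"
+ ]
+ },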
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " speedx_a01 | \n",
+ " speedy_a01 | \n",
+ " speed_a01 | \n",
+ " speedx_a02 | \n",
+ " speedy_a02 | \n",
+ " speed_a02 | \n",
+ "
\n",
+ " \n",
+ " action_id | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 151 | \n",
+ " 0.00000 | \n",
+ " 0.00000 | \n",
+ " 0.000000 | \n",
+ " 3.510238 | \n",
+ " 10.442959 | \n",
+ " 11.017130 | \n",
+ "
\n",
+ " \n",
+ " 207 | \n",
+ " 0.00000 | \n",
+ " 0.00000 | \n",
+ " 0.000000 | \n",
+ " 4.817636 | \n",
+ " 5.487703 | \n",
+ " 7.302362 | \n",
+ "
\n",
+ " \n",
+ " 240 | \n",
+ " 0.76087 | \n",
+ " 5.26087 | \n",
+ " 5.315606 | \n",
+ " 0.498826 | \n",
+ " 3.449027 | \n",
+ " 3.484913 | \n",
+ "
\n",
+ " \n",
+ " 359 | \n",
+ " 0.00000 | \n",
+ " 0.00000 | \n",
+ " 0.000000 | \n",
+ " 6.564303 | \n",
+ " 0.420445 | \n",
+ " 6.577754 | \n",
+ "
\n",
+ " \n",
+ " 430 | \n",
+ " 0.00000 | \n",
+ " 0.00000 | \n",
+ " 0.000000 | \n",
+ " 1.135184 | \n",
+ " 1.323300 | \n",
+ " 1.743493 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " speedx_a01 speedy_a01 speed_a01 speedx_a02 speedy_a02 \\\n",
+ "action_id \n",
+ "151 0.00000 0.00000 0.000000 3.510238 10.442959 \n",
+ "207 0.00000 0.00000 0.000000 4.817636 5.487703 \n",
+ "240 0.76087 5.26087 5.315606 0.498826 3.449027 \n",
+ "359 0.00000 0.00000 0.000000 6.564303 0.420445 \n",
+ "430 0.00000 0.00000 0.000000 1.135184 1.323300 \n",
+ "\n",
+ " speed_a02 \n",
+ "action_id \n",
+ "151 11.017130 \n",
+ "207 7.302362 \n",
+ "240 3.484913 \n",
+ "359 6.577754 \n",
+ "430 1.743493 "
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# convert actions to Left-to-Right gamestates\n",
+ "gamestates = vaep.features.gamestates(actions, nb_prev_actions=3)\n",
+ "ltr_gamestates = vaep.features.play_left_to_right(gamestates, game.home_team_id)\n",
+ "# get gamestates corresponding to shots\n",
+ "shot_gamestates = [states.loc[shot_mask] for states in ltr_gamestates]\n",
+ "# compute feature\n",
+ "fs.speed(shot_gamestates).head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Event-based features\n",
+ "\n",
+ "These feature generators compute features from the original event data and are therefore provider-specific (here, StatsBomb). Their input is a pandas DataFrame of events and a Series of event IDs that selects the shots for which features should be computed."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " goalkeeper_x | \n",
+ " goalkeeper_y | \n",
+ " goalkeeper_dist_to_ball | \n",
+ " goalkeeper_dist_to_goal | \n",
+ " goalkeeper_angle_to_goal | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " ba46e9d6-e828-4599-952c-39c1f7d22659 | \n",
+ " 104.117647 | \n",
+ " 35.549367 | \n",
+ " 11.583400 | \n",
+ " 1.782999 | \n",
+ " -1.053111 | \n",
+ "
\n",
+ " \n",
+ " 85d67225-30fb-47c8-b478-cf568941a164 | \n",
+ " 101.470588 | \n",
+ " 32.708861 | \n",
+ " 4.529539 | \n",
+ " 3.758163 | \n",
+ " 0.350701 | \n",
+ "
\n",
+ " \n",
+ " adac17d3-5e67-4e8c-b482-4bae2f36e06e | \n",
+ " 103.764706 | \n",
+ " 35.118987 | \n",
+ " 8.715584 | \n",
+ " 1.666759 | \n",
+ " -0.736036 | \n",
+ "
\n",
+ " \n",
+ " abffd193-62bc-4c8d-8636-1e3f0f0ebbe5 | \n",
+ " 84.705882 | \n",
+ " 45.792405 | \n",
+ " 4.290909 | \n",
+ " 23.471515 | \n",
+ " -0.526388 | \n",
+ "
\n",
+ " \n",
+ " d9cea903-f92a-40e1-a393-1a849d83f157 | \n",
+ " 103.500000 | \n",
+ " 37.443038 | \n",
+ " 11.711233 | \n",
+ " 3.755597 | \n",
+ " -1.159930 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " goalkeeper_x goalkeeper_y \\\n",
+ "ba46e9d6-e828-4599-952c-39c1f7d22659 104.117647 35.549367 \n",
+ "85d67225-30fb-47c8-b478-cf568941a164 101.470588 32.708861 \n",
+ "adac17d3-5e67-4e8c-b482-4bae2f36e06e 103.764706 35.118987 \n",
+ "abffd193-62bc-4c8d-8636-1e3f0f0ebbe5 84.705882 45.792405 \n",
+ "d9cea903-f92a-40e1-a393-1a849d83f157 103.500000 37.443038 \n",
+ "\n",
+ " goalkeeper_dist_to_ball \\\n",
+ "ba46e9d6-e828-4599-952c-39c1f7d22659 11.583400 \n",
+ "85d67225-30fb-47c8-b478-cf568941a164 4.529539 \n",
+ "adac17d3-5e67-4e8c-b482-4bae2f36e06e 8.715584 \n",
+ "abffd193-62bc-4c8d-8636-1e3f0f0ebbe5 4.290909 \n",
+ "d9cea903-f92a-40e1-a393-1a849d83f157 11.711233 \n",
+ "\n",
+ " goalkeeper_dist_to_goal \\\n",
+ "ba46e9d6-e828-4599-952c-39c1f7d22659 1.782999 \n",
+ "85d67225-30fb-47c8-b478-cf568941a164 3.758163 \n",
+ "adac17d3-5e67-4e8c-b482-4bae2f36e06e 1.666759 \n",
+ "abffd193-62bc-4c8d-8636-1e3f0f0ebbe5 23.471515 \n",
+ "d9cea903-f92a-40e1-a393-1a849d83f157 3.755597 \n",
+ "\n",
+ " goalkeeper_angle_to_goal \n",
+ "ba46e9d6-e828-4599-952c-39c1f7d22659 -1.053111 \n",
+ "85d67225-30fb-47c8-b478-cf568941a164 0.350701 \n",
+ "adac17d3-5e67-4e8c-b482-4bae2f36e06e -0.736036 \n",
+ "abffd193-62bc-4c8d-8636-1e3f0f0ebbe5 -0.526388 \n",
+ "d9cea903-f92a-40e1-a393-1a849d83f157 -1.159930 "
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "shot_events_idx = actions.loc[shot_mask, \"original_event_id\"]\n",
+ "fs.statsbomb_goalkeeper_position(events, shot_events_idx).head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
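+ "### Defining your own feature generator\n",
+ "\n",
+ "A custom feature generator is a regular function decorated with `fs.ftype`, which indicates the kind of input it expects (here, `actions`). The example below flags a shot as a rebound if another shot was taken one or two actions earlier and less than 5 seconds before."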
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " rebound | \n",
+ "
\n",
+ " \n",
+ " action_id | \n",
+ " | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 151 | \n",
+ " False | \n",
+ "
\n",
+ " \n",
+ " 207 | \n",
+ " False | \n",
+ "
\n",
+ " \n",
+ " 240 | \n",
+ " False | \n",
+ "
\n",
+ " \n",
+ " 359 | \n",
+ " False | \n",
+ "
\n",
+ " \n",
+ " 430 | \n",
+ " False | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " rebound\n",
+ "action_id \n",
+ "151 False\n",
+ "207 False\n",
+ "240 False\n",
+ "359 False\n",
+ "430 False"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "@fs.ftype(\"actions\")\n",
+ "def rebound(actions, shot_mask):\n",
+ " \"\"\"Determine whether the shot was a rebound.\n",
+ "\n",
+ " Parameters\n",
+ " ----------\n",
+ " actions : pd.DataFrame\n",
+ " The actions of a game in SPADL format.\n",
+ " shot_mask : pd.Series\n",
+ " A boolean mask to select the shots for which features should be\n",
+ " computed.\n",
+ "\n",
+ " Returns\n",
+ " -------\n",
+ " pd.DataFrame\n",
+ " A dataframe with a column indicating whether the shot was a rebound\n",
+ " ('rebound').\n",
+ " \"\"\"\n",
+ " shot = actions.loc[shot_mask]\n",
+ " a1 = actions.shift(1).loc[shot_mask]\n",
+ " a2 = actions.shift(2).loc[shot_mask]\n",
+ " rebound = (\n",
+ " # the previous action was a shot and less than 5 seconds ago\n",
+ " (a1[\"type_name\"].isin([\"shot\", \"shot_penalty\", \"shot_freekick\"])\n",
+ " & (shot[\"time_seconds\"] - a1[\"time_seconds\"] < 5))\n",
+ " # or there was a shot two actions before, less than 5 seconds ago\n",
+ " | (a2[\"type_name\"].isin([\"shot\", \"shot_penalty\", \"shot_freekick\"])\n",
+ " & (shot[\"time_seconds\"] - a2[\"time_seconds\"] < 5))\n",
+ " )\n",
+ " return pd.DataFrame({\"rebound\": rebound}, index=shot.index)\n",
+ "\n",
+ "rebound(ltr_actions, shot_mask).head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Computing a list of feature generators"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
- "%load_ext autoreload\n",
- "%autoreload 2\n",
- "from soccer_xg.api import DataApi\n",
- "import soccer_xg.xg as xg\n",
- "import soccer_xg.features as fs"
+ "feature_generators = [\n",
+ " fs.shot_dist,\n",
+ " fs.shot_visible_angle,\n",
+ " fs.shot_bodypart,\n",
+ " fs.statsbomb_open_goal,\n",
+ " fs.statsbomb_first_touch,\n",
+ " fs.statsbomb_free_projection,\n",
+ " fs.statsbomb_goalkeeper_position,\n",
+ " fs.statsbomb_defenders_position,\n",
+ " fs.statsbomb_assist,\n",
+ " fs.statsbomb_counterattack,\n",
+ " fs.statsbomb_shot_impact_height\n",
+ "]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " dist_shot | \n",
+ " visible_angle_shot | \n",
+ " bodypart_name_shot | \n",
+ " open_goal | \n",
+ " first_touch | \n",
+ " free_projection_gaps | \n",
+ " free_projection_pct | \n",
+ " goalkeeper_x | \n",
+ " goalkeeper_y | \n",
+ " goalkeeper_dist_to_ball | \n",
+ " goalkeeper_dist_to_goal | \n",
+ " goalkeeper_angle_to_goal | \n",
+ " dist_to_defender | \n",
+ " under_pressure | \n",
+ " nb_defenders_in_shot_line | \n",
+ " nb_defenders_behind_ball | \n",
+ " one_on_one | \n",
+ " end_x_assist | \n",
+ " end_y_assist | \n",
+ " carry_dist | \n",
+ " type_assist | \n",
+ " height_assist | \n",
+ " from_counterattack | \n",
+ " impact_height | \n",
+ " goal | \n",
+ "
\n",
+ " \n",
+ " action_id | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ " | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 151 | \n",
+ " 12.881039 | \n",
+ " 0.465099 | \n",
+ " foot | \n",
+ " False | \n",
+ " True | \n",
+ " 2 | \n",
+ " 0.505297 | \n",
+ " 104.117647 | \n",
+ " 35.549367 | \n",
+ " 11.583400 | \n",
+ " 1.782999 | \n",
+ " -1.053111 | \n",
+ " 4.374854 | \n",
+ " False | \n",
+ " 2 | \n",
+ " 4 | \n",
+ " False | \n",
+ " 94.500000 | \n",
+ " 42.005063 | \n",
+ " 0.000000 | \n",
+ " cross | \n",
+ " high | \n",
+ " False | \n",
+ " low | \n",
+ " False | \n",
+ "
\n",
+ " \n",
+ " 207 | \n",
+ " 8.294462 | \n",
+ " 0.813487 | \n",
+ " foot | \n",
+ " False | \n",
+ " True | \n",
+ " 2 | \n",
+ " 0.609730 | \n",
+ " 101.470588 | \n",
+ " 32.708861 | \n",
+ " 4.529539 | \n",
+ " 3.758163 | \n",
+ " 0.350701 | \n",
+ " 1.367580 | \n",
+ " False | \n",
+ " 0 | \n",
+ " 0 | \n",
+ " True | \n",
+ " 96.970588 | \n",
+ " 32.192405 | \n",
+ " 0.000000 | \n",
+ " standard_pass | \n",
+ " high | \n",
+ " False | \n",
+ " low | \n",
+ " True | \n",
+ "
\n",
+ " \n",
+ " 240 | \n",
+ " 9.495718 | \n",
+ " 0.177482 | \n",
+ " foot | \n",
+ " False | \n",
+ " True | \n",
+ " 1 | \n",
+ " 0.138353 | \n",
+ " 103.764706 | \n",
+ " 35.118987 | \n",
+ " 8.715584 | \n",
+ " 1.666759 | \n",
+ " -0.736036 | \n",
+ " 1.047878 | \n",
+ " True | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " False | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " False | \n",
+ " low | \n",
+ " False | \n",
+ "
\n",
+ " \n",
+ " 359 | \n",
+ " 19.156990 | \n",
+ " 0.319270 | \n",
+ " foot | \n",
+ " False | \n",
+ " False | \n",
+ " 1 | \n",
+ " 1.000000 | \n",
+ " 84.705882 | \n",
+ " 45.792405 | \n",
+ " 4.290909 | \n",
+ " 23.471515 | \n",
+ " -0.526388 | \n",
+ " 6.794148 | \n",
+ " False | \n",
+ " 0 | \n",
+ " 2 | \n",
+ " False | \n",
+ " 80.294118 | \n",
+ " 44.070886 | \n",
+ " 8.708532 | \n",
+ " through_ball | \n",
+ " ground | \n",
+ " True | \n",
+ " ground | \n",
+ " True | \n",
+ "
\n",
+ " \n",
+ " 430 | \n",
+ " 14.870452 | \n",
+ " 0.320055 | \n",
+ " foot | \n",
+ " False | \n",
+ " False | \n",
+ " 2 | \n",
+ " 0.739963 | \n",
+ " 103.500000 | \n",
+ " 37.443038 | \n",
+ " 11.711233 | \n",
+ " 3.755597 | \n",
+ " -1.159930 | \n",
+ " 2.179715 | \n",
+ " False | \n",
+ " 0 | \n",
+ " 3 | \n",
+ " False | \n",
+ " 91.147059 | \n",
+ " 51.215190 | \n",
+ " 6.792372 | \n",
+ " standard_pass | \n",
+ " ground | \n",
+ " False | \n",
+ " ground | \n",
+ " False | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " dist_shot visible_angle_shot bodypart_name_shot open_goal \\\n",
+ "action_id \n",
+ "151 12.881039 0.465099 foot False \n",
+ "207 8.294462 0.813487 foot False \n",
+ "240 9.495718 0.177482 foot False \n",
+ "359 19.156990 0.319270 foot False \n",
+ "430 14.870452 0.320055 foot False \n",
+ "\n",
+ " first_touch free_projection_gaps free_projection_pct \\\n",
+ "action_id \n",
+ "151 True 2 0.505297 \n",
+ "207 True 2 0.609730 \n",
+ "240 True 1 0.138353 \n",
+ "359 False 1 1.000000 \n",
+ "430 False 2 0.739963 \n",
+ "\n",
+ " goalkeeper_x goalkeeper_y goalkeeper_dist_to_ball \\\n",
+ "action_id \n",
+ "151 104.117647 35.549367 11.583400 \n",
+ "207 101.470588 32.708861 4.529539 \n",
+ "240 103.764706 35.118987 8.715584 \n",
+ "359 84.705882 45.792405 4.290909 \n",
+ "430 103.500000 37.443038 11.711233 \n",
+ "\n",
+ " goalkeeper_dist_to_goal goalkeeper_angle_to_goal \\\n",
+ "action_id \n",
+ "151 1.782999 -1.053111 \n",
+ "207 3.758163 0.350701 \n",
+ "240 1.666759 -0.736036 \n",
+ "359 23.471515 -0.526388 \n",
+ "430 3.755597 -1.159930 \n",
+ "\n",
+ " dist_to_defender under_pressure nb_defenders_in_shot_line \\\n",
+ "action_id \n",
+ "151 4.374854 False 2 \n",
+ "207 1.367580 False 0 \n",
+ "240 1.047878 True 1 \n",
+ "359 6.794148 False 0 \n",
+ "430 2.179715 False 0 \n",
+ "\n",
+ " nb_defenders_behind_ball one_on_one end_x_assist end_y_assist \\\n",
+ "action_id \n",
+ "151 4 False 94.500000 42.005063 \n",
+ "207 0 True 96.970588 32.192405 \n",
+ "240 1 False NaN NaN \n",
+ "359 2 False 80.294118 44.070886 \n",
+ "430 3 False 91.147059 51.215190 \n",
+ "\n",
+ " carry_dist type_assist height_assist from_counterattack \\\n",
+ "action_id \n",
+ "151 0.000000 cross high False \n",
+ "207 0.000000 standard_pass high False \n",
+ "240 NaN NaN NaN False \n",
+ "359 8.708532 through_ball ground True \n",
+ "430 6.792372 standard_pass ground False \n",
+ "\n",
+ " impact_height goal \n",
+ "action_id \n",
+ "151 low False \n",
+ "207 low True \n",
+ "240 low False \n",
+ "359 ground True \n",
+ "430 ground False "
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df_features, df_labels = fs.compute_attributes(\n",
+ " game=dataset.games().loc[3890561], \n",
+ " actions=dataset.actions(game_id=3890561), \n",
+ " events=dataset.events(game_id=3890561), \n",
+ " xfns=feature_generators,\n",
+ " yfns=[fs.goal_from_shot]\n",
+ ")\n",
+ "pd.concat([df_features, df_labels], axis=1).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Config"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [],
- "source": [
- "# dataset\n",
- "dir_data = \"../data\"\n",
- "provider = 'wyscout_opensource'\n",
- "leagues = ['ENG', 'ESP', 'ITA', 'GER', 'FRA']\n",
- "seasons = ['1718']\n",
+ "## Compute features and labels\n",
"\n",
- "# features\n",
- "store_features = f'../data/{provider}/features.h5'"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "By default, all features defined in `soccer_xg.features.all_features` are computed. It is also possible to compute a subset of these features or add additional feature generators. Each feature generator is a function that expects either a DataFrame object containing actions (i.e., individual actions) or a list of DataFrame objects containing consecutive actions (i.e., game states), and returns the corresponding feature for the individual action or game state. Features that contain information about the shot's outcome are automatically removed."
+ "We can now compute the features and labels for all shots in the entire dataset with `xg.prepare`."
]
},
{
"cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [],
- "source": [
- "feature_generators = fs.all_features"
- ]
- },
- {
- "cell_type": "markdown",
+ "execution_count": 11,
"metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Preparing dataset: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1823/1823 [15:09<00:00, 2.00it/s]\n"
+ ]
+ }
+ ],
"source": [
- "## Compute features and labels"
+ "X, y = xg.prepare(dataset, xfns=feature_generators)"
]
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 12,
"metadata": {},
"outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "ENG 1718\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Generating features: 100%|██████████| 380/380 [01:40<00:00, 3.78it/s]\n",
- "Generating labels: 100%|██████████| 380/380 [00:08<00:00, 43.51it/s]\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "ESP 1718\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Generating features: 100%|██████████| 380/380 [01:40<00:00, 3.79it/s]\n",
- "Generating labels: 100%|██████████| 380/380 [00:08<00:00, 43.70it/s]\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "ITA 1718\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Generating features: 100%|██████████| 380/380 [01:40<00:00, 3.77it/s]\n",
- "Generating labels: 100%|██████████| 380/380 [00:08<00:00, 43.62it/s]\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "GER 1718\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Generating features: 100%|██████████| 306/306 [01:21<00:00, 3.77it/s]\n",
- "Generating labels: 100%|██████████| 306/306 [00:06<00:00, 43.93it/s]\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "FRA 1718\n"
- ]
- },
{
"name": "stderr",
"output_type": "stream",
"text": [
- "Generating features: 100%|██████████| 380/380 [01:40<00:00, 3.78it/s]\n",
- "Generating labels: 100%|██████████| 380/380 [00:08<00:00, 43.76it/s]\n"
+ "/tmp/ipykernel_489608/998226324.py:2: PerformanceWarning: \n",
+ "your performance may suffer as PyTables will pickle object types that it cannot\n",
+ "map directly to c-types [inferred_type->mixed,key->block2_values] [items->Index(['bodypart_name_shot', 'type_assist', 'height_assist', 'impact_height'], dtype='object')]\n",
+ "\n",
+ " dataset[\"xg/features\"] = X.astype({c: 'object' for c in X.select_dtypes(include='category').columns})\n"
]
}
],
"source": [
- "for (l,s) in itertools.product(leagues, seasons):\n",
- " print(l, s)\n",
- " api = DataApi([f\"{dir_data}/{provider}/spadl-{provider}-{l}-{s}.h5\"])\n",
- " xg.get_features(api, xfns=feature_generators).to_hdf(store_features, key=f'{l}/{s}/features', format='table') \n",
- " xg.get_labels(api).to_hdf(store_features, key=f'{l}/{s}/labels', format='table') "
+ "# we cannot store a categorical dtype in an HDF file, so convert categorical columns to object\n",
+ "dataset[\"xg/features\"] = X.astype({c: 'object' for c in X.select_dtypes(include='category').columns})\n",
+ "dataset[\"xg/labels\"] = y.astype({c: 'object' for c in y.select_dtypes(include='category').columns})"
]
},
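+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see what the conversion above does, the cell below applies the same `astype` call to a small, made-up feature table. The categorical column becomes an `object` column, while the numeric column keeps its dtype."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# made-up feature table with one numeric and one categorical column\n",
+ "toy_X = pd.DataFrame({\n",
+ "    'dist_shot': [12.9, 8.3],\n",
+ "    'bodypart_name_shot': pd.Categorical(['foot', 'head']),\n",
+ "})\n",
+ "\n",
+ "# cast every categorical column to object dtype, as done before storing X and y\n",
+ "toy_X = toy_X.astype({c: 'object' for c in toy_X.select_dtypes(include='category').columns})\n",
+ "toy_X.dtypes"
+ ]
+ },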
{
@@ -177,7 +973,7 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 13,
"metadata": {},
"outputs": [
{
@@ -202,65 +998,30 @@
" \n",
" | \n",
" | \n",
- " type_id_a0 | \n",
- " type_id_a1 | \n",
- " type_id_a2 | \n",
- " bodypart_id_a0 | \n",
- " bodypart_id_a1 | \n",
- " bodypart_id_a2 | \n",
- " result_id_a1 | \n",
- " result_id_a2 | \n",
- " start_x_a0 | \n",
- " start_y_a0 | \n",
- " start_x_a1 | \n",
- " start_y_a1 | \n",
- " start_x_a2 | \n",
- " start_y_a2 | \n",
- " end_x_a1 | \n",
- " end_y_a1 | \n",
- " end_x_a2 | \n",
- " end_y_a2 | \n",
- " dx_a1 | \n",
- " dy_a1 | \n",
- " movement_a1 | \n",
- " dx_a2 | \n",
- " dy_a2 | \n",
- " movement_a2 | \n",
- " dx_a01 | \n",
- " dy_a01 | \n",
- " mov_a01 | \n",
- " dx_a02 | \n",
- " dy_a02 | \n",
- " mov_a02 | \n",
- " start_dist_to_goal_a0 | \n",
- " start_angle_to_goal_a0 | \n",
- " start_dist_to_goal_a1 | \n",
- " start_angle_to_goal_a1 | \n",
- " start_dist_to_goal_a2 | \n",
- " start_angle_to_goal_a2 | \n",
- " end_dist_to_goal_a1 | \n",
- " end_angle_to_goal_a1 | \n",
- " end_dist_to_goal_a2 | \n",
- " end_angle_to_goal_a2 | \n",
- " team_1 | \n",
- " team_2 | \n",
- " time_delta_1 | \n",
- " time_delta_2 | \n",
- " speedx_a01 | \n",
- " speedy_a01 | \n",
- " speed_a01 | \n",
- " speedx_a02 | \n",
- " speedy_a02 | \n",
- " speed_a02 | \n",
- " shot_angle_a0 | \n",
- " shot_angle_a1 | \n",
- " shot_angle_a2 | \n",
- " caley_zone_a0 | \n",
- " caley_zone_a1 | \n",
- " caley_zone_a2 | \n",
- " angle_zone_a0 | \n",
- " angle_zone_a1 | \n",
- " angle_zone_a2 | \n",
+ " dist_shot | \n",
+ " visible_angle_shot | \n",
+ " bodypart_name_shot | \n",
+ " open_goal | \n",
+ " first_touch | \n",
+ " free_projection_gaps | \n",
+ " free_projection_pct | \n",
+ " goalkeeper_x | \n",
+ " goalkeeper_y | \n",
+ " goalkeeper_dist_to_ball | \n",
+ " goalkeeper_dist_to_goal | \n",
+ " goalkeeper_angle_to_goal | \n",
+ " dist_to_defender | \n",
+ " under_pressure | \n",
+ " nb_defenders_in_shot_line | \n",
+ " nb_defenders_behind_ball | \n",
+ " one_on_one | \n",
+ " end_x_assist | \n",
+ " end_y_assist | \n",
+ " carry_dist | \n",
+ " type_assist | \n",
+ " height_assist | \n",
+ " from_counterattack | \n",
+ " impact_height | \n",
"
\n",
" \n",
" game_id | \n",
@@ -289,487 +1050,221 @@
" | \n",
" | \n",
" | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
- " | \n",
"
\n",
" \n",
" \n",
" \n",
- " 2500098 | \n",
- " 17 | \n",
- " shot | \n",
- " dribble | \n",
- " cross | \n",
- " foot | \n",
- " foot | \n",
+ " 3890561 | \n",
+ " 151 | \n",
+ " 12.881039 | \n",
+ " 0.465099 | \n",
" foot | \n",
- " success | \n",
- " success | \n",
- " 99.75 | \n",
- " 26.52 | \n",
- " 91.35 | \n",
- " 29.92 | \n",
- " 97.65 | \n",
- " 6.12 | \n",
- " 99.75 | \n",
- " 26.52 | \n",
- " 91.35 | \n",
- " 29.92 | \n",
- " 8.40 | \n",
- " -3.40 | \n",
- " 9.062009 | \n",
- " -6.30 | \n",
- " 23.80 | \n",
- " 24.619708 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " -8.40 | \n",
- " 3.40 | \n",
- " 9.062009 | \n",
- " 9.138539 | \n",
- " 0.958815 | \n",
- " 14.246715 | \n",
- " 0.290448 | \n",
- " 28.832567 | \n",
- " 1.313031 | \n",
- " 9.138539 | \n",
- " 0.958815 | \n",
- " 14.246715 | \n",
- " 0.290448 | \n",
- " True | \n",
+ " False | \n",
" True | \n",
- " 3.433228 | \n",
- " 6.866456 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 1.223339 | \n",
- " 0.495161 | \n",
- " 1.319750 | \n",
- " 0.499778 | \n",
- " 0.483780 | \n",
- " 0.065500 | \n",
- " 2 | \n",
- " 3 | \n",
- " 8 | \n",
- " 9 | \n",
- " 12 | \n",
- " 18 | \n",
+ " 2.0 | \n",
+ " 0.505297 | \n",
+ " 104.117647 | \n",
+ " 35.549367 | \n",
+ " 11.583400 | \n",
+ " 1.782999 | \n",
+ " -1.053111 | \n",
+ " 4.374854 | \n",
+ " False | \n",
+ " 2.0 | \n",
+ " 4.0 | \n",
+ " False | \n",
+ " 94.500000 | \n",
+ " 42.005063 | \n",
+ " 0.000000 | \n",
+ " cross | \n",
+ " high | \n",
+ " False | \n",
+ " low | \n",
"
\n",
" \n",
- " 40 | \n",
- " shot | \n",
- " corner_crossed | \n",
- " pass | \n",
- " foot | \n",
+ " 207 | \n",
+ " 8.294462 | \n",
+ " 0.813487 | \n",
" foot | \n",
- " foot | \n",
- " success | \n",
- " fail | \n",
- " 91.35 | \n",
- " 35.36 | \n",
- " 105.00 | \n",
- " 0.00 | \n",
- " 96.60 | \n",
- " 23.80 | \n",
- " 91.35 | \n",
- " 35.36 | \n",
- " 0.00 | \n",
- " 53.72 | \n",
- " -13.65 | \n",
- " 35.36 | \n",
- " 37.903194 | \n",
- " -96.60 | \n",
- " 29.92 | \n",
- " 101.127476 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " -91.35 | \n",
- " 18.36 | \n",
- " 93.176779 | \n",
- " 13.717584 | \n",
- " 0.099306 | \n",
- " 34.000000 | \n",
- " 1.570796 | \n",
- " 13.213629 | \n",
- " 0.881872 | \n",
- " 13.717584 | \n",
- " 0.099306 | \n",
- " 106.835754 | \n",
- " 0.185647 | \n",
- " True | \n",
+ " False | \n",
" True | \n",
- " 2.102531 | \n",
- " 21.927228 | \n",
- " 0.0 | \n",
+ " 2.0 | \n",
+ " 0.609730 | \n",
+ " 101.470588 | \n",
+ " 32.708861 | \n",
+ " 4.529539 | \n",
+ " 3.758163 | \n",
+ " 0.350701 | \n",
+ " 1.367580 | \n",
+ " False | \n",
" 0.0 | \n",
" 0.0 | \n",
- " 4.166053 | \n",
- " 0.837315 | \n",
- " 4.249364 | \n",
- " 0.517985 | \n",
+ " True | \n",
+ " 96.970588 | \n",
+ " 32.192405 | \n",
" 0.000000 | \n",
- " 0.363334 | \n",
- " 3 | \n",
- " 8 | \n",
- " 4 | \n",
- " 12 | \n",
- " 21 | \n",
- " 12 | \n",
+ " standard_pass | \n",
+ " high | \n",
+ " False | \n",
+ " low | \n",
"
\n",
" \n",
- " 77 | \n",
- " shot | \n",
- " clearance | \n",
- " cross | \n",
- " foot | \n",
+ " 240 | \n",
+ " 9.495718 | \n",
+ " 0.177482 | \n",
" foot | \n",
- " foot | \n",
- " fail | \n",
- " fail | \n",
- " 75.60 | \n",
- " 29.92 | \n",
- " 94.50 | \n",
- " 27.20 | \n",
- " 98.70 | \n",
- " 65.96 | \n",
- " 75.60 | \n",
- " 29.92 | \n",
- " 94.50 | \n",
- " 27.20 | \n",
- " -18.90 | \n",
- " 2.72 | \n",
- " 19.094722 | \n",
- " -4.20 | \n",
- " -38.76 | \n",
- " 38.986890 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 18.90 | \n",
- " -2.72 | \n",
- " 19.094722 | \n",
- " 29.681752 | \n",
- " 0.137895 | \n",
- " 12.509596 | \n",
- " 0.574700 | \n",
- " 32.575015 | \n",
- " 1.376170 | \n",
- " 29.681752 | \n",
- " 0.137895 | \n",
- " 12.509596 | \n",
- " 0.574700 | \n",
" False | \n",
" True | \n",
- " 2.629861 | \n",
- " 3.250682 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 5.814165 | \n",
- " 0.836747 | \n",
- " 5.874066 | \n",
- " 0.242481 | \n",
- " 0.491555 | \n",
- " 0.043863 | \n",
- " 6 | \n",
- " 3 | \n",
- " 0 | \n",
- " 18 | \n",
- " 12 | \n",
- " 18 | \n",
+ " 1.0 | \n",
+ " 0.138353 | \n",
+ " 103.764706 | \n",
+ " 35.118987 | \n",
+ " 8.715584 | \n",
+ " 1.666759 | \n",
+ " -0.736036 | \n",
+ " 1.047878 | \n",
+ " True | \n",
+ " 1.0 | \n",
+ " 1.0 | \n",
+ " False | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " False | \n",
+ " low | \n",
"
\n",
" \n",
- " 140 | \n",
- " shot | \n",
- " cross | \n",
- " dribble | \n",
- " foot | \n",
+ " 359 | \n",
+ " 19.156990 | \n",
+ " 0.319270 | \n",
" foot | \n",
- " foot | \n",
- " success | \n",
- " success | \n",
- " 92.40 | \n",
- " 43.52 | \n",
- " 98.70 | \n",
- " 51.68 | \n",
- " 91.35 | \n",
- " 54.40 | \n",
- " 92.40 | \n",
- " 43.52 | \n",
- " 98.70 | \n",
- " 51.68 | \n",
- " -6.30 | \n",
- " -8.16 | \n",
- " 10.309006 | \n",
- " 7.35 | \n",
- " -2.72 | \n",
- " 7.837149 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
+ " False | \n",
+ " False | \n",
+ " 1.0 | \n",
+ " 1.000000 | \n",
+ " 84.705882 | \n",
+ " 45.792405 | \n",
+ " 4.290909 | \n",
+ " 23.471515 | \n",
+ " -0.526388 | \n",
+ " 6.794148 | \n",
+ " False | \n",
" 0.0 | \n",
- " 6.30 | \n",
- " 8.16 | \n",
- " 10.309006 | \n",
- " 15.792099 | \n",
- " 0.647047 | \n",
- " 18.768921 | \n",
- " 1.228489 | \n",
- " 24.545519 | \n",
- " 0.981099 | \n",
- " 15.792099 | \n",
- " 0.647047 | \n",
- " 18.768921 | \n",
- " 1.228489 | \n",
- " True | \n",
+ " 2.0 | \n",
+ " False | \n",
+ " 80.294118 | \n",
+ " 44.070886 | \n",
+ " 8.708532 | \n",
+ " through_ball | \n",
+ " ground | \n",
" True | \n",
- " 1.052499 | \n",
- " 5.000627 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 1.259842 | \n",
- " 1.631795 | \n",
- " 2.061543 | \n",
- " 0.371538 | \n",
- " 0.134860 | \n",
- " 0.167545 | \n",
- " 4 | \n",
- " 5 | \n",
- " 0 | \n",
- " 12 | \n",
- " 15 | \n",
- " 18 | \n",
+ " ground | \n",
"
\n",
" \n",
- " 145 | \n",
- " shot | \n",
- " pass | \n",
- " pass | \n",
- " foot | \n",
+ " 430 | \n",
+ " 14.870452 | \n",
+ " 0.320055 | \n",
" foot | \n",
- " foot | \n",
- " success | \n",
- " success | \n",
- " 99.75 | \n",
- " 37.40 | \n",
- " 96.60 | \n",
- " 38.76 | \n",
- " 93.45 | \n",
- " 45.56 | \n",
- " 99.75 | \n",
- " 37.40 | \n",
- " 96.60 | \n",
- " 38.76 | \n",
- " 3.15 | \n",
- " -1.36 | \n",
- " 3.431049 | \n",
- " 3.15 | \n",
- " -6.80 | \n",
- " 7.494164 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
- " -3.15 | \n",
- " 1.36 | \n",
- " 3.431049 | \n",
- " 6.254798 | \n",
- " 0.574700 | \n",
- " 9.654926 | \n",
- " 0.515549 | \n",
- " 16.341239 | \n",
- " 0.785831 | \n",
- " 6.254798 | \n",
- " 0.574700 | \n",
- " 9.654926 | \n",
- " 0.515549 | \n",
- " True | \n",
- " True | \n",
- " 1.677755 | \n",
- " 2.659997 | \n",
- " 0.0 | \n",
- " 0.0 | \n",
+ " False | \n",
+ " False | \n",
+ " 2.0 | \n",
+ " 0.739963 | \n",
+ " 103.500000 | \n",
+ " 37.443038 | \n",
+ " 11.711233 | \n",
+ " 3.755597 | \n",
+ " -1.159930 | \n",
+ " 2.179715 | \n",
+ " False | \n",
" 0.0 | \n",
- " 1.184212 | \n",
- " 0.511279 | \n",
- " 1.289870 | \n",
- " 0.978291 | \n",
- " 0.654611 | \n",
- " 0.320841 | \n",
- " 1 | \n",
- " 3 | \n",
- " 4 | \n",
- " 6 | \n",
- " 9 | \n",
- " 15 | \n",
+ " 3.0 | \n",
+ " False | \n",
+ " 91.147059 | \n",
+ " 51.215190 | \n",
+ " 6.792372 | \n",
+ " standard_pass | \n",
+ " ground | \n",
+ " False | \n",
+ " ground | \n",
"
\n",
" \n",
"\n",
""
],
"text/plain": [
- " type_id_a0 type_id_a1 type_id_a2 bodypart_id_a0 \\\n",
- "game_id action_id \n",
- "2500098 17 shot dribble cross foot \n",
- " 40 shot corner_crossed pass foot \n",
- " 77 shot clearance cross foot \n",
- " 140 shot cross dribble foot \n",
- " 145 shot pass pass foot \n",
- "\n",
- " bodypart_id_a1 bodypart_id_a2 result_id_a1 result_id_a2 \\\n",
- "game_id action_id \n",
- "2500098 17 foot foot success success \n",
- " 40 foot foot success fail \n",
- " 77 foot foot fail fail \n",
- " 140 foot foot success success \n",
- " 145 foot foot success success \n",
- "\n",
- " start_x_a0 start_y_a0 start_x_a1 start_y_a1 start_x_a2 \\\n",
+ " dist_shot visible_angle_shot bodypart_name_shot \\\n",
+ "game_id action_id \n",
+ "3890561 151 12.881039 0.465099 foot \n",
+ " 207 8.294462 0.813487 foot \n",
+ " 240 9.495718 0.177482 foot \n",
+ " 359 19.156990 0.319270 foot \n",
+ " 430 14.870452 0.320055 foot \n",
+ "\n",
+ " open_goal first_touch free_projection_gaps \\\n",
+ "game_id action_id \n",
+ "3890561 151 False True 2.0 \n",
+ " 207 False True 2.0 \n",
+ " 240 False True 1.0 \n",
+ " 359 False False 1.0 \n",
+ " 430 False False 2.0 \n",
+ "\n",
+ " free_projection_pct goalkeeper_x goalkeeper_y \\\n",
+ "game_id action_id \n",
+ "3890561 151 0.505297 104.117647 35.549367 \n",
+ " 207 0.609730 101.470588 32.708861 \n",
+ " 240 0.138353 103.764706 35.118987 \n",
+ " 359 1.000000 84.705882 45.792405 \n",
+ " 430 0.739963 103.500000 37.443038 \n",
+ "\n",
+ " goalkeeper_dist_to_ball goalkeeper_dist_to_goal \\\n",
+ "game_id action_id \n",
+ "3890561 151 11.583400 1.782999 \n",
+ " 207 4.529539 3.758163 \n",
+ " 240 8.715584 1.666759 \n",
+ " 359 4.290909 23.471515 \n",
+ " 430 11.711233 3.755597 \n",
+ "\n",
+ " goalkeeper_angle_to_goal dist_to_defender under_pressure \\\n",
"game_id action_id \n",
- "2500098 17 99.75 26.52 91.35 29.92 97.65 \n",
- " 40 91.35 35.36 105.00 0.00 96.60 \n",
- " 77 75.60 29.92 94.50 27.20 98.70 \n",
- " 140 92.40 43.52 98.70 51.68 91.35 \n",
- " 145 99.75 37.40 96.60 38.76 93.45 \n",
- "\n",
- " start_y_a2 end_x_a1 end_y_a1 end_x_a2 end_y_a2 dx_a1 \\\n",
- "game_id action_id \n",
- "2500098 17 6.12 99.75 26.52 91.35 29.92 8.40 \n",
- " 40 23.80 91.35 35.36 0.00 53.72 -13.65 \n",
- " 77 65.96 75.60 29.92 94.50 27.20 -18.90 \n",
- " 140 54.40 92.40 43.52 98.70 51.68 -6.30 \n",
- " 145 45.56 99.75 37.40 96.60 38.76 3.15 \n",
- "\n",
- " dy_a1 movement_a1 dx_a2 dy_a2 movement_a2 dx_a01 \\\n",
- "game_id action_id \n",
- "2500098 17 -3.40 9.062009 -6.30 23.80 24.619708 0.0 \n",
- " 40 35.36 37.903194 -96.60 29.92 101.127476 0.0 \n",
- " 77 2.72 19.094722 -4.20 -38.76 38.986890 0.0 \n",
- " 140 -8.16 10.309006 7.35 -2.72 7.837149 0.0 \n",
- " 145 -1.36 3.431049 3.15 -6.80 7.494164 0.0 \n",
- "\n",
- " dy_a01 mov_a01 dx_a02 dy_a02 mov_a02 \\\n",
- "game_id action_id \n",
- "2500098 17 0.0 0.0 -8.40 3.40 9.062009 \n",
- " 40 0.0 0.0 -91.35 18.36 93.176779 \n",
- " 77 0.0 0.0 18.90 -2.72 19.094722 \n",
- " 140 0.0 0.0 6.30 8.16 10.309006 \n",
- " 145 0.0 0.0 -3.15 1.36 3.431049 \n",
- "\n",
- " start_dist_to_goal_a0 start_angle_to_goal_a0 \\\n",
- "game_id action_id \n",
- "2500098 17 9.138539 0.958815 \n",
- " 40 13.717584 0.099306 \n",
- " 77 29.681752 0.137895 \n",
- " 140 15.792099 0.647047 \n",
- " 145 6.254798 0.574700 \n",
- "\n",
- " start_dist_to_goal_a1 start_angle_to_goal_a1 \\\n",
- "game_id action_id \n",
- "2500098 17 14.246715 0.290448 \n",
- " 40 34.000000 1.570796 \n",
- " 77 12.509596 0.574700 \n",
- " 140 18.768921 1.228489 \n",
- " 145 9.654926 0.515549 \n",
- "\n",
- " start_dist_to_goal_a2 start_angle_to_goal_a2 \\\n",
- "game_id action_id \n",
- "2500098 17 28.832567 1.313031 \n",
- " 40 13.213629 0.881872 \n",
- " 77 32.575015 1.376170 \n",
- " 140 24.545519 0.981099 \n",
- " 145 16.341239 0.785831 \n",
- "\n",
- " end_dist_to_goal_a1 end_angle_to_goal_a1 \\\n",
- "game_id action_id \n",
- "2500098 17 9.138539 0.958815 \n",
- " 40 13.717584 0.099306 \n",
- " 77 29.681752 0.137895 \n",
- " 140 15.792099 0.647047 \n",
- " 145 6.254798 0.574700 \n",
- "\n",
- " end_dist_to_goal_a2 end_angle_to_goal_a2 team_1 team_2 \\\n",
- "game_id action_id \n",
- "2500098 17 14.246715 0.290448 True True \n",
- " 40 106.835754 0.185647 True True \n",
- " 77 12.509596 0.574700 False True \n",
- " 140 18.768921 1.228489 True True \n",
- " 145 9.654926 0.515549 True True \n",
- "\n",
- " time_delta_1 time_delta_2 speedx_a01 speedy_a01 \\\n",
+ "3890561 151 -1.053111 4.374854 False \n",
+ " 207 0.350701 1.367580 False \n",
+ " 240 -0.736036 1.047878 True \n",
+ " 359 -0.526388 6.794148 False \n",
+ " 430 -1.159930 2.179715 False \n",
+ "\n",
+ " nb_defenders_in_shot_line nb_defenders_behind_ball \\\n",
+ "game_id action_id \n",
+ "3890561 151 2.0 4.0 \n",
+ " 207 0.0 0.0 \n",
+ " 240 1.0 1.0 \n",
+ " 359 0.0 2.0 \n",
+ " 430 0.0 3.0 \n",
+ "\n",
+ " one_on_one end_x_assist end_y_assist carry_dist \\\n",
"game_id action_id \n",
- "2500098 17 3.433228 6.866456 0.0 0.0 \n",
- " 40 2.102531 21.927228 0.0 0.0 \n",
- " 77 2.629861 3.250682 0.0 0.0 \n",
- " 140 1.052499 5.000627 0.0 0.0 \n",
- " 145 1.677755 2.659997 0.0 0.0 \n",
+ "3890561 151 False 94.500000 42.005063 0.000000 \n",
+ " 207 True 96.970588 32.192405 0.000000 \n",
+ " 240 False NaN NaN NaN \n",
+ " 359 False 80.294118 44.070886 8.708532 \n",
+ " 430 False 91.147059 51.215190 6.792372 \n",
"\n",
- " speed_a01 speedx_a02 speedy_a02 speed_a02 \\\n",
- "game_id action_id \n",
- "2500098 17 0.0 1.223339 0.495161 1.319750 \n",
- " 40 0.0 4.166053 0.837315 4.249364 \n",
- " 77 0.0 5.814165 0.836747 5.874066 \n",
- " 140 0.0 1.259842 1.631795 2.061543 \n",
- " 145 0.0 1.184212 0.511279 1.289870 \n",
- "\n",
- " shot_angle_a0 shot_angle_a1 shot_angle_a2 caley_zone_a0 \\\n",
- "game_id action_id \n",
- "2500098 17 0.499778 0.483780 0.065500 2 \n",
- " 40 0.517985 0.000000 0.363334 3 \n",
- " 77 0.242481 0.491555 0.043863 6 \n",
- " 140 0.371538 0.134860 0.167545 4 \n",
- " 145 0.978291 0.654611 0.320841 1 \n",
- "\n",
- " caley_zone_a1 caley_zone_a2 angle_zone_a0 angle_zone_a1 \\\n",
- "game_id action_id \n",
- "2500098 17 3 8 9 12 \n",
- " 40 8 4 12 21 \n",
- " 77 3 0 18 12 \n",
- " 140 5 0 12 15 \n",
- " 145 3 4 6 9 \n",
- "\n",
- " angle_zone_a2 \n",
+ " type_assist height_assist from_counterattack \\\n",
+ "game_id action_id \n",
+ "3890561 151 cross high False \n",
+ " 207 standard_pass high False \n",
+ " 240 NaN NaN False \n",
+ " 359 through_ball ground True \n",
+ " 430 standard_pass ground False \n",
+ "\n",
+ " impact_height \n",
"game_id action_id \n",
- "2500098 17 18 \n",
- " 40 12 \n",
- " 77 18 \n",
- " 140 18 \n",
- " 145 15 "
+ "3890561 151 low \n",
+ " 207 low \n",
+ " 240 low \n",
+ " 359 ground \n",
+ " 430 ground "
]
},
"metadata": {},
@@ -807,24 +1302,24 @@
" \n",
" \n",
" \n",
- " 2500098 | \n",
- " 17 | \n",
+ " 3890561 | \n",
+ " 151 | \n",
" False | \n",
"
\n",
" \n",
- " 40 | \n",
- " False | \n",
+ " 207 | \n",
+ " True | \n",
"
\n",
" \n",
- " 77 | \n",
+ " 240 | \n",
" False | \n",
"
\n",
" \n",
- " 140 | \n",
- " False | \n",
+ " 359 | \n",
+ " True | \n",
"
\n",
" \n",
- " 145 | \n",
+ " 430 | \n",
" False | \n",
"
\n",
" \n",
@@ -834,11 +1329,11 @@
"text/plain": [
" goal\n",
"game_id action_id \n",
- "2500098 17 False\n",
- " 40 False\n",
- " 77 False\n",
- " 140 False\n",
- " 145 False"
+ "3890561 151 False\n",
+ " 207 True\n",
+ " 240 False\n",
+ " 359 True\n",
+ " 430 False"
]
},
"metadata": {},
@@ -846,31 +1341,25 @@
}
],
"source": [
- "features = []\n",
- "labels = []\n",
- "for (l,s) in itertools.product(leagues, seasons):\n",
- " features.append(pd.read_hdf(store_features, key=f'{l}/{s}/features'))\n",
- " labels.append(pd.read_hdf(store_features, key=f'{l}/{s}/labels'))\n",
- "features = pd.concat(features)\n",
- "labels = pd.concat(labels)\n",
- "\n",
- "display(features.head())\n",
- "display(labels.to_frame().head())"
+ "display(dataset[\"xg/features\"].head())\n",
+ "display(dataset[\"xg/labels\"].head())"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 14,
"metadata": {},
"outputs": [],
- "source": []
+ "source": [
+ "dataset.close()"
+ ]
}
],
"metadata": {
"kernelspec": {
- "display_name": "soccer_dataprovider_comparison",
+ "display_name": "soccer_xg",
"language": "python",
- "name": "soccer_dataprovider_comparison"
+ "name": "soccer_xg"
},
"language_info": {
"codemirror_mode": {
@@ -882,7 +1371,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.6.2"
+ "version": "3.11.1"
},
"toc": {
"base_numbering": 1,
@@ -899,5 +1388,5 @@
}
},
"nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
}