From 1a209179396def3f6674b8f7b24f45c2314b6fc8 Mon Sep 17 00:00:00 2001
From: t-gummer
Date: Mon, 28 Aug 2023 14:14:10 +1000
Subject: [PATCH 1/2] Fix typos

---
 .../index/execute-results/html.json           |  4 +-
 .../over-analysing-idle-footy-chat/index.html | 94 +++++++++----------
 _site_rendered/blog/index.html                |  2 +-
 _site_rendered/search.json                    |  6 +-
 _site_rendered/sitemap.xml                    |  4 +-
 .../over-analysing-idle-footy-chat/index.qmd  |  6 +-
 6 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/_freeze/blog/2023/over-analysing-idle-footy-chat/index/execute-results/html.json b/_freeze/blog/2023/over-analysing-idle-footy-chat/index/execute-results/html.json
index 8105f06..b92975a 100644
--- a/_freeze/blog/2023/over-analysing-idle-footy-chat/index/execute-results/html.json
+++ b/_freeze/blog/2023/over-analysing-idle-footy-chat/index/execute-results/html.json
@@ -1,7 +1,7 @@
 {
-  "hash": "0812fb57def31802d6713d7c6202e601",
+  "hash": "6d6a223706567cfe758348d3f6a49a2c",
   "result": {
-    "markdown": "---\ntitle: \"(Over-)Analysing Idle Footy Chat\"\ndescription: |\n This blog post discusses the types of questions one often posits while watching the footy (or indeed any sport). I will use this article as a medium through which I can introduce analysis of AFL data in R.\ndate: 2023-08-03\ncategories: \n - Sport\n - AFL\n - R\n - Data\nimage: fitzroy-lions-logo.png\nreference-location: margin\ndraft: false\nfreeze: true\nreading-timereading-time: true\n---\n\n::: {.cell}\n\n:::\n\n\n\n::: {.callout-caution}\n\n\n## Apologies for the delay\n\nI have been promising some people that this (my first) post will be \"coming soon\" for quite some time now. It has finally arrived and the main reasons for my slowness are:\n\n* I am a slow writer. 
I often have all the ideas in my head, but when it comes to putting them into nice, publishable words, it has a tendency to become a bit of a slog.\n\n* The scope of this project was overly ambitious and expanded as I kept on branching off from the main mission in order provide additional background (when given free reign, this is my tendency).\n\n\nThe R code and analysis itself did not take too long, so the main areas I am looking to improve in the future are my writing efficiency and keeping the scope of my blog posts under control. However, it would feel like going against my style to avoid going down rabbit holes and off on tangents entirely, so there is still a balance to be struck there.\n\nThe out-of-control scope has also impacted the length of this article. I commend anyone who actually reads the whole thing, but hopefully it's also possible to jump around and only read the parts that interest you in particular using the table of contents. I will endeavour to keep future blog posts a lot shorter going forward.\n\n:::\n\n\n::: {.callout-note}\n\n## Edits since the initial release\n\n> Thanks to everyone who has provided feedback and suggestions.\n\n* Fixed some grammatical, punctuation and spelling errors\n\n* Re-worded the [question concerning goal kickers](#most-goal-kickers) to include the words \"for a team in a single game\" instead of merely \"in a single game\". I decided to change the question to suit the answer, instead of the reverse (this is because changing the answers would mean materially modifying the content of the article, which I didn't want to do given the vast majority of the people who would ever read it, have done so already)\n\n:::\n\n# Prelude\n\nSince antiquity (or at the very least living memory), sporting data has been recorded, published and analysed for almost every professional sport known to man. 
In the modern day of analytics and social media it has reached a point where sports statistics are constantly being recorded and opined on by teams, the press and your average punter alike. \n\nMy beloved sport of *Australian Rules Football* (which I will henceforth refer to as \"footy\"[^footy]) is no different. Indeed some have even described the AFL (Australia's national competition) as \"the most data rich sport on Earth\"[^data-rich-sport-source], although I would suggest that certain American sports such as baseball (i.e. [Moneyball](https://www.youtube.com/watch?v=PlKDQqKh03Y){target=\"_blank\"}) have made far better use of their data.\n\n[^footy]: As a Western Australian this is how I define it (along with the majority of Australia) but I am aware that this term is reserved for rugby league in New South Wales and Queensland\n[^data-rich-sport-source]: [Source](https://www.youtube.com/watch?v=i_mePwh_02M){target=\"_blank\"}\n\n\nIt is my observation that it is common for those watching or attending live footy to ask questions such as:\n\n> What is the record for the most *[statistical category]*?\n\nor \n\n> When was the last time *[obscure event]* happened?\n\nI believe a contributor to this is that we are emulating what we hear on the broadcast commentary. The difference is that they often have a team of [computer-type boffins](https://www.youtube.com/clip/UgkxvNk03iqKigc9NKMjWOs2NvuRFRY8xRHn){target=\"_blank\"}[^video-link-disclaimer] behind the scenes feeding them the answer. 
\n\n[^video-link-disclaimer]: The video linked here is unfortunately clipped from an American guy who kind of missed the point a bit, but it is also the highest quality clip I could find of this brilliant piece of commentary from BT ([Brian Taylor](https://www.youtube.com/watch?v=E_JCdK4ah78){target=\"_blank\"} for the uninitiated).\n\n> But what are us plebeians meant to do, bereft of such luxuries?\n\nyou might ask...\n\nWell, a quick Google search will make short work of questions of a more trivial nature such as \"which player has kicked the most career goals\"[^tony-lockett] or \"which team has won the most premierships\"[^most-premierships]. The more savvy among us may find answers to slightly more edifying questions by performing pro gamer moves such as\n\n* Trawling through more obscure websites such as [AFL Tables](https://afltables.com/afl/afl_index.html){target=\"_blank\"}[^baseball-reference] to answer things like \"What is the most disposals Zac Dawson had in a game\"[^zac-dawson]; or\n\n* Digging into the deep recesses of the AFL Live App to answer questions like \"What is the record for the longest distance run in a game\"[^telstra-tracker]\n\nEven so, some more sophisticated questions will still go unanswered.\n\n[^baseball-reference]: the [baseball/basketball reference](https://www.sports-reference.com/){target=\"_blank\"} of the footy world (but maybe not quite as extensive)\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n\n\n```\n:::\n:::\n\n\n\n[^zac-dawson]: [19](https://afltables.com/afl/stats/players/Z/Zac_Dawson.html#sortableTable2){target=\"_blank\"}, [versus Melbourne in round 10 2009](https://afltables.com/afl/stats/games/2009/111520090530.html){target=\"_blank\"}\n\n[^telstra-tracker]: [Tom Scully, 18.9 KM]{id='distance-image'}\n\n[^tony-lockett]: [Tony Lockett on 1360](https://en.wikipedia.org/wiki/List_of_VFL/AFL_records#Goalkicking){target=\"_blank\"}\n\n[^most-premierships]: [Carlton and Essendon tied on 
16](https://en.wikipedia.org/wiki/List_of_VFL/AFL_records#Premierships){target=\"_blank\"}\n\nHowever, if you are a [gadget-type operator](https://www.youtube.com/clip/UgkxtEJxG9BEEvFMmGfpjSBWc4MtWPKwNTLW){target=\"_blank\"}[^tipping-name] like myself, you will expand the number of footy stats questions you can answer immensely by accessing and manipulating the raw data yourself. There are, of course, a multitude of tools and approaches to this, but in this post, I will be using R (my preferred programming language)\n\n[^tipping-name]: My footy tipping username is *Gadget-type Operator* and I often use [other BT quotes](https://www.youtube.com/watch?v=E_JCdK4ah78){target=\"_blank\"} for my username on other (even non-footy-related) accounts\n\n\n## Target Audience Unclear\n\nIf you are a footy fan this article will likely present as some mildy interesting footy facts, combined with incomprehensible techno-babble. For R users this will likely appear to be a fairly elementary data wrangling exercise, combined with a bunch of references you don't understand. But if you are both a footy fan and an R user, it will hopefully prove to be a quite interesting read. \n\n# A concrete example\n\nI recently had the misfortune ([as a West Coast fan](https://www.instagram.com/p/CiRdBD9A4lU/){target=\"_blank\"}) of attending a game live between the Sydney Swans and the (not so) mighty West Coast Eagles with the following score line (the equal fourth highest margin of all time):\n\n![](swans-vs-eagles-score.png \"How embarrassing!\")\n\n::: {.column-margin}\n![Our respective reactions really tell it all](photo-at-game.jpg)\n:::\n\nIn one-sided games like this, it seems to me that footy stats questions become more common than usual for two key reasons:\n\n1. It adds something interesting to a game that otherwise lacks excitement\n\n2. 
These games are often filled with large statistical anomalies that might set new records\n\n\nAs my usual footy-going companion (Saroop) and I are both actuaries by trade, footy stats questions were flying left, right, and centre on that gloomy (though not due to the weather) Saturday night at the SCG. The questions we posed did not just vanish into the aether either, I ,with the idea of writing this article in mind, decided to record the more interesting (and doable) questions.\n\n## Question List\n\nThis article aims to tackle the questions listed below:\n\n> \n- [What is the record for the highest scoring quarter?](#highest-scoring-quarter)\n- [What is the record for the most individual goal kickers for a team in a single game?](#individual)\n- [What is the record for the most multiple goal kickers for a team in a single game?](#multiple)\n- [What is the record for the most players kicking five or more goals for a team in a single game (i.e. the most \"bags\")?](#five-or-more-bags)\n- [What is the record for the most clangers in a game?](#most-clangers)\n- [What is the record for the worst disposal efficiency in a game?](#worst-disposal-efficiency)\n- [Who was the youngest player to win a Norm Smith Medal?](#youngest-norm-smith-medalist)\n- What is the record for the most unanswered goals in a game?*\n- What is the school with the most AFL players on their list?*\n- What city/town has the most AFL players relative to population?*\n- Which player has the best goals to behinds ratio?*\n- Have there ever been any undefeated seasons?*\n- What is the worst win-loss record to make finals?*\n\n\n::: {.callout-note}\n\n## Disclaimer on the asterisk\n\nI have put an asterisk next to questions that either: \n\n* cannot easily be answered by the methods I discuss below; or\n* are too lengthy for this blog post (these may get their own dedicated blog post in the future). 
\n\nThe remaining questions (which have hyperlinks to later sections of this article) will be tackled using R below.\n\n:::\n\n\nNote that things are about to get very technical so if you are only really interested in the answers (and not the R coding), you can jump ahead by [clicking here](#figuring-out-the-answers).\n\n\n\n\n# Technical Background\n\n## The `fitzRoy` Package\n\nThe first step in analysing AFL data is obtaining the data (the so-called \"collection\" phase). Our first thought might be to search the web for publicly available AFL datasets and APIs or even scrape the data from websites such as the [official AFL website](https://www.afl.com.au/){target=\"_blank\"}, [Footywire](https://www.footywire.com/){target=\"_blank\"} or [AFL Tables](https://afltables.com/afl/afl_index.html){target=\"_blank\"}. But there is a more straight-forward way.\n\n\nWhile most people now know Fitzroy as a trendy inner city suburb of Melbourne, filled with terraces and [over-priced croissants](https://www.lunecroissanterie.com/){target=\"_blank\"}, it was once home[^bith-place] to the mighty ([and now merged out of existence](https://en.wikipedia.org/wiki/Brisbane_Lions#Brisbane_Bears_absorb_Fitzroy_Football_Club's_AFL_operations,_become_Brisbane_Lions){target=\"_blank\"}) Fitzroy Lions Football Club.\n \n[![](fitzroy-lions-logo.png \"La Marseillaise!\")](https://www.youtube.com/watch?v=olKa-0H26k4){target=\"_blank\"}\n\n[^bith-place]: It is also (regrettably) the place of my birth but as a WA boy I don't like to talk about the fact that my parents happened to be in Melbourne when I was born.\n\n\nWe can obtain the data we need very simply using its name-sake, the [`fitzRoy` R package](https://jimmyday12.github.io/fitzRoy/){target=\"_blank\"}. 
This package abstracts away all the web scraping and API calls for us into a very helpful family of `fetch_*` functions.\n\nSo let's begin by loading the `fitzRoy` package and while we're at it, we will also load all the other packages we will be using later on.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(fitzRoy)\n\n# Note that I generally avoid mixing dplyr and data.table at the same time\n# but the reason I have done with will become apparent later\nlibrary(dplyr)\nlibrary(data.table)\n\nlibrary(rvest)\nlibrary(stringr)\nlibrary(tidyr)\nlibrary(purrr)\nlibrary(lubridate)\n```\n:::\n\n\n\n\n### `fitzRoy` Data Sources\n\n`fitzRoy` provides access to a number of footy data sources[^FitzRoy] including [AFL Tables](https://afltables.com/){target=\"_blank\"} and the [official AFL website](https://www.afl.com.au/fixture){target=\"_blank\"}. Each data source has its own advantages and disadvantages, for example:\n\n* [AFL Tables](https://afltables.com/){target=\"_blank\"} has the entirety of AFL/VFL history (1897 to present) but lacks some of the more advanced stats.\n\n* The [official AFL website](https://www.afl.com.au/fixture){target=\"_blank\"} only has data from 2014 onwards but it also probably the most complete in terms of the advanced statistics it contains (e.g. centre bounce attendances[^CBAs]).\n\n[^FitzRoy]: Up-to-date information on data sources can be found on `fitzRoy`'s [documentation site](https://jimmyday12.github.io/fitzRoy/articles/fitzRoy.html#data-sources){target=\"_blank\"}\n\n[^CBAs]: Centre bounce attendances (CBAs) are a commonly-used metric in AFL Fantasy, fantasy \"coaches\" often look at tools such as [this one](https://dfsaustralia.com/afl-cbas/){target=\"_blank\"} to help with researching their trades.\n\nAll of the different data sources are compared in the table below:\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nNote that each row of the table can be expanded to reveal what data is available from each source, as well at its use-case. In addition to the sources listed in this table, the following functions only come from one source:\n\n* [`fetch_betting_odds_footywire()`](https://jimmyday12.github.io/fitzRoy/reference/fetch_betting_odds_footywire.html){target=\"_blank\"}\n\n* [`fetch_squiggle_data()`](https://jimmyday12.github.io/fitzRoy/reference/fetch_squiggle_data.html){target=\"_blank\"}\n\n* [`fetch_coaches_votes()`](https://jimmyday12.github.io/fitzRoy/reference/fetch_coaches_votes.html){target=\"_blank\"}\n\n## Importing the Data\n\nFor the purposes of answering the questions [above](#question-list), I am most interested in the full history of the AFL and have decided to use AFL Tables as my primary data source[^data-regret]. I will also use Fryzigg for one small use-case where AFL Tables is missing key data (disposal efficiency) and some bespoke web scraping for Norm Smith Medallists.\n\n[^data-regret]: In hindsight I somewhat regret this decision and would have probably preferred to use Fryzigg for everything (with the exception of quarter scores which it doesn't have and AFL Tables does) but I only realised it had the full AFL/VFL history when I constructed the table comparing data sources above\n\n\nThe `fetch_*` family of functions from the `fitzRoy` package allow us to read data from the various sources. 
Consult the [documentation site](https://jimmyday12.github.io/fitzRoy/reference/index.html){target=\"_blank\"} for a complete list of all the available functions.\n\n\nWe can *fetch* this data via `fitzRoy` with the following code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats <- fetch_player_stats_afltables(season = 1897:2023)\nresults <- fetch_results_afltables(season = 1897:2023)\nplayer_details <- fetch_player_details_afltables()\nplayer_stats_fryzigg <- fetch_player_stats_fryzigg(season = 1897:2023)\n```\n:::\n\n\n\n\n::: {.callout-warning}\n\n#### Being a good citizen\n\nWhen sourcing data from `fitzRoy`, it is important to follow good data collection[^fitzRoy-good-practice] etiquette by only downloading the data you need and avoiding repeatedly downloading the same data over and over again. This prevents servers being overloaded and will mean everyone will get their data faster.\n\nIn keeping with this, for the purposes of this blog post, I have saved the data in a local RDS file. That way, I can simply use `readRDS()` instead of of repeatedly calling the `fetch_*` functions. 
The code for this is below (and the code above is not actually run but is cleaner for demonstration purposes):\n\n[^fitzRoy-good-practice]: this topic is discussed on the `fitzRoy` documentation site [here](https://jimmyday12.github.io/fitzRoy/articles/fitzRoy.html#good-practices){target=\"_blank\"}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif(file.exists(\"data/player_stats.RDS\")) {\n player_stats <- readRDS(\"data/player_stats.RDS\")\n \n} else {\n player_stats <- fitzRoy::fetch_player_stats_afltables(season = 1897:2023)\n saveRDS(player_stats, \"data/player_stats.RDS\")\n}\n\nif(file.exists(\"data/results.RDS\")) {\n results <- readRDS(\"data/results.RDS\")\n \n} else {\n results <- fitzRoy::fetch_results_afltables(season = 1897:2023)\n saveRDS(results, \"data/results.RDS\")\n}\n\nif(file.exists(\"data/player_details.RDS\")) {\n player_details <- readRDS(\"data/player_details.RDS\")\n \n} else {\n player_details <- fetch_player_details_afltables()\n saveRDS(player_details, \"data/player_details.RDS\")\n}\n\nif(file.exists(\"data/player_stats_fryzigg.RDS\")) {\n player_stats_fryzigg <- readRDS(\"data/player_stats_fryzigg.RDS\")\n \n} else {\n player_stats_fryzigg <- fetch_player_stats_fryzigg(season = 1897:2023)\n saveRDS(player_stats_fryzigg, \"data/player_stats_fryzigg.RDS\")\n}\n```\n:::\n\n\n:::\n\nThe data we have read in is as at round 19 of the 2023 AFL season.\n\n## Finicky Details About Other R Packages\n\n### Tidyverse Versus `data.table`\n\nIn the R community, there is an [ongoing power struggle](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly){target=\"_blank\"} between using the Posit[^previously-RStudio]-backed [tidyverse](https://www.tidyverse.org/){target=\"_blank\"} and the heavily-optimised [`data.table`](https://rdatatable.gitlab.io/data.table/){target=\"_blank\"}.\n\n[^previously-RStudio]: [formerly known as 
RStudio](https://posit.co/blog/rstudio-is-becoming-posit/){target=\"_blank\"} ([RIP](https://www.youtube.com/watch?v=TtMzTGfs-fc){target=\"_blank\"})\n\nAs to not unsettle people who prefer either `dplyr` (and the tidyverse) or `data.table`, I have written code in both packages[^base-r-dig]. Where relevant, I have used a tabbed layout for the convenience of the reader. As my personal preference for readability purposes is the tidyverse[^tidyverse-rationale], I will place this code in the first tab.\n\n\n::: {.callout-important}\n\n#### A cautionary tale\n\nWhile doing things in this way did scratch something of a perfectionist's itch in me and was a fun learning experience, I will probably refrain from doing something like this again in future posts. I don't think the additional time it took me to essentially write the same code twice is worth the effort.\n\n:::\n\n[^base-r-dig]: Note that I have not written a `base` R dataframes version because I can see arguments for using both tidyverse and `data.table` but `base` R `data.frame`s will probably cause more pain than they are worth (there is a reason that tidyverse and `data.table` exist)\n\n\n[^tidyverse-rationale]: I will typically will only use `data.table` if the size of data necessitates it. In this case, the data is less than a million rows so there are no problems.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\nNote that the code below is somewhat redundant as the `fitzRoy` package follows the [tidyverse philosophy](https://jimmyday12.github.io/fitzRoy/CONTRIBUTING.html){target=\"_blank\"} and returns [tibbles](https://tibble.tidyverse.org/){target=\"_blank\"}. 
I have used the `_tb` suffix[^tb-abbreviation] to distinguish `tibble`/`dplyr`/tidyverse from the `data.table` code.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb <- as_tibble(player_stats)\nresults_tb <- as_tibble(results)\nplayer_details_tb <- as_tibble(player_details)\nplayer_stats_fryzigg_tb <- as_tibble(player_stats_fryzigg)\n```\n:::\n\n\n[^tb-abbreviation]: an abbreviation of \"tibble\"\n\n\n#### `data.table`\n\nHenceforth, all `data.table` code will use the `_dt` suffix[^dt-abbreviation] as to distinguish it from the tidyverse code.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_dt <- as.data.table(player_stats)\nresults_dt <- as.data.table(results)\nplayer_details_dt <- as.data.table(player_details)\nplayer_stats_fryzigg_dt <- as.data.table(player_stats_fryzigg)\n```\n:::\n\n\n[^dt-abbreviation]: an acronym of \"data.table\"\n\n:::\n\n\n#### Adoption of the Native Pipe Operator (`|>`)\n\nThe so-called *pipe operator* (`%>%`) of the [`magrittr`](https://magrittr.tidyverse.org/){target=\"_blank\"} package has been a core staple of tidyverse since its inception, but since the R core team introduced the so-called *native pipe* (`|>`) to `base` R (in version [4.1](https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/){target=\"_blank\"}[^function-shorthand]), this has led to a split in adoption. There are some nuances in its usage[^future-pipe-post] but it overall behaves in a similar way to the `magrittr` pipe and has [less overhead (and is therefore faster)](https://stackoverflow.com/questions/67633022/what-are-the-differences-between-rs-new-native-pipe-and-the-magrittr-pipe){target=\"_blank\"}. 
While the native pipe was initially missing some of the key features of the `magrittr` pipe, new features[^pipe-features] have been added to it that (in my mind) mean that it might have even surpassed the `magrittr` pipe.\n\n\nWhile I have tried to appease people in both the tidyverse and `data.table` camps, I will not be re-writing my code more than once with such as minor syntactic difference as the pipe I use. I will therefore be dragging all my tidyverse-using readers kicking and screaming into the R 4.1 world by adopting the native pipe (`|>`) in my tidyverse code.\n\nNote that the common RStudio shortcut, `Ctrl+Shift+M` can be changed from the `magrittr` pipe (`%>%`), which is still the default, to the native pipe (`|>`).\n\n[^function-shorthand]: another cool thing introduced in this version of R was so-called function shorthand (`\\()`), see `help(\"function\")` for more details\n\n[^future-pipe-post]: I may even cover these in a future blog post\n\n[^pipe-features]: In R version 4.2, the `_` symbol was added as a placeholder character and in R version 4.3, extractions using the `$` symbol are now allowed\n\n\n\n### Webscraping package\n\nWhile the majority of our data will be sourced using the `fitzRoy` package, a small amount of data (namely Norm Smith medalists, which are outside of the scope of `fitzRoy`) will require us to perform some bespoke web scraping. 
This will be performed using the `rvest` package (loaded [above](#cb1)).\n\n\n\n# Preliminary Data Wrangling\n\n\n## Flattening the Data\n\n\nTo begin with, let's scrutinise the results data in order to figure out what we have to work with.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(results)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ntibble [16,352 × 16] (S3: tbl_df/tbl/data.frame)\n $ Game : num [1:16352] 1 2 3 4 5 6 7 8 9 10 ...\n $ Date : Date[1:16352], format: \"1897-05-08\" \"1897-05-08\" ...\n $ Round : chr [1:16352] \"R1\" \"R1\" \"R1\" \"R1\" ...\n $ Home.Team : chr [1:16352] \"Fitzroy\" \"Collingwood\" \"Geelong\" \"Sydney\" ...\n $ Home.Goals : int [1:16352] 6 5 3 3 6 4 3 9 6 5 ...\n $ Home.Behinds: int [1:16352] 13 11 6 9 4 6 8 10 5 9 ...\n $ Home.Points : int [1:16352] 49 41 24 27 40 30 26 64 41 39 ...\n $ Away.Team : chr [1:16352] \"Carlton\" \"St Kilda\" \"Essendon\" \"Melbourne\" ...\n $ Away.Goals : int [1:16352] 2 2 7 6 5 8 10 3 5 7 ...\n $ Away.Behinds: int [1:16352] 4 4 5 8 6 2 6 1 7 8 ...\n $ Away.Points : int [1:16352] 16 16 47 44 36 50 66 19 37 50 ...\n $ Venue : chr [1:16352] \"Brunswick St\" \"Victoria Park\" \"Corio Oval\" \"Lake Oval\" ...\n $ Margin : int [1:16352] 33 25 -23 -17 4 -20 -40 45 4 -11 ...\n $ Season : num [1:16352] 1897 1897 1897 1897 1897 ...\n $ Round.Type : chr [1:16352] \"Regular\" \"Regular\" \"Regular\" \"Regular\" ...\n $ Round.Number: int [1:16352] 1 1 1 1 2 2 2 2 3 3 ...\n```\n:::\n:::\n\n\n\nWhile inspecting the `results` we may note that certain key match-level information (e.g. quarter-by-quarter scores) for answering some of our [questions](#question-list) is missing from it. As it turns out, this data is actually available on the `player_stats_afl_tables` data (one row per player per match) instead. 
Thus, we will opt to create a 'flattened' version of `player_stats_afl_tables` with all the match-level fields available to use on both datasets and discard the `results` dataset (save for some quick checks to make sure the player data 'flattening' worked as expected).\n\n\nNow, let's take a look at the `player_stats_afl_tables` dataset to determine which fields are player-level and which are match-level.\n\n::: {.panel-tabset}\n\n### Code\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(player_stats)\n```\n:::\n\n\nNote that the output has been placed into another tab as it is rather long.\n\n### Output\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\ntibble [663,115 × 59] (S3: tbl_df/tbl/data.frame)\n $ Season : num [1:663115] 1897 1897 1897 1897 1897 ...\n $ Round : chr [1:663115] \"1\" \"1\" \"1\" \"1\" ...\n $ Date : Date[1:663115], format: \"1897-05-08\" \"1897-05-08\" ...\n $ Local.start.time : int [1:663115] 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 ...\n $ Venue : chr [1:663115] \"Brunswick St\" \"Brunswick St\" \"Brunswick St\" \"Brunswick St\" ...\n $ Attendance : num [1:663115] 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000 ...\n $ Home.team : chr [1:663115] \"Fitzroy\" \"Fitzroy\" \"Fitzroy\" \"Fitzroy\" ...\n $ HQ1G : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ HQ1B : int [1:663115] 5 5 5 5 5 5 5 5 5 5 ...\n $ HQ2G : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ HQ2B : int [1:663115] 11 11 11 11 11 11 11 11 11 11 ...\n $ HQ3G : int [1:663115] 5 5 5 5 5 5 5 5 5 5 ...\n $ HQ3B : int [1:663115] 13 13 13 13 13 13 13 13 13 13 ...\n $ HQ4G : int [1:663115] 6 6 6 6 6 6 6 6 6 6 ...\n $ HQ4B : int [1:663115] 13 13 13 13 13 13 13 13 13 13 ...\n $ Home.score : int [1:663115] 49 49 49 49 49 49 49 49 49 49 ...\n $ Away.team : chr [1:663115] \"Carlton\" \"Carlton\" \"Carlton\" \"Carlton\" ...\n $ AQ1G : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ AQ1B : int [1:663115] 3 3 3 3 3 3 3 3 3 3 ...\n $ AQ2G : int [1:663115] 1 1 1 1 1 1 1 1 1 1 ...\n $ 
AQ2B : int [1:663115] 3 3 3 3 3 3 3 3 3 3 ...\n $ AQ3G : int [1:663115] 2 2 2 2 2 2 2 2 2 2 ...\n $ AQ3B : int [1:663115] 3 3 3 3 3 3 3 3 3 3 ...\n $ AQ4G : int [1:663115] 2 2 2 2 2 2 2 2 2 2 ...\n $ AQ4B : int [1:663115] 4 4 4 4 4 4 4 4 4 4 ...\n $ Away.score : int [1:663115] 16 16 16 16 16 16 16 16 16 16 ...\n $ First.name : chr [1:663115] \"Bill\" \"Jimmy\" \"Bob\" \"Tom\" ...\n $ Surname : chr [1:663115] \"Ahern\" \"Aitken\" \"Armstrong\" \"Blake\" ...\n $ ID : num [1:663115] 4415 4416 4417 4419 4421 ...\n $ Jumper.No. : chr [1:663115] \"0\" \"0\" \"0\" \"0\" ...\n $ Playing.for : chr [1:663115] \"Carlton\" \"Carlton\" \"Carlton\" \"Carlton\" ...\n $ Kicks : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Marks : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Handballs : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Goals : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Behinds : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Hit.Outs : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Tackles : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Rebounds : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Inside.50s : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Clearances : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Clangers : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Frees.For : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Frees.Against : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Brownlow.Votes : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Contested.Possessions : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Uncontested.Possessions: num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Contested.Marks : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Marks.Inside.50 : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ One.Percenters : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Bounces : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Goal.Assists : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Time.on.Ground.. 
: num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Substitute : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Umpire.1 : chr [1:663115] \"Samuel Hood\" \"Samuel Hood\" \"Samuel Hood\" \"Samuel Hood\" ...\n $ Umpire.2 : chr [1:663115] \"\" \"\" \"\" \"\" ...\n $ Umpire.3 : chr [1:663115] \"\" \"\" \"\" \"\" ...\n $ Umpire.4 : chr [1:663115] \"\" \"\" \"\" \"\" ...\n $ group_id : int [1:663115] 2 2 2 2 2 2 2 2 2 2 ...\n```\n:::\n:::\n\n\n:::\n\n\nInspecting the fields and using some knowledge of the game, we can determine that the following fields are player-level:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_level_fields <- c(\n \"First.name\", \"Surname\", \"ID\", \"Jumper.No.\", \"Playing.for\", \"Kicks\", \"Marks\", \n \"Handballs\", \"Goals\", \"Behinds\", \"Hit.Outs\", \"Tackles\", \"Rebounds\", \"Inside.50s\", \n \"Clearances\", \"Clangers\", \"Frees.For\", \"Frees.Against\", \"Brownlow.Votes\", \n \"Contested.Possessions\", \"Uncontested.Possessions\", \"Contested.Marks\", \n \"Marks.Inside.50\", \"One.Percenters\", \"Bounces\", \"Goal.Assists\", \"Time.on.Ground..\",\n \"Substitute\"\n )\n\nmatch_level_fields <- setdiff(colnames(player_stats), player_level_fields)\n```\n:::\n\n\nWe can now safely group and aggregate by the `match_level_fields` below:\n\n\n::: {.panel-tabset}\n\n### Tidyverse\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb |> \n mutate(\n home_player = Playing.for == Home.team,\n away_player = Playing.for == Away.team\n ) |> \n group_by(pick(all_of(match_level_fields))) |> \n summarise(\n player_count = n(),\n home_kicks = sum(Kicks * home_player),\n away_kicks = sum(Kicks * away_player),\n home_marks = sum(Marks * home_player),\n away_marks = sum(Marks * away_player),\n home_handballs = sum(Handballs * home_player),\n away_handballs = sum(Handballs * away_player),\n home_hit_outs = sum(Hit.Outs * home_player),\n away_hit_outs = sum(Hit.Outs * away_player),\n home_tackles = sum(Tackles * home_player),\n away_tackles = sum(Tackles * 
away_player),\n home_rebounds = sum(Rebounds * home_player),\n away_rebounds = sum(Rebounds * away_player),\n home_inside_50s = sum(Inside.50s * home_player),\n away_inside_50s = sum(Inside.50s * away_player),\n home_clearances = sum(Clearances * home_player),\n away_clearances = sum(Clearances * away_player),\n home_clangers = sum(Clangers * home_player),\n away_clangers = sum(Clangers * away_player),\n home_frees_for = sum(Frees.For * home_player),\n away_frees_for = sum(Frees.For * away_player),\n home_frees_against = sum(Frees.Against * home_player),\n away_frees_against = sum(Frees.Against * away_player),\n home_contested_possessions = sum(Contested.Possessions * home_player),\n away_contested_possessions = sum(Contested.Possessions * away_player),\n home_uncontested_possessions = sum(Uncontested.Possessions * home_player),\n away_uncontested_possessions = sum(Uncontested.Possessions * away_player),\n home_contested_marks = sum(Contested.Marks * home_player),\n away_contested_marks = sum(Contested.Marks * away_player),\n home_marks_inside_50 = sum(Marks.Inside.50 * home_player),\n away_marks_inside_50 = sum(Marks.Inside.50 * away_player),\n home_one_percenters = sum(One.Percenters * home_player),\n away_one_percenters = sum(One.Percenters * away_player),\n home_bounces = sum(Bounces * home_player),\n away_bounces = sum(Bounces * away_player),\n home_goal_assists = sum(Goal.Assists * home_player),\n away_goal_assists = sum(Goal.Assists * away_player),\n .groups = \"drop\"\n ) |>\n arrange(Date, Local.start.time, Home.team) -> \n match_stats_flat_tb\n\n# verify correct number of games:\nnrow(match_stats_flat_tb) == nrow(results_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n### `data.table`\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_dt <- copy(player_stats_dt)\n\nmatch_stats_flat_dt[, home_player := Playing.for == Home.team]\nmatch_stats_flat_dt[, away_player := Playing.for == 
Away.team]\n\nmatch_stats_flat_dt <- match_stats_flat_dt[, .(\n player_count = .N,\n home_kicks = sum(Kicks * home_player),\n away_kicks = sum(Kicks * away_player),\n home_marks = sum(Marks * home_player),\n away_marks = sum(Marks * away_player),\n home_handballs = sum(Handballs * home_player),\n away_handballs = sum(Handballs * away_player),\n home_hit_outs = sum(Hit.Outs * home_player),\n away_hit_outs = sum(Hit.Outs * away_player),\n home_tackles = sum(Tackles * home_player),\n away_tackles = sum(Tackles * away_player),\n home_rebounds = sum(Rebounds * home_player),\n away_rebounds = sum(Rebounds * away_player),\n home_inside_50s = sum(Inside.50s * home_player),\n away_inside_50s = sum(Inside.50s * away_player),\n home_clearances = sum(Clearances * home_player),\n away_clearances = sum(Clearances * away_player),\n home_clangers = sum(Clangers * home_player),\n away_clangers = sum(Clangers * away_player),\n home_frees_for = sum(Frees.For * home_player),\n away_frees_for = sum(Frees.For * away_player),\n home_frees_against = sum(Frees.Against * home_player),\n away_frees_against = sum(Frees.Against * away_player),\n home_contested_possessions = sum(Contested.Possessions * home_player),\n away_contested_possessions = sum(Contested.Possessions * away_player),\n home_uncontested_possessions = sum(Uncontested.Possessions * home_player),\n away_uncontested_possessions = sum(Uncontested.Possessions * away_player),\n home_contested_marks = sum(Contested.Marks * home_player),\n away_contested_marks = sum(Contested.Marks * away_player),\n home_marks_inside_50 = sum(Marks.Inside.50 * home_player),\n away_marks_inside_50 = sum(Marks.Inside.50 * away_player),\n home_one_percenters = sum(One.Percenters * home_player),\n away_one_percenters = sum(One.Percenters * away_player),\n home_bounces = sum(Bounces * home_player),\n away_bounces = sum(Bounces * away_player),\n home_goal_assists = sum(Goal.Assists * home_player),\n away_goal_assists = sum(Goal.Assists * away_player)\n), 
by = match_level_fields]\n\nsetorder(match_stats_flat_dt, Date, Local.start.time, Home.team)\n\n# verify outputs match:\nidentical(as.data.frame(match_stats_flat_tb), as.data.frame(match_stats_flat_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\nHenceforth, `player_stats_*` and `match_stats_flat_*` will be the two datasets we will use predominantly.\n\n## IDs and URLs\n\nOne thing that our `match_stats_flat_*` dataset is currently lacking is a game ID for use as a primary key. In addition, being able to link directly to AFL Tables when talking about a particular game or player would be handy.\n\n### Game ID and URL\n\nLet's tackle the game ID by writing some functions to create an ID which also conveniently lines up with the way AFL Tables game URLs work (two birds with one stone). \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nteam_code_map <- c(\n \"Adelaide\" = \"01\",\n \"Adelaide Crows\" = \"01\",\n \"Brisbane Bears\" = \"02\",\n \"Carlton\" = \"03\",\n \"Collingwood\" = \"04\",\n \"Essendon\" = \"05\",\n \"Fitzroy\" = \"06\",\n \"Western Bulldogs\" = \"07\",\n \"Fremantle\" = \"08\",\n \"Geelong\" = \"09\",\n \"Geelong Cats\" = \"09\",\n \"Hawthorn\" = \"10\",\n \"Melbourne\" = \"11\",\n \"North Melbourne\" = \"12\",\n \"Port Adelaide\" = \"13\",\n \"Richmond\" = \"14\",\n \"St Kilda\" = \"15\",\n \"Sydney\" = \"16\",\n \"Sydney Swans\" = \"16\",\n \"University\" = \"17\",\n \"West Coast\" = \"18\",\n \"West Coast Eagles\" = \"18\",\n \"Brisbane Lions\" = \"19\",\n \"Gold Coast\" = \"20\",\n \"Gold Coast Suns\" = \"20\",\n \"Greater Western Sydney\" = \"21\",\n \"GWS Giants\" = \"21\"\n)\n\n# The three functions below are all vectorised for efficiency purposes\nget_team_code <- function(team_name) {\n unname(team_code_map[team_name])\n}\n\nget_game_id <- function(home_team_code, away_team_code, game_date) {\n # example ID: 161820230624\n game_date_string <- format(game_date, \"%Y%m%d\")\n \n ifelse(\n home_team_code > 
away_team_code, \n # the smaller code is always first\n paste0(away_team_code, home_team_code, game_date_string),\n paste0(home_team_code, away_team_code, game_date_string)\n )\n}\n\nget_game_afltables_url <- function(game_id, season) {\n # example url: https://afltables.com/afl/stats/games/2023/161820230624.html\n paste0(\"https://afltables.com/afl/stats/games/\", season,\"/\", game_id, \".html\")\n}\n```\n:::\n\n\nNow let's use these functions[^vectorisation-benefits] to add a primary key to our `match_stats_flat_*` datasets.\n\n[^vectorisation-benefits]: Note that as the functions are vectorised, we need not use the slow `purrr::map*()` or `*apply()` family of functions to apply them to a column of our `tibble` and `data.table` respectively.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_tb |> \n mutate(\n home_team_code = get_team_code(Home.team),\n away_team_code = get_team_code(Away.team),\n game_id = get_game_id(home_team_code, away_team_code, Date),\n game_afltables_url = get_game_afltables_url(game_id, Season)\n ) |> \n relocate(game_id, .before = Season) |> \n arrange(game_id) ->\n match_stats_flat_tb\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_dt[, home_team_code := get_team_code(Home.team)]\nmatch_stats_flat_dt[, away_team_code := get_team_code(Away.team)]\nmatch_stats_flat_dt[, game_id := get_game_id(home_team_code, away_team_code, Date)]\nmatch_stats_flat_dt[, game_afltables_url := get_game_afltables_url(game_id, Season)]\n\nsetcolorder(match_stats_flat_dt, c(\"game_id\", setdiff(names(match_stats_flat_dt), \"game_id\")))\nsetkey(match_stats_flat_dt, game_id)\n\n# verify outputs match:\nidentical(as.data.frame(match_stats_flat_tb), as.data.frame(match_stats_flat_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n### Player URLs\n\nIn a similar way we can add a player URL to our `player_stats_*` datasets, we start 
by creating a mapping table.\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# non-duplicate URL: https://afltables.com/afl/stats/players/E/Errol_Gulden.html\n# duplicate URL: https://afltables.com/afl/stats/players/J/Josh_Kennedy0.html, https://afltables.com/afl/stats/players/J/Josh_Kennedy1.html\n# for dealing with duplicates, for example Peter Brown (6 of the same name!) seems to have a nonsensical order\nplayer_stats_tb |> \n mutate(full_name = paste(First.name, Surname, sep = \"_\")) |> \n distinct(ID, full_name) |> \n group_by(full_name) |> \n arrange(ID) |>\n mutate(\n instance_number = as.character(cumsum(rep(1L, n())) - 1L),\n dup_count = n()\n ) |> \n mutate(\n number_suffix = if_else(dup_count == 1L, \"\", instance_number),\n first_letter = str_sub(full_name, 1, 1),\n player_afltables_url = paste0(\"https://afltables.com/afl/stats/players/\", \n first_letter, \"/\", full_name, number_suffix, \".html\")\n ) |> \n ungroup() |> \n select(ID, player_afltables_url) ->\n player_url_tb\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_url_dt <- copy(player_stats_dt)\n\nplayer_url_dt[, full_name := paste(First.name, Surname, sep = \"_\")]\nplayer_url_dt <- unique(player_url_dt, by = c(\"ID\", \"full_name\"))\nsetorder(player_url_dt, ID)\nplayer_url_dt <- player_url_dt[, `:=`(\n instance_number = as.character(cumsum(rep(1L, .N)) - 1L),\n dup_count = .N\n), \"full_name\"]\n\nplayer_url_dt[, number_suffix := fifelse((dup_count == 1L), \"\", instance_number)]\nplayer_url_dt[, first_letter := str_sub(full_name, 1, 1)]\nplayer_url_dt[, player_afltables_url := paste0(\"https://afltables.com/afl/stats/players/\", \n first_letter, \"/\", full_name, number_suffix, \".html\")]\nplayer_url_dt <- player_url_dt[, .(ID, player_afltables_url)]\n\n# verify outputs match:\nidentical(as.data.frame(player_url_tb), as.data.frame(player_url_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 
TRUE\n```\n:::\n:::\n\n\n:::\n\nNow we can add the game ID, game URL and player URL to the `player_stats_*` dataset.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb <- as_tibble(player_stats) # copied from above\n\nplayer_stats_tb |> \n mutate(\n home_team_code = get_team_code(Home.team),\n away_team_code = get_team_code(Away.team),\n game_id = get_game_id(home_team_code, away_team_code, Date),\n player = paste0(First.name, \" \", Surname, \" (\", Playing.for,\")\")\n ) |> \n left_join(match_stats_flat_tb |> select(game_id, game_afltables_url), by = \"game_id\") |> \n left_join(player_url_tb, by = \"ID\") |> \n relocate(c(\"game_id\", \"player\", \"ID\"), .before = Season) |>\n arrange(game_id, Playing.for, ID) ->\n player_stats_tb\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_dt <- as.data.table(player_stats) # copied from above\n\nplayer_stats_dt[, home_team_code := get_team_code(Home.team)]\nplayer_stats_dt[, away_team_code := get_team_code(Away.team)]\nplayer_stats_dt[, game_id := get_game_id(home_team_code, away_team_code, Date)]\nplayer_stats_dt[, player := paste0(First.name, \" \", Surname, \" (\", Playing.for,\")\")]\n\nplayer_stats_dt <- merge(\n player_stats_dt, match_stats_flat_dt[, c(\"game_id\", \"game_afltables_url\")], \n by = \"game_id\")\n\nplayer_stats_dt <- merge(player_stats_dt, player_url_dt, by = \"ID\")\n\nsetcolorder(player_stats_dt, c(c(\"game_id\", \"player\"), setdiff(names(player_stats_dt), c(\"game_id\", \"player\"))))\nsetkey(player_stats_dt, game_id, Playing.for, ID)\n# verify outputs match:\nidentical(as.data.frame(player_stats_tb), as.data.frame(player_stats_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n\n\n## Finding the Infamous Game\n\nLet's use these new datasets to perform the simple exercise of obtaining the game ID for the [aforementioned](#a-concrete-example) Swans versus Eagles game. 
We can henceforth use this game ID whenever relevant to rank the Swans in the statistical category we investigate.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(\n infamous_game_id <- get_game_id(\n home_team_code = get_team_code(\"Sydney\"),\n away_team_code = get_team_code(\"West Coast\"),\n game_date = as.Date(\"2023-06-24\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"161820230624\"\n```\n:::\n:::\n\n\n\nWe can then filter the data and present it in a table below[^table-code-omitted].\n\n[^table-code-omitted]: Note that the code to format the table is omitted.\n\n### Match Stats\n\n\n::: {.panel-tabset}\n\n### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_tb |> \n filter(game_id == infamous_game_id)\n```\n:::\n\n\n### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_dt[game_id == infamous_game_id, ]\n```\n:::\n\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n### Player Stats\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb |> \n filter(game_id == infamous_game_id)\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_dt[game_id == infamous_game_id, ]\n```\n:::\n\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\n\n# Figuring Out the Answers\n\nIn this section I will provide my working using R for each of the [aforementioned questions](#question-list). Where relevant, I will figure out where the [aforementioned infamous game](#a-concrete-example) places in the history of the AFL for that particular category.\n\nThe pathways I go down are only one of many permutations of stats you can look at and angles you can approach things from. The code I have written is also probably more thorough and well-presented than how I would typically do it. When I do this type of thing with no intent on publishing it, my data manipulations will generally be far more ad-hoc and expedient (I pay far less attention to reproducibility and consistent naming conventions).\n\n\n## Highest Scoring Quarter\n\nAs listed [above](#question-list), our first question was:\n\n> What is the record for the highest scoring quarter?\n\n::: {.callout-info}\n\nNote that there is already a [page](https://afltables.com/afl/teams/allteams/qh.html){target=\"_blank\"} on this topic on AFL Tables, but it is a good one to start with regardless.\n\n:::\n\nTo answer this question, we will begin by creating a reshaped version of the `match_stats_flat_*` dataset that is structured around quarters.\n\n\n::: {.panel-tabset}\n\n### Tidyverse\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngame_level_columns <- c(\"game_id\", \"game_afltables_url\", \"Season\", \"Round\", \"Venue\", \"Home.team\", \"Away.team\", \"Home.score\", \"Away.score\")\n\nmatch_stats_flat_tb |> \n select(all_of(game_level_columns), starts_with(\"HQ\"), starts_with(\"AQ\")) |> \n pivot_longer(cols = c(starts_with(\"HQ\"), starts_with(\"AQ\")), names_to = \"quarter_gb\", values_to = \"gb_count\") |>\n mutate(\n quarter = str_extract(quarter_gb, \"\\\\d\"),\n gb_label = if_else(str_detect(quarter_gb, \"G$\"), \"goals\", \"behinds\"),\n is_home_score = str_detect(quarter_gb, \"^H\")\n ) |>\n pivot_wider(id_cols = 
all_of(c(game_level_columns, \"quarter\", \"is_home_score\")), names_from = gb_label, values_from = gb_count) |>\n arrange(game_id, is_home_score, quarter) |> \n group_by(game_id, is_home_score) |> \n mutate(# make quarters incremental\n goals = c(head(goals, 1), diff(goals)),\n behinds = c(head(behinds, 1), diff(behinds))\n ) |> \n ungroup() |> \n mutate(\n score = goals * 6 + behinds,\n team = if_else(is_home_score, Home.team, Away.team),\n opposition = if_else(!is_home_score, Home.team, Away.team)\n ) |> \n select(-is_home_score) -> quarter_stats_tb\n```\n:::\n\n\n### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngame_level_columns <- c(\"game_id\", \"game_afltables_url\", \"Season\", \"Round\", \"Venue\", \"Home.team\", \"Away.team\", \"Home.score\", \"Away.score\")\n\nquarter_stats_dt <- copy(match_stats_flat_dt)\nquarter_stats_dt <- quarter_stats_dt[, .SD, .SDcols = names(quarter_stats_dt) %like% paste(\n paste(game_level_columns, collapse = \"|\"), \"^HQ\", \"^AQ\", \n sep = \"|\")]\nquarter_stats_dt <- melt(quarter_stats_dt, id.vars = game_level_columns, variable.name = \"quarter_gb\", value.name = \"gb_count\")\n\nquarter_stats_dt[, quarter := str_extract(quarter_gb, \"\\\\d\")]\nquarter_stats_dt[, gb_label := fifelse(str_detect(quarter_gb, \"G$\"), \"goals\", \"behinds\")]\nquarter_stats_dt[, is_home_score := str_detect(quarter_gb, \"^H\")]\n\nquarter_stats_dt[, quarter_gb:=NULL]\nquarter_stats_dt <- dcast(quarter_stats_dt, ... 
~ gb_label, value.var = \"gb_count\")\n\n# make quarters incremental\nsetorder(quarter_stats_dt, game_id, is_home_score, quarter)\nquarter_stats_dt[, goals := c(head(goals, 1), diff(goals)), c(\"game_id\", \"is_home_score\")]\nquarter_stats_dt[, behinds := c(head(behinds, 1), diff(behinds)), c(\"game_id\", \"is_home_score\")]\n\nquarter_stats_dt[, score := goals * 6 + behinds]\nquarter_stats_dt[, team := fifelse(is_home_score, Home.team, Away.team)]\nquarter_stats_dt[, opposition := fifelse(!is_home_score, Home.team, Away.team)]\n\nquarter_stats_dt <- quarter_stats_dt[, .SD, .SDcols = c(game_level_columns, c(\"quarter\", \"goals\", \"behinds\", \"score\", \"team\", \"opposition\"))]\n\n# verify outputs match:\nidentical(as.data.frame(quarter_stats_tb), as.data.frame(quarter_stats_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n\n:::\n\nWe will answer this question for each quarter (first, second, third and fourth), as well as overall. This means we will be repeating the same process five times, so this calls for writing a function. 
The function will give us the top 5 scoring quarters, as well as ranking for the [aforementioned infamous game](#a-concrete-example) on the all time list of quarters.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_top_quarter_scores_tb <- function(data, quarter_selection) {\n data |> \n filter(quarter %in% quarter_selection) |>\n arrange(desc(score)) |> \n mutate(rank = seq_along(team)) |> \n filter(rank %in% 1:5 | (game_id == infamous_game_id & team == \"Sydney\")) |> \n select(rank, team, opposition, score, quarter, Season, Round, Venue, game_afltables_url, game_id)\n}\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_top_quarter_scores_dt <- function(data, quarter_selection) {\n top_quarters_q1_dt <- copy(quarter_stats_dt)\n top_quarters_q1_dt <- top_quarters_q1_dt[\n quarter %in% quarter_selection, ]\n setorder(top_quarters_q1_dt, -score)\n top_quarters_q1_dt[, rank := seq_along(team)]\n top_quarters_q1_dt[rank %in% 1:5 | (game_id == infamous_game_id & team == \"Sydney\"), \n .(rank, team, opposition, score, quarter, Season, Round, Venue, game_afltables_url, game_id)]\n}\n```\n:::\n\n\n:::\n\n\n\n::: {.cell}\n\n:::\n\n\n\n\n### First Quarter\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q1_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 1L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q1_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 1L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_q1_tb), as.data.frame(top_quarter_scores_q1_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nThe record for the highest-scoring first quarter occurred during the bloodbath of an encounter that was the Bombers' first ever clash with the Gary Ablett Jr.-led Gold Coast Suns in their inaugural season in the AFL. The Bombers came out of the blocks in a flash and mercilessly obliterated the inexperienced Gold Coast side, notching up a blistering 93 point lead at quarter time. Interestingly, the Suns actually managed to win the second quarter as the Bombers appeared to take their foot off the accelerator a little to *only* win by 139 points when all was said and done. \n\n\n\n```{=html}\n\n```\n\n\n\nPerhaps the Suns' lethargy in the first quarter against the Dons can be explained as a hangover[^sun-festivities] following on from their [first ever win](https://afltables.com/afl/stats/games/2011/132020110423.html){target=\"_blank\"} the previous week[^first-win]. It is exciting to me that this is a game that I can remember watching on the television at the time, and it may have even been the first Gold Coast game I ever watched[^gold-coast]. 
Footy is full of narratives and it is fun to spin one around this particular game (the context and stories make footy stats even more fun).\n\n[^sun-festivities]: As a club with an abundance of 18 or 19 year old blokes living out of home for the first time, the Suns were known to [over-indulge](https://youtu.be/Roehqg0Dd5k?t=61){target=\"_blank\"} in the Gold Coast party culture in those days.\n\n[^first-win]: Courtesy of a [(missed) shot at goal after the siren](https://youtu.be/CbJMAHRzHEo?t=318){target=\"_blank\"} from Justin Westhoff.\n\n[^gold-coast]: I am glad that I didn't give up on watching them after that (mainly due to Gary Ablett I will admit) because otherwise I would have missed *unbelievable goals* like [this](https://www.youtube.com/watch?v=2Ae5byjzUKg){target=\"_blank\"}.\n\n\n### Second Quarter\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q2_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 2L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q2_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 2L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_q2_tb), as.data.frame(top_quarter_scores_q2_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nFrom one team's first season, to another's last. It is quite fitting (although sad) that the highest scoring second quarter was against a floundering ([aforementioned](#the-fitzroy-package)) `fitzRoy` Football Club (to which we owe the ease with which we obtained this data) en route to a wooden spoon in their [final season](https://www.youtube.com/watch?v=Ykfsk0pXt9E){target=\"_blank\"} prior to merging with the Brisbane Bears to form the Brisbane Lions.\n\nAs I was not yet born, I do not remember the game, but in the video below, the commentator shrewdly points to a strong wind prevailing towards the Crows' goal at the beginning of the second quarter, which certainly didn't bode well for the Lions.\n\n\n```{=html}\n\n```\n\n\n\n### Third Quarter\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q3_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 3L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q3_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 3L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_q3_tb), as.data.frame(top_quarter_scores_q3_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nThe Swans' third quarter appears in 28^th^ position here, which is the best position it gets.\n\n\n### Fourth Quarter\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q4_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 4L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q4_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 4L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_q4_tb), as.data.frame(top_quarter_scores_q4_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nWell this was slightly unexpected, the Bloods[^incorrect-label] came home like a freight train against the woeful Saints in a game that took place over 100 years ago. It is also the only quarter in AFL history that has notched up a ton. Upon seeing this, given its vintage, I thought that perhaps the story of this game might have been lost to time but the Swans have a most [insightful article](https://www.sydneyswans.com.au/news/235004/slaughter-the-true-story-behind-a-record-thats-stood-for-a-century){target=\"_blank\"} up on their website about it. The explanation in the article claims that St Kilda were undermanned through a combination of injury and player protest on account of off-field disputes. It is safe to say that the \"Bloods\" showed them no mercy.\n\n[^incorrect-label]: Incorrectly labelled here as \"Sydney\" in the table above because at the time they resided in South Melbourne (they [relocated to Sydney in 1982](https://en.wikipedia.org/wiki/Sydney_Swans#Swans_move_to_Sydney:_1982.E2.80.931984){target=\"_blank\"}), they were also known as the \"Bloods\" prior to adopting their current [Swans mascot in 1933](https://en.wikipedia.org/wiki/Sydney_Swans#Club_identity){target=\"_blank\"} due to the number of Western Australians in the side (as a WA boy I couldn't help mentioning this)\n\n### All Quarters\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 1L:4L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 1L:4L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_tb), as.data.frame(top_quarter_scores_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nIn the infamous game, the Swans' third quarter[^premiership-quarter] was the only one that reached the top 100 quarters of all time. The fact that no quarter was even close to the top indicates that the Swans were very consistent throughout the game. To use a cliche, they put in a consistent four-quarter effort and I suppose the Eagles were consistent too (consistently dismal).\n\n[^premiership-quarter]: i.e. the [premiership quarter](https://en.wiktionary.org/wiki/premiership_quarter){target=\"_blank\"}\n\n## Most Goal-kickers\n\nThree of the [aforementioned](#question-list) questions concern goal kickers. We can therefore write a function that generalises our approach, like we did for the previous question. \n\n\nThese questions were:\n\n> What is the record for the most individual goal kickers for a team in a single game?\n\n> What is the record for the most multiple goal kickers for a team in a single game?\n\n> What is the record for the most players kicking five or more goals for a team in a single game (i.e. 
the most \"bags\")?\n\nThe Swans game appeared to have a rather even distribution of goal kickers in the [infamous game](#a-concrete-example), so it will be interesting to see where it places on the all time list in this category.\n\n\n\n::: {.panel-tabset}\n\n### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_top_goal_scorers_tb <- function(data, min_goals) {\ndata |> \n mutate(\n team = Playing.for,\n opposition = if_else(team == Home.team, Away.team, Home.team)\n ) |> \n group_by(team, opposition, Season, Round, Venue, game_afltables_url, game_id, Date) |> \n summarise(\n goal_kickers = sum(Goals > min_goals),\n .groups = \"drop\"\n ) |> \n arrange(desc(goal_kickers), desc(Date)) |> \n mutate(rank = seq_along(game_id)) |> \n relocate(rank, .before = \"team\") |> \n relocate(goal_kickers, .before = \"Season\") |> \n filter(rank %in% 1:5|(game_id == infamous_game_id & team == \"Sydney\")) |>\n select(-Date)\n}\n```\n:::\n\n\n\n\n\n### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_top_goal_scorers_dt <- function(data, min_goals) {\n top_goal_scorers_dt <- copy(data)\n top_goal_scorers_dt[, team := Playing.for]\n top_goal_scorers_dt[, opposition := fifelse(team == Home.team, Away.team, Home.team)]\n \n top_goal_scorers_dt <- top_goal_scorers_dt[,.(goal_kickers = sum(Goals > min_goals)),\n c(\"team\", \"opposition\", \"Season\", \"Round\", \"Venue\", \n \"game_afltables_url\", \"game_id\", \"Date\")]\n setorder(top_goal_scorers_dt, -goal_kickers, -Date)\n top_goal_scorers_dt[, rank := seq_along(game_id)]\n \n \n top_goal_scorers_dt[rank %in% 1:5|(game_id == infamous_game_id & team == \"Sydney\"),\n .(rank, team, opposition, goal_kickers, Season, Round, Venue, \n game_afltables_url, game_id)]\n}\n```\n:::\n\n\n:::\n\n\n\n::: {.cell}\n\n:::\n\n\n### Individual\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_single_tb <- get_top_goal_scorers_tb(player_stats_tb, 0L)\n```\n:::\n\n\n\n#### 
`data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_single_dt <- get_top_goal_scorers_dt(player_stats_dt, 0L)\n# verify outputs match:\nidentical(as.data.frame(top_goal_scorers_single_tb), as.data.frame(top_goal_scorers_single_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nThe record for most goal kickers is actually shared by multiple teams. The most recent time this happened was in the Bulldogs' 101-point drubbing of the Eagles last year[^eagles-bad]. The [infamous game](#a-concrete-example) is a bit off the pace in 238^th^ but 12 goal-kickers is still double a starting forward line.\n\n[^eagles-bad]: Yet another example of how poorly the Eagles have done in 2022 and 2023.\n\n\n\n### Multiple\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_multiple_tb <- get_top_goal_scorers_tb(player_stats_tb, 1L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_multiple_dt <- get_top_goal_scorers_dt(player_stats_dt, 1L)\n# verify outputs match:\nidentical(as.data.frame(top_goal_scorers_multiple_tb), as.data.frame(top_goal_scorers_multiple_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nThe Swans game actually places equal 7^th^ on the list of all time which is quite a notable result. It is also interesting that it was a similarly one-sided Swans game[^swans-bombers] at the SCG that takes outright top spot. In that game (circa 1987) the human highlight reel [Warwick Capper](https://www.youtube.com/watch?v=iiYJ6FZWwv0){target=\"_blank\"} led all comers for the Swans with a bag of 6 snags.\n\n[^swans-bombers]: [Full game](https://www.youtube.com/watch?v=IR2AjhhNDzE){target=\"_blank\"}, [article](https://www.sydneyswans.com.au/news/132940/footy-flashbacks-essendon){target=\"_blank\"}\n\n### Five or More (Bags)\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_bags_tb <- get_top_goal_scorers_tb(player_stats_tb, 4L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_bags_dt <- get_top_goal_scorers_dt(player_stats_dt, 4L)\n# verify outputs match:\nidentical(as.data.frame(top_goal_scorers_bags_tb), as.data.frame(top_goal_scorers_bags_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nFour bags in one game has happened on two occasions, the most recent of which (in 1991) yet again featured the Fitzroy Lions, who were trounced by 157 points by the Hawks in North Hobart. \n\nThe list of bag-getters in this game makes for interesting reading: all were recognisable names (although one more for his family connections than his own merit). As one might expect, one of the bags was courtesy of Hawthorn spearhead Jason Dunstall (6 snags), along with 7 apiece from WA boy Ben Allan[^ben-allan] and the three-time premiership player Darren Jarman. Rounding out the four was a contribution of 5 snags from Paul Hudson, who is the son of Tasmanian footy legend Peter Hudson (how fitting that this game was played in Tassie) who averaged more than 5 goals a game himself (an incredible feat).\n\n[^ben-allan]: Sorry I couldn't help myself, he was also a [Claremont Tiger](https://www.claremontfc.com.au/){target=\"_blank\"} (up the mighty Tiges)\n\n\n## Questions About Questionable Disposal\n\nTwo of the [questions](#question-list) concerned clangers and disposal efficiency:\n\n> What is the record for the most clangers in a game?\n\n> What is the record for the worst disposal efficiency in a game?\n\nThese statistics (which we will define below) are more advanced and have only been recorded more recently, so we will have to check which data sources to use and what years they are available for.\n\n### Most Clangers\n\n\nA clanger is defined as:\n\n> an absurd or embarrassing blunder.\n\nOr in more precise football statistics terms:\n\n> An error made by a player resulting in a negative result for his side. Disposal clangers are any kick or handball that directly turns the ball over to the opposition. 
Frees and 50-metre penalties against, No Pressure Errors, Dropped Marks and Debits are all included in clangers.\n>\n> Source: [Champion Data](https://www.championdata.com/glossary/afl){target=\"_blank\"}.\n\n[^source-champion-data]: Source: [Champion Data](https://www.championdata.com/glossary/afl){target=\"_blank\"}.\n\n\nClanger data is available in the AFL Tables data from 1998 onwards, so we will have to make do with only recent memory.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb |> \n filter(!is.na(Clangers)) |> \n group_by(Home.team, Away.team, Season, Round, Venue, game_afltables_url, game_id) |> \n summarise(\n clangers = sum(Clangers),\n .groups = \"drop\"\n ) |> \n arrange(desc(clangers), Season, Round) |> \n mutate(\n rank = seq_along(clangers)\n ) |> \n filter(rank %in% 1:5 | game_id == infamous_game_id) |> \n select(\n rank, game_id, Home.team, Away.team, clangers, Season, Round, Venue, game_afltables_url\n ) -> most_clangers_tb\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmost_clangers_dt <- copy(player_stats_dt)\nmost_clangers_dt <- most_clangers_dt[!is.na(Clangers),]\nmost_clangers_dt <- most_clangers_dt[, .(clangers = sum(Clangers)), c(\"Home.team\", \"Away.team\", \"Season\", \"Round\", \"Venue\", \"game_afltables_url\", \"game_id\")]\nsetorder(most_clangers_dt, -clangers, Season, Round)\nmost_clangers_dt[,rank := seq_along(clangers)]\nmost_clangers_dt <- most_clangers_dt[rank %in% 1:5 | game_id == infamous_game_id,]\nmost_clangers_dt <- most_clangers_dt[, .(rank, game_id, Home.team, Away.team, clangers, Season, Round, Venue, game_afltables_url)]\n# verify outputs match:\nidentical(as.data.table(most_clangers_tb), as.data.table(most_clangers_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nA lot of the games in the top 5 are from very recent years, probably highlighting the more recent trend in teams rolling the dice more with possession (high risk, high reward), as popularised by Richmond and adopted to great success of late by the Magpies. Having spot-checked a few examples, there is also a bit of a pattern of wet weather impacting these games it would seem.\n\nAlso notable in the top 5 is one of three games played in Shanghai as part of the AFL's attempt at entering into the Chinese market between 2017 and 2019. While there was some rain about, one might presume from this that the Suns and Port didn't put on a particularly impressive display of our game on that occasion, either that or they were putting on an entertaining show with plenty of high-risk, high-reward plays. The [video highlights](https://www.youtube.com/watch?v=9b2gue45P4k){target=\"_blank\"} appear to be reasonably exciting, so I am going to assume the latter.\n\n\n### Worst Disposal Efficiency\n\nDisposal efficiency is\n\n> the percentage of disposals that are effective.[^source-champion-data]\n\nWhere effective disposal is any of:\n\n> * Effective handball: a handball to a teammate that hits the intended target.\n * Effective Short Kick: A kick of less than 40 metres that results in the intended target retaining possession. Does not include kicks that are spoiled by the opposition.\n * Effective Long Kick: A kick of more than 40 metres to a 50/50 contest or better for the team.[^source-champion-data]\n\nNote how the distance of the disposal is an element of how lenient the definition of \"effective\" is.\n\n\nThis statistic requires our first (and only) use of the [Fryzigg](https://twitter.com/fryzigg){target=\"_blank\"} data, as disposal efficiency is not present on the AFL Tables data. While the Fryzigg data has the full history of the AFL, disposal efficiency is missing for seasons prior to 2012 onwards. 
We will therefore have to make do with answering this question only for about the past decade.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_fryzigg_tb |> \n filter(!is.na(disposal_efficiency_percentage)) |> \n mutate(\n home_team_code = get_team_code(match_home_team),\n away_team_code = get_team_code(match_away_team),\n season = str_sub(match_date, 1, 4),\n afl_tables_game_id = get_game_id(home_team_code, away_team_code, as.Date(match_date)),\n afl_tables_url = get_game_afltables_url(afl_tables_game_id, season)\n ) |>\n group_by(afl_tables_game_id, match_home_team, match_away_team, venue_name, season, match_round, afl_tables_url) |> \n summarise(\n disposal_efficiency_game = sum(disposal_efficiency_percentage * disposals) / sum(disposals) / 100,\n .groups = \"drop\"\n ) |> \n arrange(disposal_efficiency_game) |> \n mutate(\n rank = seq_along(disposal_efficiency_game)\n ) |> \n filter(rank %in% 1:5 | afl_tables_game_id == infamous_game_id) |> \n select(\n rank, afl_tables_game_id, match_home_team, match_away_team, disposal_efficiency_game, season, match_round, venue_name, afl_tables_url\n ) -> worst_disposal_efficiency_games_tb\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nworst_disposal_efficiency_games_dt <- copy(player_stats_fryzigg_dt)\nworst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[!is.na(disposal_efficiency_percentage), ]\nworst_disposal_efficiency_games_dt[, home_team_code := get_team_code(match_home_team)]\nworst_disposal_efficiency_games_dt[, away_team_code := get_team_code(match_away_team)]\nworst_disposal_efficiency_games_dt[, season := str_sub(match_date, 1, 4)]\nworst_disposal_efficiency_games_dt[, afl_tables_game_id := get_game_id(home_team_code, away_team_code, as.Date(match_date))]\nworst_disposal_efficiency_games_dt[, afl_tables_url := get_game_afltables_url(afl_tables_game_id, season)]\n\nworst_disposal_efficiency_games_dt <- 
worst_disposal_efficiency_games_dt[,\n .(disposal_efficiency_game = sum(disposal_efficiency_percentage * disposals) / sum(disposals) / 100),\n c(\"afl_tables_game_id\", \"match_home_team\", \"match_away_team\", \"venue_name\", \"season\", \"match_round\", \"afl_tables_url\")\n ]\n\nsetorder(worst_disposal_efficiency_games_dt, disposal_efficiency_game)\nworst_disposal_efficiency_games_dt[, rank := seq_along(disposal_efficiency_game)]\nworst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[rank %in% 1:5 | afl_tables_game_id == infamous_game_id,]\nworst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[, .(rank, afl_tables_game_id, match_home_team, match_away_team, disposal_efficiency_game, season, match_round, venue_name, afl_tables_url)]\n\n# verify outputs match:\nidentical(as.data.table(worst_disposal_efficiency_games_tb), as.data.table(worst_disposal_efficiency_games_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\n\nThe only game with less than 50% disposal efficiency was played in torrid conditions up in Cairns. Looking at the [video highlights](https://www.youtube.com/watch?v=VI4dCyLt82c){target=\"_blank\"}, the players were running through puddles the whole game. I have however seen equal or worse conditions in the past, so it is somewhat curious that this was the worst by such a margin. For reference, Gold Coast were very poor that year, coming second last but the Roos came in a respectable ninth position (only one win and percentage outside the top 8), this also points to them probably winning this game if has been played in more favourable conditions, but they still would have missed out on finals due to the mammoth percentage of the Cats that year. The \"cleanest\" player on the day was Jesse Joyce, whose 8 touches came at 75% efficiency (however it was a rather low sample size).\n\n\nFurther disjointed musings:\n\n* Seeing a game in 2018 is also a fun reminder of the last time the Eagles won a premiership, which feels a long way off given the current predicament the club is in, in spite of their recent win against the similarly languishing[^roos-optimism] Roos last Sunday.\n\n[^roos-optimism]: But far more optimistic due to a combination of Clarko (he returns from his hiatus this week) and promising talent on their list such as Harry Sheezel\n\n* In the 20^th^ century, the use of suburban grounds where the quality of the surface was subpar I am sure led to far more games with lower disposal efficiency than this (muddy fields were far more common in those days). \n\n* We can't entirely blame the players, given the conditions. Anyone who has kicked a footy around in the wet will know how much heavier and slipperier than usual it can get (it is often described as being like a bar of soap). 
By looking through old highlights packages of the top several games in this metric, all of them appear to have been impacted significantly by weather conditions.\n\n* The Fryzigg data actually has weather conditions as a field on it but it appears to be somewhat unreliable, when cross referencing the games with match reports and highlights, some of the \"sunny\" games turned out to be played in torrential rain.\n\n* I also checked which players have the highest career kicking efficiency and it appears to be mostly defenders who probably inflate their numbers by getting involved in switches of play and chipping the ball around the back line. So we have to take this kind of metric with a grain of salt, there is a certain difficulty level with executing certain types of disposal (e.g. a kick inside 50) that it does not fully capture.\n\n\n\n## Youngest Norm Smith Medalist\n\n\nOur question regarding Norm Smith medallists from above reads:\n\n> Who was the youngest player to win a Norm Smith Medal?\n\n\n::: {.callout-note}\n\n### Background on the Norm Smith Medal\n\n* The Norm Smith Medal is awarded to the player who is adjudged best on ground in the AFL grand final. \n\n\n* The award is named after legendary Melbourne full forward of the 1940's and coach of the 1950's and 1960's, [Norm Smith](https://afltables.com/afl/stats/players/N/Norm_Smith.html){target=\"_blank\"}. In his decorated career, he won a total of 10 premierships, 4 as a player, 6 as coach and all for the Melbourne football club (given Melbourne have only won 13 in their history, quite a feat). 
At the back end of his playing career, he spent two years as captain-coach (yes that was a thing at the time) of the Fitzroy football club (here they pop up again).\n\n* The Norm Smith medal is usually given to a player on the winning team but very occasionally, players have managed to win the award in a losing side, the last time being 4 out of 45 times and the last time was Eagles (and Carlton) superstar Chris Judd in 2005.\n\n* The Norm Smith Medal was first instituted in 1979. Prior to this, there was no official award given, however the media and fans of the day had their opinions of who the best on ground was in prior grand finals. While [this article](https://themongrelpunt.com/footy-history/2020/04/30/before-the-norm-smith-best-on-ground-prior-to-1979){target=\"_blank\"} lists some \"unofficial\" best on ground performances in grand finals prior to 1979, I will stick with the official list. As a Western Australian, I would have no qualms with discarding the older, exclusively Victorian seasons particularly as this data is of dubious reliability.\n\n\n:::\n\nAs previously mentioned, Norm Smith Medal data is not available on `fitzRoy`, so we will have to scrape it with some of our own bespoke code[^illustrates-fitzroy-point]. The AFL website conveniently has a [nice table](https://www.afl.com.au/stats/leaders-awards/norm-smith-medal\"){target=\"_blank\"}, listing all the winners since the award began in 1979. 
We will supplement this with data from `fitzRoy` to figure out the level of experience of each player.\n\n\n\n[^illustrates-fitzroy-point]: This illustrates the fact that sometimes you need to stray outside of `fitzRoy` but it gives most of the footy data you could ever want.\n\n::: {.callout-warning}\n\n### No `data.table` code\n\nWhile all the other code has thus far been written in [both tidyverse and `data.table`](#tidyverse-versus-data.table), I decided to leave it for this one as it was rather intricate and painful to perform the same process twice.\n\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnorm_smith_url <- \"https://www.afl.com.au/stats/leaders-awards/norm-smith-medal\"\nnorm_smith_html <- read_html(norm_smith_url)\n\nnorm_smith_html |> \n html_table(header = TRUE) |> \n _[[1]] |> \n mutate( #manually adjust to help mapping\n Club = case_when(\n Club == \"Geelong Cats\" ~ \"Geelong\",\n Club == \"West Coast Eagles\" ~ \"West Coast\",\n Club == \"Sydney Swans\" ~ \"Sydney\",\n TRUE ~ Club\n ),\n Player = case_when(\n Player == \"Billy Duckworth\" ~ \"Bill Duckworth\",\n Player == \"Ryan O'Keefe\" ~ \"Ryan OKeefe\",\n TRUE ~ Player \n ),\n Year = as.integer(str_sub(Year, 1, 4))\n ) -> norm_smith_tb\n\n# get the date of their first game\nplayer_stats_tb |> \n group_by(\n player_afltables_url, Playing.for\n ) |> \n summarise(\n first_game_date = min(Date),\n .groups = \"drop\"\n ) -> more_player_details\n\n# get games played to date\nplayer_stats_tb |> \n group_by(ID) |> \n arrange(Date) |> \n mutate(\n games_played_to_date = seq_along(ID)\n ) |> \n ungroup() -> \n player_stats_tb_games_played\n\n# get grand final information\nplayer_stats_tb_games_played |>\n filter(\n Round == \"GF\",\n Season >= 1979\n ) |>\n mutate(\n player_name = paste(First.name, Surname),\n on_winning_team = (Home.team == Playing.for & Home.score > Away.score) | (Away.team == Playing.for & Home.score < Away.score)\n ) |> \n select(ID, player_name, Playing.for, 
game_afltables_url, player_afltables_url, grand_final_season = Season, grand_final_date = Date, games_played_to_date, on_winning_team) ->\n grand_final_tb\n\ngrand_final_tb |> \n select(grand_final_date) |>\n distinct() |> \n arrange(desc(grand_final_date)) |> \n pull(grand_final_date) ->\n grand_final_dates\n \nplayer_details_tb |> \n mutate(\n first_year = str_sub(Seasons, 1, 4)\n ) -> \n player_details_tb_joinable\n\n\nnorm_smith_tb |> \n mutate(Date = grand_final_dates) |> \n left_join(\n grand_final_tb, by = c(\"Player\" = \"player_name\", \"Club\" = \"Playing.for\", \"Date\" = \"grand_final_date\") \n ) |> \n left_join(\n more_player_details, by = c(\"player_afltables_url\", \"Club\" = \"Playing.for\")\n ) |> \n mutate(\n first_year = format(first_game_date, \"%Y\")\n ) |> \n left_join(\n player_details_tb_joinable, by = c(\"Player\", \"Club\" = \"Team\", \"first_year\")\n ) |>\n mutate(\n debut_age_years = as.integer(str_sub(Debut, 1, 2)),\n debut_age_days = as.integer(str_remove_all(Debut, \".*y |d\")),\n date_of_birth = first_game_date - years(debut_age_years) - days(debut_age_days),\n age_at_grand_final = as.period(interval(start = date_of_birth, end = Date)),\n age_at_grand_final_seconds = as.period(interval(start = date_of_birth, end = Date))\n ) |>\n arrange(age_at_grand_final ) |>\n select(\n Player, Club, Year, on_winning_team, games_played_to_date, age_at_grand_final, age_at_grand_final_seconds, player_afltables_url, game_afltables_url) -> norm_smith_youngest_tb\n```\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nThe youngest player on the list is the inaugural winner, Wayne Harmes, who was only 19 years of age when he won the prestigious award. He is known for a [legendary moment](https://www.youtube.com/watch?v=G6jDAtqEi50){target=\"_blank\"} during this match, where, towards the end of the fourth quarter, he ran down his own (errant) kick by sliding along the ground and tapping the ball to keep it in play, sending it into the path of his team mate Ken Sheldon who ran into the open goal. The goal ended up being a decisive one as the Blues came out as victors by only 5 points.\n\n\n\n```{=html}\n\n```\n\n\n\nThe most inexperienced in terms of games played was Maurice Rioli (father of Maurice Jr. who is currently plying his trade at his old man's club), member of the famous [Rioli family](https://en.wikipedia.org/wiki/Rioli_family){target=\"_blank\"} of the Tiwi Islands which has uncannily produced a plethora of great footballers and premiership players. He was the first (and arguably the greatest) Rioli to ever play in the AFL/VFL. One caveat is that while at the time he had only played 21 VFL games, he was 24 years of age and had previously played 6 years of WAFL footy[^wafl-quality] for South Fremantle, so he wasn't your typical 21 game player, he was really in his prime.\n\n\n[^wafl-quality]: At this time (prior to a national competition), WAFL football (as well as the SANFL in South Australia) could be viewed as being at a similar level to the VFL (although VFL did benefit from having the larger population in Victoria as a talent pool). 
While these days, the WAFL is the tier below the AFL (like the English Championship is the to Premier League), at that time, it could instead be viewed as a competition that was the best within its own region (like Serie A is to La Liga).\n\n\n\n# Epilogue\n\n\n## Closing Remarks\n\n\nTo mention one last record, the margin of 171 in the [infamous game](#a-concrete-example) is actually the equal 4^th^ highest winning margin in a AFL/VFL game. Interestingly, the all-time record in this category comes full circle, being a game between [the Fitzroy Lions and the Melbourne Demons back in 1979](https://afltables.com/afl/stats/games/1979/061119790728.html){target=\"_blank\"}. Fittingly, the victor of this game, was our friends (for whom we owe the greatest gratitude for helping us import data), the mighty `fitzRoy` footy club by a whopping 190 points. So while they had their trials and tribulations as a club (some of which we have covered in this post), it is nice to finish with them on a high note.\n\n\nInterestingly, the [aforementioned infamous game](#a-concrete-example) isn't the record-holder (or even in the top 5) for any of our [questions](#question-list)[^failure-to-break-records] but it is only one of 31 games where a team has scored 200 points or more which is notable enough I think, particularly given I had the (mis-)fortune of witnessing such a rare event in the flesh. 
Perhaps we could dig deeper to find a record it holds (every game is uniquely remarkable in some way if you look hard enough) but I somehow find more satisfaction in it being a thought-provoking enough game to coax these questions out of us without it ever being the *answer*.\n\n[^failure-to-break-records]: the closest it came was in the most multiple goal kickers category at equal 7^th^\n\n## Notable AFL Stats Figures\n\nI will conclude by listing some people who are doing interesting work with AFL stats (often with heavy use of R and the `fitzRoy` package) to provide further motivation:\n\n- [`fitzRoy`](https://github.com/jimmyday12/fitzRoy): as outlined in this article, this R package is the de facto way of sourcing AFL data.\n\n- [Useless AFL Stats](https://www.facebook.com/uselessaflstats){target=\"_blank\"}: a Facebook page which shares always interesting, sometimes abstract and often amusing AFL stats content. [Liam Crow](https://twitter.com/crow_data_sci){target=\"_blank\"} is their data scientist and posts some interesting content of his own on his website: [https://www.crowdatascience.com](https://www.crowdatascience.com){target=\"_blank\"}.\n\n- [squiggle.com.au](https://squiggle.com.au/leaderboard/){target=\"_blank\"}: displays a bunch of people's data-driven tipping models, many of which have websites and social media accounts where they do AFL stats.\n\n- [Jaiden Popowski](https://twitter.com/jaiden_popowski){target=\"_blank\"}: is prominent in the [AFL Fantasy](https://fantasy.afl.com.au/){target=\"_blank\"} community for the interesting data-driven analysis he produces.\n\n- [DFS Australia](https://dfsaustralia.com/afl-home/){target=\"_blank\"}: has some great data-driven tools that provide insight on advanced stats commonly used in [AFL Fantasy](https://fantasy.afl.com.au/){target=\"_blank\"}.\n\n\n# Comments\n", + "markdown": "---\ntitle: \"(Over-)Analysing Idle Footy Chat\"\ndescription: |\n This blog post discusses the types of questions one 
often posits while watching the footy (or indeed any sport). I will use this article as a medium through which I can introduce analysis of AFL data in R.\ndate: 2023-08-03\ncategories: \n - Sport\n - AFL\n - R\n - Data\nimage: fitzroy-lions-logo.png\nreference-location: margin\ndraft: false\nfreeze: true\nreading-timereading-time: true\n---\n\n::: {.cell}\n\n:::\n\n\n\n::: {.callout-caution}\n\n\n## Apologies for the delay\n\nI have been promising some people that this (my first) post will be \"coming soon\" for quite some time now. It has finally arrived and the main reasons for my slowness are:\n\n* I am a slow writer. I often have all the ideas in my head, but when it comes to putting them into nice, publishable words, it has a tendency to become a bit of a slog.\n\n* The scope of this project was overly ambitious and expanded as I kept on branching off from the main mission in order to provide additional background (when given free rein, this is my tendency).\n\n\nThe R code and analysis itself did not take too long, so the main areas I am looking to improve in the future are my writing efficiency and keeping the scope of my blog posts under control. However, it would feel like going against my style to avoid going down rabbit holes and off on tangents entirely, so there is still a balance to be struck there.\n\nThe out-of-control scope has also impacted the length of this article. I commend anyone who actually reads the whole thing, but hopefully it's also possible to jump around and only read the parts that interest you in particular using the table of contents. 
I will endeavour to keep future blog posts a lot shorter going forward.\n\n:::\n\n\n::: {.callout-note}\n\n## Edits since the initial release\n\n> Thanks to everyone who has provided feedback and suggestions.\n\n* Fixed some grammatical, punctuation and spelling errors\n\n* Re-worded the [question concerning goal kickers](#most-goal-kickers) to include the words \"for a team in a single game\" instead of merely \"in a single game\". I decided to change the question to suit the answer, instead of the reverse (this is because changing the answers would mean materially modifying the content of the article, which I didn't want to do given the vast majority of the people who would ever read it, have done so already)\n\n:::\n\n# Prelude\n\nSince antiquity (or at the very least living memory), sporting data has been recorded, published and analysed for almost every professional sport known to man. In the modern day of analytics and social media it has reached a point where sports statistics are constantly being recorded and opined on by teams, the press and your average punter alike. \n\nMy beloved sport of *Australian Rules Football* (which I will henceforth refer to as \"footy\"[^footy]) is no different. Indeed some have even described the AFL (Australia's national competition) as \"the most data rich sport on Earth\"[^data-rich-sport-source], although I would suggest that certain American sports such as baseball (i.e. 
[Moneyball](https://www.youtube.com/watch?v=PlKDQqKh03Y){target=\"_blank\"}) have made far better use of their data.\n\n[^footy]: As a Western Australian this is how I define it (along with the majority of Australia) but I am aware that this term is reserved for rugby league in New South Wales and Queensland\n[^data-rich-sport-source]: [Source](https://www.youtube.com/watch?v=i_mePwh_02M){target=\"_blank\"}\n\n\nIt is my observation that it is common for those watching or attending live footy to ask questions such as:\n\n> What is the record for the most *[statistical category]*?\n\nor \n\n> When was the last time *[obscure event]* happened?\n\nI believe a contributor to this is that we are emulating what we hear on the broadcast commentary. The difference is that they often have a team of [computer-type boffins](https://www.youtube.com/clip/UgkxvNk03iqKigc9NKMjWOs2NvuRFRY8xRHn){target=\"_blank\"}[^video-link-disclaimer] behind the scenes feeding them the answer. \n\n[^video-link-disclaimer]: The video linked here is unfortunately clipped from an American guy who kind of missed the point a bit, but it is also the highest quality clip I could find of this brilliant piece of commentary from BT ([Brian Taylor](https://www.youtube.com/watch?v=E_JCdK4ah78){target=\"_blank\"} for the uninitiated).\n\n> But what are us plebeians meant to do, bereft of such luxuries?\n\nyou might ask...\n\nWell, a quick Google search will make short work of questions of a more trivial nature such as \"which player has kicked the most career goals\"[^tony-lockett] or \"which team has won the most premierships\"[^most-premierships]. 
The more savvy among us may find answers to slightly more edifying questions by performing pro gamer moves such as\n\n* Trawling through more obscure websites such as [AFL Tables](https://afltables.com/afl/afl_index.html){target=\"_blank\"}[^baseball-reference] to answer things like \"What is the most disposals Zac Dawson had in a game\"[^zac-dawson]; or\n\n* Digging into the deep recesses of the AFL Live App to answer questions like \"What is the record for the longest distance run in a game\"[^telstra-tracker]\n\nEven so, some more sophisticated questions will still go unanswered.\n\n[^baseball-reference]: the [baseball/basketball reference](https://www.sports-reference.com/){target=\"_blank\"} of the footy world (but maybe not quite as extensive)\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n\n\n```\n:::\n:::\n\n\n\n[^zac-dawson]: [19](https://afltables.com/afl/stats/players/Z/Zac_Dawson.html#sortableTable2){target=\"_blank\"}, [versus Melbourne in round 10 2009](https://afltables.com/afl/stats/games/2009/111520090530.html){target=\"_blank\"}\n\n[^telstra-tracker]: [Tom Scully, 18.9 KM]{id='distance-image'}\n\n[^tony-lockett]: [Tony Lockett on 1360](https://en.wikipedia.org/wiki/List_of_VFL/AFL_records#Goalkicking){target=\"_blank\"}\n\n[^most-premierships]: [Carlton and Essendon tied on 16](https://en.wikipedia.org/wiki/List_of_VFL/AFL_records#Premierships){target=\"_blank\"}\n\nHowever, if you are a [gadget-type operator](https://www.youtube.com/clip/UgkxtEJxG9BEEvFMmGfpjSBWc4MtWPKwNTLW){target=\"_blank\"}[^tipping-name] like myself, you will expand the number of footy stats questions you can answer immensely by accessing and manipulating the raw data yourself. 
There are, of course, a multitude of tools and approaches to this, but in this post, I will be using R (my preferred programming language).\n\n[^tipping-name]: My footy tipping username is *Gadget-type Operator* and I often use [other BT quotes](https://www.youtube.com/watch?v=E_JCdK4ah78){target=\"_blank\"} for my username on other (even non-footy-related) accounts\n\n\n## Target Audience Unclear\n\nIf you are a footy fan this article will likely present as some mildly interesting footy facts, combined with incomprehensible techno-babble. For R users this will likely appear to be a fairly elementary data wrangling exercise, combined with a bunch of references you don't understand. But if you are both a footy fan and an R user, it will hopefully prove to be a quite interesting read. \n\n# A concrete example\n\nI recently had the misfortune ([as a West Coast fan](https://www.instagram.com/p/CiRdBD9A4lU/){target=\"_blank\"}) of attending a game live between the Sydney Swans and the (not so) mighty West Coast Eagles with the following score line (the equal fourth highest margin of all time):\n\n![](swans-vs-eagles-score.png \"How embarrassing!\")\n\n::: {.column-margin}\n![Our respective reactions really tell it all](photo-at-game.jpg)\n:::\n\nIn one-sided games like this, it seems to me that footy stats questions become more common than usual for two key reasons:\n\n1. It adds something interesting to a game that otherwise lacks excitement\n\n2. These games are often filled with large statistical anomalies that might set new records\n\n\nAs my usual footy-going companion (Saroop) and I are both actuaries by trade, footy stats questions were flying left, right, and centre on that gloomy (though not due to the weather) Saturday night at the SCG. 
The questions we posed did not just vanish into the aether either; I, with the idea of writing this article in mind, decided to record the more interesting (and doable) questions.\n\n## Question List\n\nThis article aims to tackle the questions listed below:\n\n> \n- [What is the record for the highest scoring quarter?](#highest-scoring-quarter)\n- [What is the record for the most individual goal kickers for a team in a single game?](#individual)\n- [What is the record for the most multiple goal kickers for a team in a single game?](#multiple)\n- [What is the record for the most players kicking five or more goals for a team in a single game (i.e. the most \"bags\")?](#five-or-more-bags)\n- [What is the record for the most clangers in a game?](#most-clangers)\n- [What is the record for the worst disposal efficiency in a game?](#worst-disposal-efficiency)\n- [Who was the youngest player to win a Norm Smith Medal?](#youngest-norm-smith-medalist)\n- What is the record for the most unanswered goals in a game?*\n- What is the school with the most AFL players on their list?*\n- What city/town has the most AFL players relative to population?*\n- Which player has the best goals to behinds ratio?*\n- Have there ever been any undefeated seasons?*\n- What is the worst win-loss record to make finals?*\n\n\n::: {.callout-note}\n\n## Disclaimer on the asterisk\n\nI have put an asterisk next to questions that either: \n\n* cannot easily be answered by the methods I discuss below; or\n* are too lengthy for this blog post (these may get their own dedicated blog post in the future). 
\n\nThe remaining questions (which have hyperlinks to later sections of this article) will be tackled using R below.\n\n:::\n\n\nNote that things are about to get very technical so if you are only really interested in the answers (and not the R coding), you can jump ahead by [clicking here](#figuring-out-the-answers).\n\n\n\n\n# Technical Background\n\n## The `fitzRoy` Package\n\nThe first step in analysing AFL data is obtaining the data (the so-called \"collection\" phase). Our first thought might be to search the web for publicly available AFL datasets and APIs or even scrape the data from websites such as the [official AFL website](https://www.afl.com.au/){target=\"_blank\"}, [Footywire](https://www.footywire.com/){target=\"_blank\"} or [AFL Tables](https://afltables.com/afl/afl_index.html){target=\"_blank\"}. But there is a more straight-forward way.\n\n\nWhile most people now know Fitzroy as a trendy inner city suburb of Melbourne, filled with terraces and [over-priced croissants](https://www.lunecroissanterie.com/){target=\"_blank\"}, it was once home[^bith-place] to the mighty ([and now merged out of existence](https://en.wikipedia.org/wiki/Brisbane_Lions#Brisbane_Bears_absorb_Fitzroy_Football_Club's_AFL_operations,_become_Brisbane_Lions){target=\"_blank\"}) Fitzroy Lions Football Club.\n \n[![](fitzroy-lions-logo.png \"La Marseillaise!\")](https://www.youtube.com/watch?v=olKa-0H26k4){target=\"_blank\"}\n\n[^bith-place]: It is also (regrettably) the place of my birth but as a WA boy I don't like to talk about the fact that my parents happened to be in Melbourne when I was born.\n\n\nWe can obtain the data we need very simply using its name-sake, the [`fitzRoy` R package](https://jimmyday12.github.io/fitzRoy/){target=\"_blank\"}. 
This package abstracts away all the web scraping and API calls for us into a very helpful family of `fetch_*` functions.\n\nSo let's begin by loading the `fitzRoy` package and while we're at it, we will also load all the other packages we will be using later on.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(fitzRoy)\n\n# Note that I generally avoid mixing dplyr and data.table at the same time\n# but the reason I have done this will become apparent later\nlibrary(dplyr)\nlibrary(data.table)\n\nlibrary(rvest)\nlibrary(stringr)\nlibrary(tidyr)\nlibrary(purrr)\nlibrary(lubridate)\n```\n:::\n\n\n\n\n### `fitzRoy` Data Sources\n\n`fitzRoy` provides access to a number of footy data sources[^FitzRoy] including [AFL Tables](https://afltables.com/){target=\"_blank\"} and the [official AFL website](https://www.afl.com.au/fixture){target=\"_blank\"}. Each data source has its own advantages and disadvantages, for example:\n\n* [AFL Tables](https://afltables.com/){target=\"_blank\"} has the entirety of AFL/VFL history (1897 to present) but lacks some of the more advanced stats.\n\n* The [official AFL website](https://www.afl.com.au/fixture){target=\"_blank\"} only has data from 2014 onwards but it is also probably the most complete in terms of the advanced statistics it contains (e.g. centre bounce attendances[^CBAs]).\n\n[^FitzRoy]: Up-to-date information on data sources can be found on `fitzRoy`'s [documentation site](https://jimmyday12.github.io/fitzRoy/articles/fitzRoy.html#data-sources){target=\"_blank\"}\n\n[^CBAs]: Centre bounce attendances (CBAs) are a commonly-used metric in AFL Fantasy, fantasy \"coaches\" often look at tools such as [this one](https://dfsaustralia.com/afl-cbas/){target=\"_blank\"} to help with researching their trades.\n\nAll of the different data sources are compared in the table below:\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nNote that each row of the table can be expanded to reveal what data is available from each source, as well as its use-case. In addition to the sources listed in this table, the following functions only come from one source:\n\n* [`fetch_betting_odds_footywire()`](https://jimmyday12.github.io/fitzRoy/reference/fetch_betting_odds_footywire.html){target=\"_blank\"}\n\n* [`fetch_squiggle_data()`](https://jimmyday12.github.io/fitzRoy/reference/fetch_squiggle_data.html){target=\"_blank\"}\n\n* [`fetch_coaches_votes()`](https://jimmyday12.github.io/fitzRoy/reference/fetch_coaches_votes.html){target=\"_blank\"}\n\n## Importing the Data\n\nFor the purposes of answering the questions [above](#question-list), I am most interested in the full history of the AFL and have decided to use AFL Tables as my primary data source[^data-regret]. I will also use Fryzigg for one small use-case where AFL Tables is missing key data (disposal efficiency) and some bespoke web scraping for Norm Smith Medallists.\n\n[^data-regret]: In hindsight I somewhat regret this decision and would have probably preferred to use Fryzigg for everything (with the exception of quarter scores which it doesn't have and AFL Tables does) but I only realised it had the full AFL/VFL history when I constructed the table comparing data sources above\n\n\nThe `fetch_*` family of functions from the `fitzRoy` package allows us to read data from the various sources. 
Consult the [documentation site](https://jimmyday12.github.io/fitzRoy/reference/index.html){target=\"_blank\"} for a complete list of all the available functions.\n\n\nWe can *fetch* this data via `fitzRoy` with the following code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats <- fetch_player_stats_afltables(season = 1897:2023)\nresults <- fetch_results_afltables(season = 1897:2023)\nplayer_details <- fetch_player_details_afltables()\nplayer_stats_fryzigg <- fetch_player_stats_fryzigg(season = 1897:2023)\n```\n:::\n\n\n\n\n::: {.callout-warning}\n\n#### Being a good citizen\n\nWhen sourcing data from `fitzRoy`, it is important to follow good data collection[^fitzRoy-good-practice] etiquette by only downloading the data you need and avoiding downloading the same data over and over again. This prevents servers from being overloaded and means everyone gets their data faster.\n\nIn keeping with this, for the purposes of this blog post, I have saved the data in a local RDS file. That way, I can simply use `readRDS()` instead of repeatedly calling the `fetch_*` functions. 
The code for this is below (and the code above is not actually run but is cleaner for demonstration purposes):\n\n[^fitzRoy-good-practice]: this topic is discussed on the `fitzRoy` documentation site [here](https://jimmyday12.github.io/fitzRoy/articles/fitzRoy.html#good-practices){target=\"_blank\"}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nif(file.exists(\"data/player_stats.RDS\")) {\n player_stats <- readRDS(\"data/player_stats.RDS\")\n \n} else {\n player_stats <- fitzRoy::fetch_player_stats_afltables(season = 1897:2023)\n saveRDS(player_stats, \"data/player_stats.RDS\")\n}\n\nif(file.exists(\"data/results.RDS\")) {\n results <- readRDS(\"data/results.RDS\")\n \n} else {\n results <- fitzRoy::fetch_results_afltables(season = 1897:2023)\n saveRDS(results, \"data/results.RDS\")\n}\n\nif(file.exists(\"data/player_details.RDS\")) {\n player_details <- readRDS(\"data/player_details.RDS\")\n \n} else {\n player_details <- fetch_player_details_afltables()\n saveRDS(player_details, \"data/player_details.RDS\")\n}\n\nif(file.exists(\"data/player_stats_fryzigg.RDS\")) {\n player_stats_fryzigg <- readRDS(\"data/player_stats_fryzigg.RDS\")\n \n} else {\n player_stats_fryzigg <- fetch_player_stats_fryzigg(season = 1897:2023)\n saveRDS(player_stats_fryzigg, \"data/player_stats_fryzigg.RDS\")\n}\n```\n:::\n\n\n:::\n\nThe data we have read in is as at round 19 of the 2023 AFL season.\n\n## Finicky Details About Other R Packages\n\n### Tidyverse Versus `data.table`\n\nIn the R community, there is an [ongoing power struggle](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly){target=\"_blank\"} between using the Posit[^previously-RStudio]-backed [tidyverse](https://www.tidyverse.org/){target=\"_blank\"} and the heavily-optimised [`data.table`](https://rdatatable.gitlab.io/data.table/){target=\"_blank\"}.\n\n[^previously-RStudio]: [formerly known as 
RStudio](https://posit.co/blog/rstudio-is-becoming-posit/){target=\"_blank\"} ([RIP](https://www.youtube.com/watch?v=TtMzTGfs-fc){target=\"_blank\"})\n\nSo as not to unsettle people who prefer either `dplyr` (and the tidyverse) or `data.table`, I have written code in both packages[^base-r-dig]. Where relevant, I have used a tabbed layout for the convenience of the reader. As my personal preference for readability purposes is the tidyverse[^tidyverse-rationale], I will place this code in the first tab.\n\n\n::: {.callout-important}\n\n#### A cautionary tale\n\nWhile doing things in this way did scratch something of a perfectionist's itch in me and was a fun learning experience, I will probably refrain from doing something like this again in future posts. I don't think essentially writing the same code twice was worth the additional time it took me.\n\n:::\n\n[^base-r-dig]: Note that I have not written a `base` R `data.frame` version because I can see arguments for using both tidyverse and `data.table` but `base` R `data.frame`s will probably cause more pain than they are worth (there is a reason that tidyverse and `data.table` exist)\n\n\n[^tidyverse-rationale]: I will typically only use `data.table` if the size of the data necessitates it. In this case, the data has fewer than a million rows so there are no problems.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\nNote that the code below is somewhat redundant as the `fitzRoy` package follows the [tidyverse philosophy](https://jimmyday12.github.io/fitzRoy/CONTRIBUTING.html){target=\"_blank\"} and returns [tibbles](https://tibble.tidyverse.org/){target=\"_blank\"}. 
I have used the `_tb` suffix[^tb-abbreviation] to distinguish the `tibble`/`dplyr`/tidyverse code from the `data.table` code.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb <- as_tibble(player_stats)\nresults_tb <- as_tibble(results)\nplayer_details_tb <- as_tibble(player_details)\nplayer_stats_fryzigg_tb <- as_tibble(player_stats_fryzigg)\n```\n:::\n\n\n[^tb-abbreviation]: an abbreviation of \"tibble\"\n\n\n#### `data.table`\n\nHenceforth, all `data.table` code will use the `_dt` suffix[^dt-abbreviation] to distinguish it from the tidyverse code.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_dt <- as.data.table(player_stats)\nresults_dt <- as.data.table(results)\nplayer_details_dt <- as.data.table(player_details)\nplayer_stats_fryzigg_dt <- as.data.table(player_stats_fryzigg)\n```\n:::\n\n\n[^dt-abbreviation]: an abbreviation of \"data.table\"\n\n:::\n\n\n#### Adoption of the Native Pipe Operator (`|>`)\n\nThe so-called *pipe operator* (`%>%`) of the [`magrittr`](https://magrittr.tidyverse.org/){target=\"_blank\"} package has been a staple of the tidyverse since its inception, but the R core team's introduction of the so-called *native pipe* (`|>`) to `base` R (in version [4.1](https://www.jumpingrivers.com/blog/new-features-r410-pipe-anonymous-functions/){target=\"_blank\"}[^function-shorthand]) has led to a split in adoption. There are some nuances in the native pipe's usage[^future-pipe-post] but overall it behaves in a similar way to the `magrittr` pipe and has [less overhead (and is therefore faster)](https://stackoverflow.com/questions/67633022/what-are-the-differences-between-rs-new-native-pipe-and-the-magrittr-pipe){target=\"_blank\"}. 
While the native pipe was initially missing some of the key features of the `magrittr` pipe, new features[^pipe-features] have been added to it that (in my mind) mean that it might have even surpassed the `magrittr` pipe.\n\n\nWhile I have tried to appease people in both the tidyverse and `data.table` camps, I will not be re-writing my code more than once over such a minor syntactic difference as which pipe I use. I will therefore be dragging all my tidyverse-using readers kicking and screaming into the R 4.1 world by adopting the native pipe (`|>`) in my tidyverse code.\n\nNote that the common RStudio shortcut, `Ctrl+Shift+M`, can be changed from the `magrittr` pipe (`%>%`), which is still the default, to the native pipe (`|>`).\n\n[^function-shorthand]: another cool thing introduced in this version of R was so-called function shorthand (`\\()`), see `help(\"function\")` for more details\n\n[^future-pipe-post]: I may even cover these in a future blog post\n\n[^pipe-features]: In R version 4.2, the `_` symbol was added as a placeholder character and in R version 4.3, extractions from the placeholder using the `$` symbol were also allowed\n\n\n\n### Web Scraping Package\n\nWhile the majority of our data will be sourced using the `fitzRoy` package, a small amount of data (namely Norm Smith Medallists, which are outside of the scope of `fitzRoy`) will require us to perform some bespoke web scraping. 
This will be performed using the `rvest` package (loaded [above](#cb1)).\n\n\n\n# Preliminary Data Wrangling\n\n\n## Flattening the Data\n\n\nTo begin with, let's scrutinise the results data in order to figure out what we have to work with.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(results)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ntibble [16,352 × 16] (S3: tbl_df/tbl/data.frame)\n $ Game : num [1:16352] 1 2 3 4 5 6 7 8 9 10 ...\n $ Date : Date[1:16352], format: \"1897-05-08\" \"1897-05-08\" ...\n $ Round : chr [1:16352] \"R1\" \"R1\" \"R1\" \"R1\" ...\n $ Home.Team : chr [1:16352] \"Fitzroy\" \"Collingwood\" \"Geelong\" \"Sydney\" ...\n $ Home.Goals : int [1:16352] 6 5 3 3 6 4 3 9 6 5 ...\n $ Home.Behinds: int [1:16352] 13 11 6 9 4 6 8 10 5 9 ...\n $ Home.Points : int [1:16352] 49 41 24 27 40 30 26 64 41 39 ...\n $ Away.Team : chr [1:16352] \"Carlton\" \"St Kilda\" \"Essendon\" \"Melbourne\" ...\n $ Away.Goals : int [1:16352] 2 2 7 6 5 8 10 3 5 7 ...\n $ Away.Behinds: int [1:16352] 4 4 5 8 6 2 6 1 7 8 ...\n $ Away.Points : int [1:16352] 16 16 47 44 36 50 66 19 37 50 ...\n $ Venue : chr [1:16352] \"Brunswick St\" \"Victoria Park\" \"Corio Oval\" \"Lake Oval\" ...\n $ Margin : int [1:16352] 33 25 -23 -17 4 -20 -40 45 4 -11 ...\n $ Season : num [1:16352] 1897 1897 1897 1897 1897 ...\n $ Round.Type : chr [1:16352] \"Regular\" \"Regular\" \"Regular\" \"Regular\" ...\n $ Round.Number: int [1:16352] 1 1 1 1 2 2 2 2 3 3 ...\n```\n:::\n:::\n\n\n\nWhile inspecting the `results` we may note that certain key match-level information (e.g. quarter-by-quarter scores) for answering some of our [questions](#question-list) is missing from it. As it turns out, this data is actually available on the `player_stats_afl_tables` data (one row per player per match) instead. 
Thus, we will opt to create a 'flattened' version of `player_stats_afl_tables` with all the match-level fields available to use on both datasets and discard the `results` dataset (save for some quick checks to make sure the player data 'flattening' worked as expected).\n\n\nNow, let's take a look at the `player_stats_afl_tables` dataset to determine which fields are player-level and which are match-level.\n\n::: {.panel-tabset}\n\n### Code\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nstr(player_stats)\n```\n:::\n\n\nNote that the output has been placed into another tab as it is rather long.\n\n### Output\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\ntibble [663,115 × 59] (S3: tbl_df/tbl/data.frame)\n $ Season : num [1:663115] 1897 1897 1897 1897 1897 ...\n $ Round : chr [1:663115] \"1\" \"1\" \"1\" \"1\" ...\n $ Date : Date[1:663115], format: \"1897-05-08\" \"1897-05-08\" ...\n $ Local.start.time : int [1:663115] 1500 1500 1500 1500 1500 1500 1500 1500 1500 1500 ...\n $ Venue : chr [1:663115] \"Brunswick St\" \"Brunswick St\" \"Brunswick St\" \"Brunswick St\" ...\n $ Attendance : num [1:663115] 3000 3000 3000 3000 3000 3000 3000 3000 3000 3000 ...\n $ Home.team : chr [1:663115] \"Fitzroy\" \"Fitzroy\" \"Fitzroy\" \"Fitzroy\" ...\n $ HQ1G : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ HQ1B : int [1:663115] 5 5 5 5 5 5 5 5 5 5 ...\n $ HQ2G : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ HQ2B : int [1:663115] 11 11 11 11 11 11 11 11 11 11 ...\n $ HQ3G : int [1:663115] 5 5 5 5 5 5 5 5 5 5 ...\n $ HQ3B : int [1:663115] 13 13 13 13 13 13 13 13 13 13 ...\n $ HQ4G : int [1:663115] 6 6 6 6 6 6 6 6 6 6 ...\n $ HQ4B : int [1:663115] 13 13 13 13 13 13 13 13 13 13 ...\n $ Home.score : int [1:663115] 49 49 49 49 49 49 49 49 49 49 ...\n $ Away.team : chr [1:663115] \"Carlton\" \"Carlton\" \"Carlton\" \"Carlton\" ...\n $ AQ1G : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ AQ1B : int [1:663115] 3 3 3 3 3 3 3 3 3 3 ...\n $ AQ2G : int [1:663115] 1 1 1 1 1 1 1 1 1 1 ...\n $ 
AQ2B : int [1:663115] 3 3 3 3 3 3 3 3 3 3 ...\n $ AQ3G : int [1:663115] 2 2 2 2 2 2 2 2 2 2 ...\n $ AQ3B : int [1:663115] 3 3 3 3 3 3 3 3 3 3 ...\n $ AQ4G : int [1:663115] 2 2 2 2 2 2 2 2 2 2 ...\n $ AQ4B : int [1:663115] 4 4 4 4 4 4 4 4 4 4 ...\n $ Away.score : int [1:663115] 16 16 16 16 16 16 16 16 16 16 ...\n $ First.name : chr [1:663115] \"Bill\" \"Jimmy\" \"Bob\" \"Tom\" ...\n $ Surname : chr [1:663115] \"Ahern\" \"Aitken\" \"Armstrong\" \"Blake\" ...\n $ ID : num [1:663115] 4415 4416 4417 4419 4421 ...\n $ Jumper.No. : chr [1:663115] \"0\" \"0\" \"0\" \"0\" ...\n $ Playing.for : chr [1:663115] \"Carlton\" \"Carlton\" \"Carlton\" \"Carlton\" ...\n $ Kicks : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Marks : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Handballs : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Goals : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Behinds : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Hit.Outs : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Tackles : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Rebounds : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Inside.50s : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Clearances : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Clangers : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Frees.For : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Frees.Against : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Brownlow.Votes : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Contested.Possessions : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Uncontested.Possessions: num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Contested.Marks : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Marks.Inside.50 : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ One.Percenters : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Bounces : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Goal.Assists : num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Time.on.Ground.. 
: num [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Substitute : int [1:663115] 0 0 0 0 0 0 0 0 0 0 ...\n $ Umpire.1 : chr [1:663115] \"Samuel Hood\" \"Samuel Hood\" \"Samuel Hood\" \"Samuel Hood\" ...\n $ Umpire.2 : chr [1:663115] \"\" \"\" \"\" \"\" ...\n $ Umpire.3 : chr [1:663115] \"\" \"\" \"\" \"\" ...\n $ Umpire.4 : chr [1:663115] \"\" \"\" \"\" \"\" ...\n $ group_id : int [1:663115] 2 2 2 2 2 2 2 2 2 2 ...\n```\n:::\n:::\n\n\n:::\n\n\nInspecting the fields and using some knowledge of the game, we can determine that the following fields are player-level:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_level_fields <- c(\n \"First.name\", \"Surname\", \"ID\", \"Jumper.No.\", \"Playing.for\", \"Kicks\", \"Marks\", \n \"Handballs\", \"Goals\", \"Behinds\", \"Hit.Outs\", \"Tackles\", \"Rebounds\", \"Inside.50s\", \n \"Clearances\", \"Clangers\", \"Frees.For\", \"Frees.Against\", \"Brownlow.Votes\", \n \"Contested.Possessions\", \"Uncontested.Possessions\", \"Contested.Marks\", \n \"Marks.Inside.50\", \"One.Percenters\", \"Bounces\", \"Goal.Assists\", \"Time.on.Ground..\",\n \"Substitute\"\n )\n\nmatch_level_fields <- setdiff(colnames(player_stats), player_level_fields)\n```\n:::\n\n\nWe can now safely group and aggregate by the `match_level_fields` below:\n\n\n::: {.panel-tabset}\n\n### Tidyverse\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb |> \n mutate(\n home_player = Playing.for == Home.team,\n away_player = Playing.for == Away.team\n ) |> \n group_by(pick(all_of(match_level_fields))) |> \n summarise(\n player_count = n(),\n home_kicks = sum(Kicks * home_player),\n away_kicks = sum(Kicks * away_player),\n home_marks = sum(Marks * home_player),\n away_marks = sum(Marks * away_player),\n home_handballs = sum(Handballs * home_player),\n away_handballs = sum(Handballs * away_player),\n home_hit_outs = sum(Hit.Outs * home_player),\n away_hit_outs = sum(Hit.Outs * away_player),\n home_tackles = sum(Tackles * home_player),\n away_tackles = sum(Tackles * 
away_player),\n home_rebounds = sum(Rebounds * home_player),\n away_rebounds = sum(Rebounds * away_player),\n home_inside_50s = sum(Inside.50s * home_player),\n away_inside_50s = sum(Inside.50s * away_player),\n home_clearances = sum(Clearances * home_player),\n away_clearances = sum(Clearances * away_player),\n home_clangers = sum(Clangers * home_player),\n away_clangers = sum(Clangers * away_player),\n home_frees_for = sum(Frees.For * home_player),\n away_frees_for = sum(Frees.For * away_player),\n home_frees_against = sum(Frees.Against * home_player),\n away_frees_against = sum(Frees.Against * away_player),\n home_contested_possessions = sum(Contested.Possessions * home_player),\n away_contested_possessions = sum(Contested.Possessions * away_player),\n home_uncontested_possessions = sum(Uncontested.Possessions * home_player),\n away_uncontested_possessions = sum(Uncontested.Possessions * away_player),\n home_contested_marks = sum(Contested.Marks * home_player),\n away_contested_marks = sum(Contested.Marks * away_player),\n home_marks_inside_50 = sum(Marks.Inside.50 * home_player),\n away_marks_inside_50 = sum(Marks.Inside.50 * away_player),\n home_one_percenters = sum(One.Percenters * home_player),\n away_one_percenters = sum(One.Percenters * away_player),\n home_bounces = sum(Bounces * home_player),\n away_bounces = sum(Bounces * away_player),\n home_goal_assists = sum(Goal.Assists * home_player),\n away_goal_assists = sum(Goal.Assists * away_player),\n .groups = \"drop\"\n ) |>\n arrange(Date, Local.start.time, Home.team) -> \n match_stats_flat_tb\n\n# verify correct number of games:\nnrow(match_stats_flat_tb) == nrow(results_tb)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n### `data.table`\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_dt <- copy(player_stats_dt)\n\nmatch_stats_flat_dt[, home_player := Playing.for == Home.team]\nmatch_stats_flat_dt[, away_player := Playing.for == 
Away.team]\n\nmatch_stats_flat_dt <- match_stats_flat_dt[, .(\n player_count = .N,\n home_kicks = sum(Kicks * home_player),\n away_kicks = sum(Kicks * away_player),\n home_marks = sum(Marks * home_player),\n away_marks = sum(Marks * away_player),\n home_handballs = sum(Handballs * home_player),\n away_handballs = sum(Handballs * away_player),\n home_hit_outs = sum(Hit.Outs * home_player),\n away_hit_outs = sum(Hit.Outs * away_player),\n home_tackles = sum(Tackles * home_player),\n away_tackles = sum(Tackles * away_player),\n home_rebounds = sum(Rebounds * home_player),\n away_rebounds = sum(Rebounds * away_player),\n home_inside_50s = sum(Inside.50s * home_player),\n away_inside_50s = sum(Inside.50s * away_player),\n home_clearances = sum(Clearances * home_player),\n away_clearances = sum(Clearances * away_player),\n home_clangers = sum(Clangers * home_player),\n away_clangers = sum(Clangers * away_player),\n home_frees_for = sum(Frees.For * home_player),\n away_frees_for = sum(Frees.For * away_player),\n home_frees_against = sum(Frees.Against * home_player),\n away_frees_against = sum(Frees.Against * away_player),\n home_contested_possessions = sum(Contested.Possessions * home_player),\n away_contested_possessions = sum(Contested.Possessions * away_player),\n home_uncontested_possessions = sum(Uncontested.Possessions * home_player),\n away_uncontested_possessions = sum(Uncontested.Possessions * away_player),\n home_contested_marks = sum(Contested.Marks * home_player),\n away_contested_marks = sum(Contested.Marks * away_player),\n home_marks_inside_50 = sum(Marks.Inside.50 * home_player),\n away_marks_inside_50 = sum(Marks.Inside.50 * away_player),\n home_one_percenters = sum(One.Percenters * home_player),\n away_one_percenters = sum(One.Percenters * away_player),\n home_bounces = sum(Bounces * home_player),\n away_bounces = sum(Bounces * away_player),\n home_goal_assists = sum(Goal.Assists * home_player),\n away_goal_assists = sum(Goal.Assists * away_player)\n), 
by = match_level_fields]\n\nsetorder(match_stats_flat_dt, Date, Local.start.time, Home.team)\n\n# verify outputs match:\nidentical(as.data.frame(match_stats_flat_tb), as.data.frame(match_stats_flat_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\nHenceforth, `player_stats_*` and `match_stats_flat_*` will be the two datasets we will use predominantly.\n\n## IDs and URLs\n\nOne thing that our `match_stats_flat_*` dataset is currently lacking is a game ID for use as a primary key. In addition, being able to link directly to AFL Tables when talking about a particular game or player would be handy.\n\n### Game ID and URL\n\nLet's tackle the game ID by writing some functions to create an ID which also conveniently lines up with the way AFL Tables game URLs work (two birds with one stone). \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nteam_code_map <- c(\n \"Adelaide\" = \"01\",\n \"Adelaide Crows\" = \"01\",\n \"Brisbane Bears\" = \"02\",\n \"Carlton\" = \"03\",\n \"Collingwood\" = \"04\",\n \"Essendon\" = \"05\",\n \"Fitzroy\" = \"06\",\n \"Western Bulldogs\" = \"07\",\n \"Fremantle\" = \"08\",\n \"Geelong\" = \"09\",\n \"Geelong Cats\" = \"09\",\n \"Hawthorn\" = \"10\",\n \"Melbourne\" = \"11\",\n \"North Melbourne\" = \"12\",\n \"Port Adelaide\" = \"13\",\n \"Richmond\" = \"14\",\n \"St Kilda\" = \"15\",\n \"Sydney\" = \"16\",\n \"Sydney Swans\" = \"16\",\n \"University\" = \"17\",\n \"West Coast\" = \"18\",\n \"West Coast Eagles\" = \"18\",\n \"Brisbane Lions\" = \"19\",\n \"Gold Coast\" = \"20\",\n \"Gold Coast Suns\" = \"20\",\n \"Greater Western Sydney\" = \"21\",\n \"GWS Giants\" = \"21\"\n)\n\n# The three functions below are all vectorised for efficiency purposes\nget_team_code <- function(team_name) {\n unname(team_code_map[team_name])\n}\n\nget_game_id <- function(home_team_code, away_team_code, game_date) {\n # example ID: 161820230624\n game_date_string <- format(game_date, \"%Y%m%d\")\n \n ifelse(\n home_team_code > 
away_team_code, \n # the smaller code is always first\n paste0(away_team_code, home_team_code, game_date_string),\n paste0(home_team_code, away_team_code, game_date_string)\n )\n}\n\nget_game_afltables_url <- function(game_id, season) {\n # example url: https://afltables.com/afl/stats/games/2023/161820230624.html\n paste0(\"https://afltables.com/afl/stats/games/\", season,\"/\", game_id, \".html\")\n}\n```\n:::\n\n\nNow let's use these functions[^vectorisation-benefits] to add a primary key to our `match_stats_flat_*` datasets.\n\n[^vectorisation-benefits]: Note that as the functions are vectorised, we need not use the slow `purrr::map*()` or `*apply()` family of functions to apply them to a column of our `tibble` and `data.table` respectively.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_tb |> \n mutate(\n home_team_code = get_team_code(Home.team),\n away_team_code = get_team_code(Away.team),\n game_id = get_game_id(home_team_code, away_team_code, Date),\n game_afltables_url = get_game_afltables_url(game_id, Season)\n ) |> \n relocate(game_id, .before = Season) |> \n arrange(game_id) ->\n match_stats_flat_tb\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_dt[, home_team_code := get_team_code(Home.team)]\nmatch_stats_flat_dt[, away_team_code := get_team_code(Away.team)]\nmatch_stats_flat_dt[, game_id := get_game_id(home_team_code, away_team_code, Date)]\nmatch_stats_flat_dt[, game_afltables_url := get_game_afltables_url(game_id, Season)]\n\nsetcolorder(match_stats_flat_dt, c(\"game_id\", setdiff(names(match_stats_flat_dt), \"game_id\")))\nsetkey(match_stats_flat_dt, game_id)\n\n# verify outputs match:\nidentical(as.data.frame(match_stats_flat_tb), as.data.frame(match_stats_flat_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n### Player URLs\n\nIn a similar way we can add a player URL to our `player_stats_*` datasets, we start 
by creating a mapping table.\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# non-duplicate URL: https://afltables.com/afl/stats/players/E/Errol_Gulden.html\n# duplicate URL: https://afltables.com/afl/stats/players/J/Josh_Kennedy0.html, https://afltables.com/afl/stats/players/J/Josh_Kennedy1.html\n# for dealing with duplicates, for example Peter Brown (6 of the same name!) seems to have a nonsensical order\nplayer_stats_tb |> \n mutate(full_name = paste(First.name, Surname, sep = \"_\")) |> \n distinct(ID, full_name) |> \n group_by(full_name) |> \n arrange(ID) |>\n mutate(\n instance_number = as.character(cumsum(rep(1L, n())) - 1L),\n dup_count = n()\n ) |> \n mutate(\n number_suffix = if_else(dup_count == 1L, \"\", instance_number),\n first_letter = str_sub(full_name, 1, 1),\n player_afltables_url = paste0(\"https://afltables.com/afl/stats/players/\", \n first_letter, \"/\", full_name, number_suffix, \".html\")\n ) |> \n ungroup() |> \n select(ID, player_afltables_url) ->\n player_url_tb\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_url_dt <- copy(player_stats_dt)\n\nplayer_url_dt[, full_name := paste(First.name, Surname, sep = \"_\")]\nplayer_url_dt <- unique(player_url_dt, by = c(\"ID\", \"full_name\"))\nsetorder(player_url_dt, ID)\nplayer_url_dt <- player_url_dt[, `:=`(\n instance_number = as.character(cumsum(rep(1L, .N)) - 1L),\n dup_count = .N\n), \"full_name\"]\n\nplayer_url_dt[, number_suffix := fifelse((dup_count == 1L), \"\", instance_number)]\nplayer_url_dt[, first_letter := str_sub(full_name, 1, 1)]\nplayer_url_dt[, player_afltables_url := paste0(\"https://afltables.com/afl/stats/players/\", \n first_letter, \"/\", full_name, number_suffix, \".html\")]\nplayer_url_dt <- player_url_dt[, .(ID, player_afltables_url)]\n\n# verify outputs match:\nidentical(as.data.frame(player_url_tb), as.data.frame(player_url_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 
TRUE\n```\n:::\n:::\n\n\n:::\n\nNow we can add the game ID, game URL and player URL to the `player_stats_*` datasets.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb <- as_tibble(player_stats) # copied from above\n\nplayer_stats_tb |> \n mutate(\n home_team_code = get_team_code(Home.team),\n away_team_code = get_team_code(Away.team),\n game_id = get_game_id(home_team_code, away_team_code, Date),\n player = paste0(First.name, \" \", Surname, \" (\", Playing.for,\")\")\n ) |> \n left_join(match_stats_flat_tb |> select(game_id, game_afltables_url), by = \"game_id\") |> \n left_join(player_url_tb, by = \"ID\") |> \n relocate(c(\"game_id\", \"player\", \"ID\"), .before = Season) |>\n arrange(game_id, Playing.for, ID) ->\n player_stats_tb\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_dt <- as.data.table(player_stats) # copied from above\n\nplayer_stats_dt[, home_team_code := get_team_code(Home.team)]\nplayer_stats_dt[, away_team_code := get_team_code(Away.team)]\nplayer_stats_dt[, game_id := get_game_id(home_team_code, away_team_code, Date)]\nplayer_stats_dt[, player := paste0(First.name, \" \", Surname, \" (\", Playing.for,\")\")]\n\nplayer_stats_dt <- merge(\n player_stats_dt, match_stats_flat_dt[, c(\"game_id\", \"game_afltables_url\")], \n by = \"game_id\")\n\nplayer_stats_dt <- merge(player_stats_dt, player_url_dt, by = \"ID\")\n\nsetcolorder(player_stats_dt, c(c(\"game_id\", \"player\"), setdiff(names(player_stats_dt), c(\"game_id\", \"player\"))))\nsetkey(player_stats_dt, game_id, Playing.for, ID)\n# verify outputs match:\nidentical(as.data.frame(player_stats_tb), as.data.frame(player_stats_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n\n\n## Finding the Infamous Game\n\nLet's use these new datasets to perform the simple exercise of obtaining the game ID for the [aforementioned](#a-concrete-example) Swans versus Eagles game. 
We can henceforth use this game ID whenever relevant to rank the Swans in each statistical category we investigate.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n(\n infamous_game_id <- get_game_id(\n home_team_code = get_team_code(\"Sydney\"),\n away_team_code = get_team_code(\"West Coast\"),\n game_date = as.Date(\"2023-06-24\")\n )\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"161820230624\"\n```\n:::\n:::\n\n\n\nWe can then filter the data and present it in a table below[^table-code-omitted].\n\n[^table-code-omitted]: note that the code to format the table is omitted.\n\n### Match Stats\n\n\n::: {.panel-tabset}\n\n### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_tb |> \n filter(game_id == infamous_game_id)\n```\n:::\n\n\n### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmatch_stats_flat_dt[game_id == infamous_game_id, ]\n```\n:::\n\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n### Player Stats\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb |> \n filter(game_id == infamous_game_id)\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_dt[game_id == infamous_game_id, ]\n```\n:::\n\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\n\n# Figuring Out the Answers\n\nIn this section I will provide my working using R for each of the [aforementioned questions](#question-list). Where relevant, I will figure out where the [aforementioned infamous game](#a-concrete-example) places in the history of the AFL for that particular category.\n\nThe pathways I go down are only some of the many permutations of stats you can look at and angles you can approach things from. The code I have written is also probably more thorough and well-presented than how I would typically do it. When I do this type of thing with no intention of publishing it, my data manipulations will generally be far more ad-hoc and expedient (I pay far less attention to reproducibility and consistent naming conventions).\n\n\n## Highest Scoring Quarter\n\nAs listed [above](#question-list), our first question was:\n\n> What is the record for the highest scoring quarter?\n\n::: {.callout-info}\n\nNote that there is already a [page](https://afltables.com/afl/teams/allteams/qh.html){target=\"_blank\"} on this topic on AFL Tables, but it is a good one to start with regardless.\n\n:::\n\nTo answer this question, we will begin by creating a reshaped version of the `match_stats_flat_*` dataset that is structured around quarters.\n\n\n::: {.panel-tabset}\n\n### Tidyverse\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngame_level_columns <- c(\"game_id\", \"game_afltables_url\", \"Season\", \"Round\", \"Venue\", \"Home.team\", \"Away.team\", \"Home.score\", \"Away.score\")\n\nmatch_stats_flat_tb |> \n select(all_of(game_level_columns), starts_with(\"HQ\"), starts_with(\"AQ\")) |> \n pivot_longer(cols = c(starts_with(\"HQ\"), starts_with(\"AQ\")), names_to = \"quarter_gb\", values_to = \"gb_count\") |>\n mutate(\n quarter = str_extract(quarter_gb, \"\\\\d\"),\n gb_label = if_else(str_detect(quarter_gb, \"G$\"), \"goals\", \"behinds\"),\n is_home_score = str_detect(quarter_gb, \"^H\")\n ) |>\n pivot_wider(id_cols = 
all_of(c(game_level_columns, \"quarter\", \"is_home_score\")), names_from = gb_label, values_from = gb_count) |>\n arrange(game_id, is_home_score, quarter) |> \n group_by(game_id, is_home_score) |> \n mutate(# make quarters incremental\n goals = c(head(goals, 1), diff(goals)),\n behinds = c(head(behinds, 1), diff(behinds))\n ) |> \n ungroup() |> \n mutate(\n score = goals * 6 + behinds,\n team = if_else(is_home_score, Home.team, Away.team),\n opposition = if_else(!is_home_score, Home.team, Away.team)\n ) |> \n select(-is_home_score) -> quarter_stats_tb\n```\n:::\n\n\n### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngame_level_columns <- c(\"game_id\", \"game_afltables_url\", \"Season\", \"Round\", \"Venue\", \"Home.team\", \"Away.team\", \"Home.score\", \"Away.score\")\n\nquarter_stats_dt <- copy(match_stats_flat_dt)\nquarter_stats_dt <- quarter_stats_dt[, .SD, .SDcols = names(quarter_stats_dt) %like% paste(\n paste(game_level_columns, collapse = \"|\"), \"^HQ\", \"^AQ\", \n sep = \"|\")]\nquarter_stats_dt <- melt(quarter_stats_dt, id.vars = game_level_columns, variable.name = \"quarter_gb\", value.name = \"gb_count\")\n\nquarter_stats_dt[, quarter := str_extract(quarter_gb, \"\\\\d\")]\nquarter_stats_dt[, gb_label := fifelse(str_detect(quarter_gb, \"G$\"), \"goals\", \"behinds\")]\nquarter_stats_dt[, is_home_score := str_detect(quarter_gb, \"^H\")]\n\nquarter_stats_dt[, quarter_gb:=NULL]\nquarter_stats_dt <- dcast(quarter_stats_dt, ... 
~ gb_label, value.var = \"gb_count\")\n\n# make quarters incremental\nsetorder(quarter_stats_dt, game_id, is_home_score, quarter)\nquarter_stats_dt[, goals := c(head(goals, 1), diff(goals)), c(\"game_id\", \"is_home_score\")]\nquarter_stats_dt[, behinds := c(head(behinds, 1), diff(behinds)), c(\"game_id\", \"is_home_score\")]\n\nquarter_stats_dt[, score := goals * 6 + behinds]\nquarter_stats_dt[, team := fifelse(is_home_score, Home.team, Away.team)]\nquarter_stats_dt[, opposition := fifelse(!is_home_score, Home.team, Away.team)]\n\nquarter_stats_dt <- quarter_stats_dt[, .SD, .SDcols = c(game_level_columns, c(\"quarter\", \"goals\", \"behinds\", \"score\", \"team\", \"opposition\"))]\n\n# verify outputs match:\nidentical(as.data.frame(quarter_stats_tb), as.data.frame(quarter_stats_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n\n:::\n\nWe will answer this question for each quarter (first, second, third and fourth), as well as overall. This means we will be repeating the same process five times, so this calls for writing a function. 
The function will give us the top 5 scoring quarters, as well as the ranking of the [aforementioned infamous game](#a-concrete-example) on the all-time list of quarters.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_top_quarter_scores_tb <- function(data, quarter_selection) {\n data |> \n filter(quarter %in% quarter_selection) |>\n arrange(desc(score)) |> \n mutate(rank = seq_along(team)) |> \n filter(rank %in% 1:5 | (game_id == infamous_game_id & team == \"Sydney\")) |> \n select(rank, team, opposition, score, quarter, Season, Round, Venue, game_afltables_url, game_id)\n}\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_top_quarter_scores_dt <- function(data, quarter_selection) {\n # copy the input (coerced to a data.table) so the caller's data is not modified by reference\n top_quarters_dt <- copy(as.data.table(data))\n top_quarters_dt <- top_quarters_dt[\n quarter %in% quarter_selection, ]\n setorder(top_quarters_dt, -score)\n top_quarters_dt[, rank := seq_along(team)]\n top_quarters_dt[rank %in% 1:5 | (game_id == infamous_game_id & team == \"Sydney\"), \n .(rank, team, opposition, score, quarter, Season, Round, Venue, game_afltables_url, game_id)]\n}\n```\n:::\n\n\n:::\n\n\n\n::: {.cell}\n\n:::\n\n\n\n\n### First Quarter\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q1_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 1L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q1_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 1L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_q1_tb), as.data.frame(top_quarter_scores_q1_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nThe record for the highest-scoring first quarter occurred during the bloodbath of an encounter that was the Bombers' first ever clash with the Gary Ablett Jr.-led Gold Coast Suns in their inaugural season in the AFL. The Bombers came out of the blocks in a flash and mercilessly obliterated the inexperienced Gold Coast side, notching up a blistering 93 point lead at quarter time. Interestingly, the Suns actually managed to win the second quarter as the Bombers appeared to take their foot off the accelerator a little to *only* win by 139 points when all was said and done. \n\n\n\n```{=html}\n\n```\n\n\n\nPerhaps the Suns' lethargy in the first quarter against the Dons can be explained as a hangover[^sun-festivities] following on from their [first ever win](https://afltables.com/afl/stats/games/2011/132020110423.html){target=\"_blank\"} the previous week[^first-win]. It is exciting to me that this is a game that I can remember watching on the television at the time, and it may have even been the first Gold Coast game I ever watched[^gold-coast]. 
Footy is full of narratives and it is fun to spin one around this particular game (the context and stories make footy stats even more fun).\n\n[^sun-festivities]: As a club with an abundance of 18 or 19 year old blokes living out of home for the first time, the Suns were known to [over-indulge](https://youtu.be/Roehqg0Dd5k?t=61){target=\"_blank\"} in the Gold Coast party culture in those days.\n\n[^first-win]: Courtesy of a [(missed) shot at goal after the siren](https://youtu.be/CbJMAHRzHEo?t=318){target=\"_blank\"} from Justin Westhoff.\n\n[^gold-coast]: I am glad that I didn't give up on watching them after that (mainly due to Gary Ablett I will admit) because otherwise I would have missed *unbelievable goals* like [this](https://www.youtube.com/watch?v=2Ae5byjzUKg){target=\"_blank\"}.\n\n\n### Second Quarter\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q2_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 2L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q2_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 2L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_q2_tb), as.data.frame(top_quarter_scores_q2_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nFrom one team's first season, to another's last. It is quite fitting (although sad) that the highest scoring second quarter was against a floundering ([aforementioned](#the-fitzroy-package)) `fitzRoy` Football Club (to which we owe the ease with which we obtained this data) en route to a wooden spoon in their [final season](https://www.youtube.com/watch?v=Ykfsk0pXt9E){target=\"_blank\"} prior to merging with the Brisbane Bears to form the Brisbane Lions.\n\nAs I was not yet born, I do not remember the game, but in the video below, the commentator shrewdly points to a strong wind prevailing towards the Crows' goal at the beginning of the second quarter, which certainly didn't bode well for the Lions.\n\n\n```{=html}\n\n```\n\n\n\n### Third Quarter\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q3_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 3L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q3_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 3L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_q3_tb), as.data.frame(top_quarter_scores_q3_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nThe Swans' third quarter appears in 28^th^ position here, which is the best position it gets.\n\n\n### Fourth Quarter\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q4_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 4L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_q4_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 4L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_q4_tb), as.data.frame(top_quarter_scores_q4_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nWell this was slightly unexpected, the Bloods[^incorrect-label] came home like a freight train against the woeful Saints in a game that took place over 100 years ago. It is also the only quarter in AFL history that has notched up a ton. Upon seeing this, given its vintage, I thought that perhaps the story of this game might have been lost to time but the Swans have a most [insightful article](https://www.sydneyswans.com.au/news/235004/slaughter-the-true-story-behind-a-record-thats-stood-for-a-century){target=\"_blank\"} up on their website about it. The explanation in the article claims that St Kilda were undermanned through a combination of injury and player protest on account of off-field disputes. It is safe to say that the \"Bloods\" showed them no mercy.\n\n[^incorrect-label]: Incorrectly labelled here as \"Sydney\" in the table above because at the time they resided in South Melbourne (they [relocated to Sydney in 1982](https://en.wikipedia.org/wiki/Sydney_Swans#Swans_move_to_Sydney:_1982.E2.80.931984){target=\"_blank\"}), they were also known as the \"Bloods\" prior to adopting their current [Swans mascot in 1933](https://en.wikipedia.org/wiki/Sydney_Swans#Club_identity){target=\"_blank\"} due to the number of Western Australians in the side (as a WA boy I couldn't help mentioning this)\n\n### All Quarters\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_tb <- get_top_quarter_scores_tb(quarter_stats_tb, 1L:4L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_quarter_scores_dt <- get_top_quarter_scores_dt(quarter_stats_tb, 1L:4L)\n# verify outputs match:\nidentical(as.data.frame(top_quarter_scores_tb), as.data.frame(top_quarter_scores_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nIn the infamous game, the Swans' third quarter[^premiership-quarter] was the only one that reached the top 100 quarters of all time. The fact that no quarter was even close to the top indicates that the Swans were very consistent throughout the game. To use a cliche, they put in a consistent four-quarter effort and I suppose the Eagles were consistent too (consistently dismal).\n\n[^premiership-quarter]: i.e. the [premiership quarter](https://en.wiktionary.org/wiki/premiership_quarter){target=\"_blank\"}\n\n## Most Goal-kickers\n\nThree of the [aforementioned](#question-list) questions concern goal kickers. We can therefore write a function that generalises our approach, as we did for the previous question. \n\n\nThese questions were:\n\n> What is the record for the most individual goal kickers for a team in a single game?\n\n> What is the record for the most multiple goal kickers for a team in a single game?\n\n> What is the record for the most players kicking five or more goals for a team in a single game (i.e. 
the most \"bags\")?\n\nThe Swans game appeared to have a rather even distribution of goal kickers in the [infamous game](#a-concrete-example), so it will be interesting to see where it places on the all time list in this category.\n\n\n\n::: {.panel-tabset}\n\n### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_top_goal_scorers_tb <- function(data, min_goals) {\ndata |> \n mutate(\n team = Playing.for,\n opposition = if_else(team == Home.team, Away.team, Home.team)\n ) |> \n group_by(team, opposition, Season, Round, Venue, game_afltables_url, game_id, Date) |> \n summarise(\n goal_kickers = sum(Goals > min_goals),\n .groups = \"drop\"\n ) |> \n arrange(desc(goal_kickers), desc(Date)) |> \n mutate(rank = seq_along(game_id)) |> \n relocate(rank, .before = \"team\") |> \n relocate(goal_kickers, .before = \"Season\") |> \n filter(rank %in% 1:5|(game_id == infamous_game_id & team == \"Sydney\")) |>\n select(-Date)\n}\n```\n:::\n\n\n\n\n\n### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nget_top_goal_scorers_dt <- function(data, min_goals) {\n top_goal_scorers_dt <- copy(data)\n top_goal_scorers_dt[, team := Playing.for]\n top_goal_scorers_dt[, opposition := fifelse(team == Home.team, Away.team, Home.team)]\n \n top_goal_scorers_dt <- top_goal_scorers_dt[,.(goal_kickers = sum(Goals > min_goals)),\n c(\"team\", \"opposition\", \"Season\", \"Round\", \"Venue\", \n \"game_afltables_url\", \"game_id\", \"Date\")]\n setorder(top_goal_scorers_dt, -goal_kickers, -Date)\n top_goal_scorers_dt[, rank := seq_along(game_id)]\n \n \n top_goal_scorers_dt[rank %in% 1:5|(game_id == infamous_game_id & team == \"Sydney\"),\n .(rank, team, opposition, goal_kickers, Season, Round, Venue, \n game_afltables_url, game_id)]\n}\n```\n:::\n\n\n:::\n\n\n\n::: {.cell}\n\n:::\n\n\n### Individual\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_single_tb <- get_top_goal_scorers_tb(player_stats_tb, 0L)\n```\n:::\n\n\n\n#### 
`data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_single_dt <- get_top_goal_scorers_dt(player_stats_dt, 0L)\n# verify outputs match:\nidentical(as.data.frame(top_goal_scorers_single_tb), as.data.frame(top_goal_scorers_single_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nThe record for most goal kickers is actually tied by multiple teams. The most recent time this happened was in the Bulldogs' 101 point drubbing of the Eagles last year[^eagles-bad]. The [infamous game](#a-concrete-example) is a bit off the pace in 238^th^ but 12 goal-kickers is still double a starting forward line.\n\n[^eagles-bad]: Yet another example of how poorly the Eagles have performed in 2022 and 2023\n\n\n\n### Multiple\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_multiple_tb <- get_top_goal_scorers_tb(player_stats_tb, 1L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_multiple_dt <- get_top_goal_scorers_dt(player_stats_dt, 1L)\n# verify outputs match:\nidentical(as.data.frame(top_goal_scorers_multiple_tb), as.data.frame(top_goal_scorers_multiple_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nThe Swans game actually places equal 7^th^ on the list of all time which is quite a notable result. It is also interesting that it was a similarly one-sided Swans game[^swans-bombers] at the SCG that takes outright top spot. In that game (circa 1987) the human highlight reel [Warwick Capper](https://www.youtube.com/watch?v=iiYJ6FZWwv0){target=\"_blank\"} led all comers for the Swans with a bag of 6 snags.\n\n[^swans-bombers]: [Full game](https://www.youtube.com/watch?v=IR2AjhhNDzE){target=\"_blank\"}, [article](https://www.sydneyswans.com.au/news/132940/footy-flashbacks-essendon){target=\"_blank\"}\n\n### Five or More (Bags)\n\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_bags_tb <- get_top_goal_scorers_tb(player_stats_tb, 4L)\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntop_goal_scorers_bags_dt <- get_top_goal_scorers_dt(player_stats_dt, 4L)\n# verify outputs match:\nidentical(as.data.frame(top_goal_scorers_bags_tb), as.data.frame(top_goal_scorers_bags_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\nFour bags in one game has happened on two occasions, the most recent of which (in 1991) yet again featured the Fitzroy Lions, who were trounced by 157 points by the Hawks in North Hobart. \n\nThe list of bag-getters in this game makes for interesting reading: all were recognisable names (although one more for his family connections than his own merit). As one might expect, one of the bags was courtesy of Hawthorn spearhead Jason Dunstall (6 snags), along with 7 apiece from WA boy Ben Allan[^ben-allan] and the three-time premiership player Darren Jarman. Rounding out the four was a contribution of 5 snags from Paul Hudson, who is the son of Tasmanian footy legend Peter Hudson (how fitting that this game was played in Tassie) who averaged more than 5 goals a game himself (an incredible feat).\n\n[^ben-allan]: Sorry I couldn't help myself, he was also a [Claremont Tiger](https://www.claremontfc.com.au/){target=\"_blank\"} (up the mighty Tiges)\n\n\n## Questions About Questionable Disposal\n\nTwo of the [questions](#question-list) concerned clangers and disposal efficiency:\n\n> What is the record for the most clangers in a game?\n\n> What is the record for the worst disposal efficiency in a game?\n\nThese statistics (which we will define below) are more advanced and have only been recorded more recently, so we will have to check which data sources to use and what years they are available for.\n\n### Most Clangers\n\n\nA clanger is defined as:\n\n> an absurd or embarrassing blunder.\n\nOr in more precise football statistics terms:\n\n> An error made by a player resulting in a negative result for his side. Disposal clangers are any kick or handball that directly turns the ball over to the opposition. 
Frees and 50-metre penalties against, No Pressure Errors, Dropped Marks and Debits are all included in clangers.\n>\n> Source: [Champion Data](https://www.championdata.com/glossary/afl){target=\"_blank\"}.\n\n[^source-champion-data]: \n\n\nClanger data is available on the AFL Tables data from 1998 onwards, so we will have to make do with only recent memory.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_tb |> \n filter(!is.na(Clangers)) |> \n group_by(Home.team, Away.team, Season, Round, Venue, game_afltables_url, game_id) |> \n summarise(\n clangers = sum(Clangers),\n .groups = \"drop\"\n ) |> \n arrange(desc(clangers), Season, Round) |> \n mutate(\n rank = seq_along(clangers)\n ) |> \n filter(rank %in% 1:5 | game_id == infamous_game_id) |> \n select(\n rank, game_id, Home.team, Away.team, clangers, Season, Round, Venue, game_afltables_url\n ) -> most_clangers_tb\n```\n:::\n\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmost_clangers_dt <- copy(player_stats_dt)\nmost_clangers_dt <- most_clangers_dt[!is.na(Clangers),]\nmost_clangers_dt <- most_clangers_dt[, .(clangers = sum(Clangers)), c(\"Home.team\", \"Away.team\", \"Season\", \"Round\", \"Venue\", \"game_afltables_url\", \"game_id\")]\nsetorder(most_clangers_dt, -clangers, Season, Round)\nmost_clangers_dt[,rank := seq_along(clangers)]\nmost_clangers_dt <- most_clangers_dt[rank %in% 1:5 | game_id == infamous_game_id,]\nmost_clangers_dt <- most_clangers_dt[, .(rank, game_id, Home.team, Away.team, clangers, Season, Round, Venue, game_afltables_url)]\n# verify outputs match:\nidentical(as.data.table(most_clangers_tb), as.data.table(most_clangers_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nA lot of the games in the top 5 are from very recent years, probably highlighting the more recent trend in teams rolling the dice more with possession (high risk, high reward), as popularised by Richmond and adopted to great success of late by the Magpies. Having spot-checked a few examples, I also noticed a bit of a pattern of wet weather impacting these games.\n\nAlso notable in the top 5 is one of three games played in Shanghai as part of the AFL's attempt at entering into the Chinese market between 2017 and 2019. While there was some rain about, one might presume from this that the Suns and Port didn't put on a particularly impressive display of our game on that occasion; either that, or they were putting on an entertaining show with plenty of high-risk, high-reward plays. The [video highlights](https://www.youtube.com/watch?v=9b2gue45P4k){target=\"_blank\"} appear to be reasonably exciting, so I am going to assume the latter.\n\n\n### Worst Disposal Efficiency\n\nDisposal efficiency is\n\n> the percentage of disposals that are effective.[^source-champion-data]\n\nWhere effective disposal is any of:\n\n> * Effective handball: a handball to a teammate that hits the intended target.\n * Effective Short Kick: A kick of less than 40 metres that results in the intended target retaining possession. Does not include kicks that are spoiled by the opposition.\n * Effective Long Kick: A kick of more than 40 metres to a 50/50 contest or better for the team.[^source-champion-data]\n\nNote how the distance of the disposal is an element of how lenient the definition of \"effective\" is.\n\n\nThis statistic requires our first (and only) use of the [Fryzigg](https://twitter.com/fryzigg){target=\"_blank\"} data, as disposal efficiency is not present in the AFL Tables data. While the Fryzigg data has the full history of the AFL, disposal efficiency is missing for seasons prior to 2012. 
We will therefore have to make do with answering this question only for about the past decade.\n\n::: {.panel-tabset}\n\n#### Tidyverse\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplayer_stats_fryzigg_tb |> \n filter(!is.na(disposal_efficiency_percentage)) |> \n mutate(\n home_team_code = get_team_code(match_home_team),\n away_team_code = get_team_code(match_away_team),\n season = str_sub(match_date, 1, 4),\n afl_tables_game_id = get_game_id(home_team_code, away_team_code, as.Date(match_date)),\n afl_tables_url = get_game_afltables_url(afl_tables_game_id, season)\n ) |>\n group_by(afl_tables_game_id, match_home_team, match_away_team, venue_name, season, match_round, afl_tables_url) |> \n summarise(\n disposal_efficiency_game = sum(disposal_efficiency_percentage * disposals) / sum(disposals) / 100,\n .groups = \"drop\"\n ) |> \n arrange(disposal_efficiency_game) |> \n mutate(\n rank = seq_along(disposal_efficiency_game)\n ) |> \n filter(rank %in% 1:5 | afl_tables_game_id == infamous_game_id) |> \n select(\n rank, afl_tables_game_id, match_home_team, match_away_team, disposal_efficiency_game, season, match_round, venue_name, afl_tables_url\n ) -> worst_disposal_efficiency_games_tb\n```\n:::\n\n\n#### `data.table`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nworst_disposal_efficiency_games_dt <- copy(player_stats_fryzigg_dt)\nworst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[!is.na(disposal_efficiency_percentage), ]\nworst_disposal_efficiency_games_dt[, home_team_code := get_team_code(match_home_team)]\nworst_disposal_efficiency_games_dt[, away_team_code := get_team_code(match_away_team)]\nworst_disposal_efficiency_games_dt[, season := str_sub(match_date, 1, 4)]\nworst_disposal_efficiency_games_dt[, afl_tables_game_id := get_game_id(home_team_code, away_team_code, as.Date(match_date))]\nworst_disposal_efficiency_games_dt[, afl_tables_url := get_game_afltables_url(afl_tables_game_id, season)]\n\nworst_disposal_efficiency_games_dt <- 
worst_disposal_efficiency_games_dt[,\n .(disposal_efficiency_game = sum(disposal_efficiency_percentage * disposals) / sum(disposals) / 100),\n c(\"afl_tables_game_id\", \"match_home_team\", \"match_away_team\", \"venue_name\", \"season\", \"match_round\", \"afl_tables_url\")\n ]\n\nsetorder(worst_disposal_efficiency_games_dt, disposal_efficiency_game)\nworst_disposal_efficiency_games_dt[, rank := seq_along(disposal_efficiency_game)]\nworst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[rank %in% 1:5 | afl_tables_game_id == infamous_game_id,]\nworst_disposal_efficiency_games_dt <- worst_disposal_efficiency_games_dt[, .(rank, afl_tables_game_id, match_home_team, match_away_team, disposal_efficiency_game, season, match_round, venue_name, afl_tables_url)]\n\n# verify outputs match:\nidentical(as.data.table(worst_disposal_efficiency_games_tb), as.data.table(worst_disposal_efficiency_games_dt))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] TRUE\n```\n:::\n:::\n\n\n:::\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\n\nThe only game with less than 50% disposal efficiency was played in torrid conditions up in Cairns. Judging by the [video highlights](https://www.youtube.com/watch?v=VI4dCyLt82c){target=\"_blank\"}, the players were running through puddles the whole game. I have, however, seen equal or worse conditions in the past, so it is somewhat curious that this was the worst by such a margin. For reference, Gold Coast were very poor that year, coming second last, but the Roos finished in a respectable ninth position (only one win and percentage outside the top 8). This suggests they would probably have won this game had it been played in more favourable conditions, although they still would have missed out on finals due to the mammoth percentage of the Cats that year. The \"cleanest\" player on the day was Jesse Joyce, whose 8 touches came at 75% efficiency (however it was a rather low sample size).\n\n\nFurther disjointed musings:\n\n* Seeing a game in 2018 is also a fun reminder of the last time the Eagles won a premiership, which feels a long way off given the current predicament the club is in, in spite of their recent win against the similarly languishing[^roos-optimism] Roos last Sunday.\n\n[^roos-optimism]: But far more optimistic due to a combination of Clarko (he returns from his hiatus this week) and promising talent on their list such as Harry Sheezel\n\n* In the 20^th^ century, I am sure the use of suburban grounds, where the quality of the surface was subpar, led to far more games with lower disposal efficiency than this (muddy fields were far more common in those days). \n\n* We can't entirely blame the players, given the conditions. Anyone who has kicked a footy around in the wet will know how much heavier and slipperier than usual it can get (it is often described as being like a bar of soap). 
By looking through old highlights packages of the top several games in this metric, all of them appear to have been impacted significantly by weather conditions.\n\n* The Fryzigg data actually has weather conditions as a field on it but it appears to be somewhat unreliable, when cross referencing the games with match reports and highlights, some of the \"sunny\" games turned out to be played in torrential rain.\n\n* I also checked which players have the highest career kicking efficiency and it appears to be mostly defenders who probably inflate their numbers by getting involved in switches of play and chipping the ball around the back line. So we have to take this kind of metric with a grain of salt, there is a certain difficulty level with executing certain types of disposal (e.g. a kick inside 50) that it does not fully capture.\n\n\n\n## Youngest Norm Smith Medalist\n\n\nOur question regarding Norm Smith medallists from above reads:\n\n> Who was the youngest player to win a Norm Smith Medal?\n\n\n::: {.callout-note}\n\n### Background on the Norm Smith Medal\n\n* The Norm Smith Medal is awarded to the player who is adjudged best on ground in the AFL grand final. \n\n\n* The award is named after legendary Melbourne full forward of the 1940's and coach of the 1950's and 1960's, [Norm Smith](https://afltables.com/afl/stats/players/N/Norm_Smith.html){target=\"_blank\"}. In his decorated career, he won a total of 10 premierships, 4 as a player, 6 as coach and all for the Melbourne football club (given Melbourne have only won 13 in their history, quite a feat). 
At the back end of his playing career, he spent two years as captain-coach (yes that was a thing at the time) of the Fitzroy football club (here they pop up again).\n\n* The Norm Smith Medal is usually given to a player on the winning team, but very occasionally a player has managed to win the award in a losing side. This has happened 4 out of 45 times, most recently with Eagles (and Carlton) superstar Chris Judd in 2005.\n\n* The Norm Smith Medal was instituted in 1979. Prior to this, there was no official award given; however, the media and fans of the day had their opinions of who the best on ground was in prior grand finals. While [this article](https://themongrelpunt.com/footy-history/2020/04/30/before-the-norm-smith-best-on-ground-prior-to-1979){target=\"_blank\"} lists some \"unofficial\" best on ground performances in grand finals prior to 1979, I will stick with the official list. As a Western Australian, I would have no qualms with discarding the older, exclusively Victorian seasons, particularly as this data is of dubious reliability.\n\n\n:::\n\nAs previously mentioned, Norm Smith Medal data is not available in `fitzRoy`, so we will have to scrape it with some of our own bespoke code[^illustrates-fitzroy-point]. The AFL website conveniently has a [nice table](https://www.afl.com.au/stats/leaders-awards/norm-smith-medal){target=\"_blank\"}, listing all the winners since the award began in 1979. 
We will supplement this with data from `fitzRoy` to figure out the level of experience of each player.\n\n\n\n[^illustrates-fitzroy-point]: This illustrates the fact that sometimes you need to stray outside of `fitzRoy` but it gives most of the footy data you could ever want.\n\n::: {.callout-warning}\n\n### No `data.table` code\n\nWhile all the other code has thus far been written in [both tidyverse and `data.table`](#tidyverse-versus-data.table), I decided to leave it for this one as it was rather intricate and painful to perform the same process twice.\n\n:::\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nnorm_smith_url <- \"https://www.afl.com.au/stats/leaders-awards/norm-smith-medal\"\nnorm_smith_html <- read_html(norm_smith_url)\n\nnorm_smith_html |> \n html_table(header = TRUE) |> \n _[[1]] |> \n mutate( #manually adjust to help mapping\n Club = case_when(\n Club == \"Geelong Cats\" ~ \"Geelong\",\n Club == \"West Coast Eagles\" ~ \"West Coast\",\n Club == \"Sydney Swans\" ~ \"Sydney\",\n TRUE ~ Club\n ),\n Player = case_when(\n Player == \"Billy Duckworth\" ~ \"Bill Duckworth\",\n Player == \"Ryan O'Keefe\" ~ \"Ryan OKeefe\",\n TRUE ~ Player \n ),\n Year = as.integer(str_sub(Year, 1, 4))\n ) -> norm_smith_tb\n\n# get the date of their first game\nplayer_stats_tb |> \n group_by(\n player_afltables_url, Playing.for\n ) |> \n summarise(\n first_game_date = min(Date),\n .groups = \"drop\"\n ) -> more_player_details\n\n# get games played to date\nplayer_stats_tb |> \n group_by(ID) |> \n arrange(Date) |> \n mutate(\n games_played_to_date = seq_along(ID)\n ) |> \n ungroup() -> \n player_stats_tb_games_played\n\n# get grand final information\nplayer_stats_tb_games_played |>\n filter(\n Round == \"GF\",\n Season >= 1979\n ) |>\n mutate(\n player_name = paste(First.name, Surname),\n on_winning_team = (Home.team == Playing.for & Home.score > Away.score) | (Away.team == Playing.for & Home.score < Away.score)\n ) |> \n select(ID, player_name, Playing.for, 
game_afltables_url, player_afltables_url, grand_final_season = Season, grand_final_date = Date, games_played_to_date, on_winning_team) ->\n grand_final_tb\n\ngrand_final_tb |> \n select(grand_final_date) |>\n distinct() |> \n arrange(desc(grand_final_date)) |> \n pull(grand_final_date) ->\n grand_final_dates\n \nplayer_details_tb |> \n mutate(\n first_year = str_sub(Seasons, 1, 4)\n ) -> \n player_details_tb_joinable\n\n\nnorm_smith_tb |> \n mutate(Date = grand_final_dates) |> \n left_join(\n grand_final_tb, by = c(\"Player\" = \"player_name\", \"Club\" = \"Playing.for\", \"Date\" = \"grand_final_date\") \n ) |> \n left_join(\n more_player_details, by = c(\"player_afltables_url\", \"Club\" = \"Playing.for\")\n ) |> \n mutate(\n first_year = format(first_game_date, \"%Y\")\n ) |> \n left_join(\n player_details_tb_joinable, by = c(\"Player\", \"Club\" = \"Team\", \"first_year\")\n ) |>\n mutate(\n debut_age_years = as.integer(str_sub(Debut, 1, 2)),\n debut_age_days = as.integer(str_remove_all(Debut, \".*y |d\")),\n date_of_birth = first_game_date - years(debut_age_years) - days(debut_age_days),\n age_at_grand_final = as.period(interval(start = date_of_birth, end = Date)),\n # age in seconds as a plain number, which is safe to sort on\n age_at_grand_final_seconds = int_length(interval(start = date_of_birth, end = Date))\n ) |>\n arrange(age_at_grand_final_seconds) |>\n select(\n Player, Club, Year, on_winning_team, games_played_to_date, age_at_grand_final, age_at_grand_final_seconds, player_afltables_url, game_afltables_url) -> norm_smith_youngest_tb\n```\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n
\n\n```\n:::\n:::\n\n\n\nThe youngest player on the list is the inaugural winner, Wayne Harmes, who was only 19 years of age when he won the prestigious award. He is known for a [legendary moment](https://www.youtube.com/watch?v=G6jDAtqEi50){target=\"_blank\"} during this match, where, towards the end of the fourth quarter, he ran down his own (errant) kick by sliding along the ground and tapping the ball to keep it in play, sending it into the path of his team mate Ken Sheldon who ran into the open goal. The goal ended up being a decisive one as the Blues came out as victors by only 5 points.\n\n\n\n```{=html}\n\n```\n\n\n\nThe most inexperienced in terms of games played was Maurice Rioli (father of Maurice Jr. who is currently plying his trade at his old man's club), member of the famous [Rioli family](https://en.wikipedia.org/wiki/Rioli_family){target=\"_blank\"} of the Tiwi Islands which has uncannily produced a plethora of great footballers and premiership players. He was the first (and arguably the greatest) Rioli to ever play in the AFL/VFL. One caveat is that while at the time he had only played 21 VFL games, he was 24 years of age and had previously played 6 years of WAFL footy[^wafl-quality] for South Fremantle, so he wasn't your typical 21 game player, he was really in his prime.\n\n\n[^wafl-quality]: At this time (prior to a national competition), WAFL football (as well as the SANFL in South Australia) could be viewed as being at a similar level to the VFL (although VFL did benefit from having the larger population in Victoria as a talent pool). 
While these days, the WAFL is the tier below the AFL (like the English Championship is to the Premier League), at that time, it could instead be viewed as a competition that was the best within its own region (like Serie A is to La Liga).\n\n\n\n# Epilogue\n\n\n## Closing Remarks\n\n\nTo mention one last record, the margin of 171 in the [infamous game](#a-concrete-example) is actually the equal 4^th^ highest winning margin in an AFL/VFL game. Interestingly, the all-time record in this category comes full circle, being a game between [the Fitzroy Lions and the Melbourne Demons back in 1979](https://afltables.com/afl/stats/games/1979/061119790728.html){target=\"_blank\"}. Fittingly, the victors of this game were our friends (to whom we owe the greatest gratitude for helping us import data), the mighty `fitzRoy` footy club, by a whopping 190 points. So while they had their trials and tribulations as a club (some of which we have covered in this post), it is nice to finish with them on a high note.\n\n\nInterestingly, the [aforementioned infamous game](#a-concrete-example) isn't the record-holder (or even in the top 5) for any of our [questions](#question-list)[^failure-to-break-records] but it is one of only 31 games where a team has scored 200 points or more, which is notable enough I think, particularly given I had the (mis-)fortune of witnessing such a rare event in the flesh. 
Perhaps we could dig deeper to find a record it holds (every game is uniquely remarkable in some way if you look hard enough) but I somehow find more satisfaction in it being a thought-provoking enough game to coax these questions out of us without it ever being the *answer*.\n\n[^failure-to-break-records]: the closest it came was in the most multiple goal kickers category at equal 7^th^\n\n## Notable AFL Stats Figures\n\nI will conclude by listing some people who are doing interesting work with AFL stats (often with heavy use of R and the `fitzRoy` package) to provide further motivation:\n\n- [`fitzRoy`](https://github.com/jimmyday12/fitzRoy): as outlined in this article, this R package is the de facto way of sourcing AFL data.\n\n- [Useless AFL Stats](https://www.facebook.com/uselessaflstats){target=\"_blank\"}: a Facebook page which shares always interesting, sometimes abstract and often amusing AFL stats content. [Liam Crow](https://twitter.com/crow_data_sci){target=\"_blank\"} is their data scientist and posts some interesting content of his own on his website: [https://www.crowdatascience.com](https://www.crowdatascience.com){target=\"_blank\"}.\n\n- [squiggle.com.au](https://squiggle.com.au/leaderboard/){target=\"_blank\"}: displays a bunch of people's data-driven tipping models, many of which have websites and social media accounts where they do AFL stats.\n\n- [Jaiden Popowski](https://twitter.com/jaiden_popowski){target=\"_blank\"}: is prominent in the [AFL Fantasy](https://fantasy.afl.com.au/){target=\"_blank\"} community for the interesting data-driven analysis he produces.\n\n- [DFS Australia](https://dfsaustralia.com/afl-home/){target=\"_blank\"}: has some great data-driven tools that provide insight on advanced stats commonly used in [AFL Fantasy](https://fantasy.afl.com.au/){target=\"_blank\"}.\n\n\n# Comments\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_site_rendered/blog/2023/over-analysing-idle-footy-chat/index.html 
b/_site_rendered/blog/2023/over-analysing-idle-footy-chat/index.html index 8aa9786..0cd0850 100644 --- a/_site_rendered/blog/2023/over-analysing-idle-footy-chat/index.html +++ b/_site_rendered/blog/2023/over-analysing-idle-footy-chat/index.html @@ -368,8 +368,8 @@

(Over-)Analysing Idle Footy Chat

  • 6 the baseball/basketball reference of the footy world (but maybe not quite as extensive)

  • 7 19, versus Melbourne in round 10 2009

  • 8 Tom Scully, 18.9 KM

  • Even so, some more sophisticated questions will still go unanswered.

    - - + +

    However, if you are a gadget-type operator9 like myself, you will expand the number of footy stats questions you can answer immensely by accessing and manipulating the raw data yourself. There are, of course, a multitude of tools and approaches to this, but in this post, I will be using R (my preferred programming language)

  • 9 My footy tipping username is Gadget-type Operator and I often use other BT quotes for my username on other (even non-footy-related) accounts

  • @@ -442,7 +442,7 @@

    (Over-)Analysing Idle Footy Chat

    library(fitzRoy)
     
     # Note that I generally avoid mixing dplyr and data.table at the same time
    -#   but the reason I have done with will become apparent later
    +#   but the reason I have done this will become apparent later
     library(dplyr)
     library(data.table)
     
    @@ -463,9 +463,9 @@ 

    (Over-)Analysing Idle Footy Chat

    -
    -
    - +
    +
    +

    Note that each row of the table can be expanded to reveal what data is available from each source, as well as its use case. In addition to the sources listed in this table, the following functions only come from one source:

    @@ -1071,9 +1071,9 @@

    (Over-)Analysing Idle Footy Chat

    -
    -
    - +
    +
    +

    Player Stats

    @@ -1099,9 +1099,9 @@

    (Over-)Analysing Idle Footy Chat

    -
    -
    - +
    +
    +

    Figuring Out the Answers

    @@ -1251,14 +1251,14 @@

    (Over-)Analysing Idle Footy Chat

    -
    -
    - +
    +
    +

    The record for the highest-scoring first quarter occurred during the bloodbath of an encounter that was the Bombers’ first ever clash with the Gary Ablett Jr.-led Gold Coast Suns in their inaugural season in the AFL. The Bombers came out of the blocks in a flash and mercilessly obliterated the inexperienced Gold Coast side, notching up a blistering 93 point lead at quarter time. Interestingly, the Suns actually managed to win the second quarter as the Bombers appeared to take their foot off the accelerator a little to only win by 139 points when all was said and done.

    -

    Perhaps the Suns’ lethargy in the first quarter against the Dons can be explained as a hangover25 following on from their first ever win the previous week26. It is exciting to me that this is a game that I can remember watching on the television at the time, and it may have even been the first Gold Coast game I ever watched27. Footy is full of narratives and it fun to spin one around this particular game (the context and stories make footy stats even more fun).

  • 25 As a club with an abundance of 18 or 19 year old blokes living out of home for the first time, the Suns were known to over-indulge in the Gold Coast party culture in those days.

  • 26 Courtesy of a (missed) shot at goal after the siren from Justin Westhoff.

  • 27 I am glad that I didn’t give up on watching them after that (mainly due to Gary Ablett I will admit) because otherwise I would have missed unbelievable goals like this.

  • +

    Perhaps the Suns’ lethargy in the first quarter against the Dons can be explained as a hangover25 following on from their first ever win the previous week26. It is exciting to me that this is a game that I can remember watching on the television at the time, and it may have even been the first Gold Coast game I ever watched27. Footy is full of narratives and it is fun to spin one around this particular game (the context and stories make footy stats even more fun).

  • 25 As a club with an abundance of 18 or 19 year old blokes living out of home for the first time, the Suns were known to over-indulge in the Gold Coast party culture in those days.

  • 26 Courtesy of a (missed) shot at goal after the siren from Justin Westhoff.

  • 27 I am glad that I didn’t give up on watching them after that (mainly due to Gary Ablett I will admit) because otherwise I would have missed unbelievable goals like this.

  • Second Quarter

    Multiple

  • Player Lookup Table
  • +
  • Comments
  • @@ -237,7 +240,7 @@

    Dissecting Footy Grid Combinations

    You are awarded a rarity score based on what other players of the game have chosen, the lower your score the rarer the player. Given people know this, sometimes a less obvious player is actually more selected than the most obvious. For example, in today’s grid Scotty Lucas was actually the most popular1 Essendon player to kick more than 50 goals in a season over Matthew Lloyd (who I would have thought would be the most obvious).

  • 1 the game allows you to see the most popular selections once you have attempted it

  • My Performance

    -

    I would say I am a relatively average2 player at the game. While it varies quite a lot, I would say a solid score for me is when I am better than around 40%3 of players. My attempt for today (the 8th of August) is below:

  • 2 Among a sea of footy tragics, so I don’t take any shame in this

  • 3 I will elaborate on why I think the 40th percentile is at least average if not above average below

  • +

    I would say I am a relatively average2 player at the game. While it varies quite a lot, I would say a solid score for me is when I am better than around 40%3 of players (although I will occasionally score in the 80th or even 90th percentile if it aligns with my area of expertise4). My attempt on the 8th of August is below:

  • 2 Among a sea of footy tragics, so I don’t take any shame in this

  • 3 I will elaborate on why I think the 40th percentile is at least average if not above average below

  • 4 For example, I tend to be pretty solid at pulling out obscure Eagles and Dockers players from the late 2000’s and early 2010’s

  • -

    You will notice that my attempt is missing an answer for the middle square of the grid (Essendon and Collingwood player) and that is because I could not figure out that answer and took a wild stab. As I discovered later on, the most popular player for that question was an obscure player from the late 1990’s and early 2000’s (Andrew Ukovic4) that I am certain most people had to look up or entered on their second attempt when they saw he was most popular.

  • 4 perhaps I am too young and he was actually a household name but I somehow highly doubt this given his mediocre career statistics

  • -

    This is why I would say I am average as opposed to below average for being in the 40th percentile, a lot of the scores on there are people attempting the grid multiple times or even cheating5 on their first time. I don’t really care if people do that but for me it’s a lot more fun doing it without any kind of assistance6.

  • 5 I have seen varying interpretations of what counts as cheating, but I would certainly say googling the answer and then entering a player you have never heard of is cheating

  • 6 and perhaps it also makes me feel better when I see my less than stellar percentiles on most days

  • +

    You will notice that my attempt is missing an answer for the middle square of the grid (Essendon and Collingwood player) and that is because I could not figure out that answer and took a wild stab. As I discovered later on, the most popular player for that question was an obscure player from the late 1990’s and early 2000’s (Andrew Ukovic5) that I am certain most people had to look up or entered on their second attempt when they saw he was most popular.

  • 5 perhaps I am too young and he was actually a household name but I somehow highly doubt this given his mediocre career statistics

  • +

    This is why I would say I am average as opposed to below average for being in the 40th percentile, a lot of the scores on there are people attempting the grid multiple times or even cheating6 on their first time. I don’t really care if people do that but for me it’s a lot more fun doing it without any kind of assistance7.

  • 6 I have seen varying interpretations of what counts as cheating, but I would certainly say googling the answer and then entering a player you have never heard of is cheating

  • 7 and perhaps it also makes me feel better when I see my less than stellar percentiles

  • Most Difficult Combinations

    List of Categories

    -

    Below is an exhaustive list of all the categories that have appeared in Footy Grid between the 21st of July and the 8th of August 2023 (the dates that are currently visible on the website on the date of writing this):

    +

    Below is an exhaustive list of all the categories that have appeared in Footy Grid between the 21st of July and the 29th of August 2023 (the dates that are currently visible on the website on the date of writing this):

      -
    • Club played for 7,8

    • +
    • Club played for 8,9

    • -

      Played in a particular decade9:

      +

      Played in a particular decade10:

      • 80’s

      • 90’s

      • @@ -270,7 +273,7 @@

        Dissecting Footy Grid Combinations

    • -

      Season stats 10:

      +

      Season stats 11,12:

      • >50 goals

      • 300+ kicks

      • @@ -278,25 +281,46 @@

        Dissecting Footy Grid Combinations

      • Average 25+ disposals

      • Average 5+ marks

      • Average 5+ tackles

      • +
      • Average 100+ fantasy points

      • +
      • Won 15+ games

      • +
      • Lost 15+ games

    • -

      Career stats 11,12:

      +

      Career stats 13,14:

        -
      • 20 or less games

      • -
      • 50 or less games

      • -
      • 200+ games

      • +
      • Played 20 or less games

      • +
      • Played 50 or less games

      • +
      • Played 200 or more games

      • Never scored a goal

      • 250+ goals

      • +
      • 250+ games

      • 2500+ kicks

      • 500+ tackles

      • +
      • All Time Top 50 Goal Kicker

      • +
      +
    • +
    • +

      Game stats15,16:

      +
        +
      • 5+ goals

      • +
      • 10+ marks

      • +
      • 10+ tackles

      • +
      • 30+ disposals

      • +
      • 40+ disposals

      • +
      • 25+ disposals in a final

    • Grand Final Player

    • +
    • One Club Player

    • Awards:

        -
      • Brownlow Medalist
      • +
      • Brownlow Medalist

      • +
      • Club Best and Fairest (1980 onwards)

      • +
      • Colman Medalist

      • +
      • Rising Star Nomination

      • +
      • Norm Smith Medalist

    • @@ -307,45 +331,65 @@

      Dissecting Footy Grid Combinations

  • -

    Name13:

    +

    Name17:

    • First name Jack
  • +
  • Left footers18

  • +
  • One-Club Player

  • +
  • +

    Coached by:

    +
      +
    • Ross Lyon

    • +
    • Brad Scott

    • +
    • Leigh Matthews

    • +
    +
  • +
  • +

    Teammate of19:

    +
      +
    • Lance Franklin

    • +
    • Isaac Smith

    • +
    • Nic Naitanui

    • +
    • Andrew Phillips

    • +
    • Scott Pendlebury

    • +
    • Jack Riewoldt

    • +
    • Jack Ziebell

    • +
    • Brodie Grundy

    • +
    • Dustin Fletcher

    • +
    • Luke Shuey

    • +
    • Tom Rockliff

    • +
    • Phil Davis

    • +
    • Stephen Coniglio

    • +
    • Dylan Buckley

    • +
    • Daniel Gorringe

    • +
    +
  • +
  • +

    Guernsey number:

    +
      +
    • Single digit

    • +
    • Double digit

    • +
    +
  • +
  • All Australian (1991 to present)

  • -
  • 7 University and Fitzroy as defunct teams will never appear as a category, although it says in the Footy Grid help page that Fitzroy will be combined in with Brisbane in the future

  • 8 The South Melbourne and Sydney swans; the Brisbane Bears and Lions; Footscray and Western Bulldogs are grouped together

  • 9 we can probably extend this to played in any decade, these are just the decades we have seen so far

  • 10 includes finals

  • 11 includes finals

  • 12 there are probably different stats and thresholds that will appear in the future

  • 13 there will probably be other names that crop up in the future

  • As it is the most prominent category, we will mainly focus on teams in this article but some of the other categories may appear as well. Also note that generally, according to the rules, when a team category intersects with a non-team category, the player need not have satisfied the category whilst playing at the club14, except for awards where they needed to win the award when playing for the club15.

  • 14 e.g. Gary Ablett Jr. counts as a Grand Final player for Gold Coast even though he only ever played in grand finals for Geelong

  • 15 e.g. Isaac Smith counts as a Norm Smith Medalist for Geelong but not for Hawthorn

  • +
  • 8 University and Fitzroy as defunct teams will never appear as a category, although it says in the Footy Grid help page that Fitzroy will be combined in with Brisbane in the future

  • 9 The South Melbourne and Sydney swans; the Brisbane Bears and Lions; Footscray and Western Bulldogs are grouped together

  • 10 we can probably extend this to played in any decade, these are just the decades we have seen so far

  • 11 includes finals

  • 12 there are probably different stats and thresholds that will appear in the future

  • 13 includes finals

  • 14 there are probably different stats and thresholds that will appear in the future

  • 15 includes finals

  • 16 there are probably different stats and thresholds that will appear in the future

  • 17 there will probably be other names that crop up in the future

  • 18 Data available 2013 onwards

  • 19 These ones typically honour retirees

  • As it is the most prominent category, we will mainly focus on teams in this article but some of the other categories may appear as well. Also note that generally, according to the rules, when a team category intersects with a non-team category, the player need not have satisfied the category whilst playing at the club20.

  • 20 e.g. Gary Ablett Jr. counts as a Grand Final player for Gold Coast even though he only ever played in grand finals for Geelong

  • Obtaining the Data

    In order to look through the possible combinations, I will use R as I did in my previous post. Likewise, I will also be using the fitzRoy package to obtain my data. Footy Grid has the full history of AFL players, so I will use the fryzigg data. There are also some player details that are needed and for that, I will use the AFL Tables.

    -

    Let’s begin by loading our required packages16.

  • 16 As I outline below, I will only be using tidyverse for data wrangling this time.

  • +

    Let’s begin by loading our required packages21.

  • 21 As I outline below, I will only be using tidyverse for data wrangling this time.

  • -
    -
    
    -Attaching package: 'dplyr'
    -
    -
    -
    The following objects are masked from 'package:stats':
    -
    -    filter, lag
    -
    -
    -
    The following objects are masked from 'package:base':
    -
    -    intersect, setdiff, setequal, union
    -
    -
    -

    We will get the data using the following code17:

  • 17 Note that I am using the fst package below to save the data locally instead of using RDS files on suggestion from one of the readers of my previous posts (see the “Being a good citizen” call-out in my previous post for more information on why I save the output locally)

  • +

    We will get the data using the following code22:

  • 22 Note that I am using the fst package below to save the data locally instead of using RDS files on suggestion from one of the readers of my previous posts (see the “Being a good citizen” call-out in my previous post for more information on why I save the output locally)

  • -
    if(file.exists("data/player_stats_fryzigg.fst")) {
    +
    if(file.exists("data/player_stats_fryzigg.fst")) {
       player_stats_fryzigg <- read_fst("data/player_stats_fryzigg.fst") |> as_tibble()
       
     } else {
    @@ -361,10 +405,35 @@ 

    Dissecting Footy Grid Combinations

    write_fst(player_details, "data/player_details.fst") }
    -

    Data Wrangling

    -

    Unlike the previous post, I will only use the tidyverse18.

  • 18 While the experiment of producing both tidyverse and data.table versions of the code was an interesting one, I have decided that it is probably too time-consuming going forward.

  • +

    Other Data Sources

    +

    The fitzRoy sources provide most of the data we need, but some data is still missing.

    +

    Brownlow medal data

    +

    As with the Norm Smith medal in my previous post, the Brownlow medal winners are not available in the fitzRoy data. However, they are available on AFL Tables, Wikipedia and the official AFL website. As it turns out, the easiest one to scrape is the AFL website, so that is what we will use.

    -
    player_stats_fryzigg$player_team |> unique() -> teams_fryzigg
    +
    # get brownlow
    +brownlow_url <- "https://www.afl.com.au/brownlow-medal/history"
    +
    +brownlow_url |> 
    +  read_html() ->
    +  brownlow_html
    +
    +brownlow_html |> 
    +  html_table(header = TRUE) |> 
    +  _[[1]] |> 
    +  View()
    +
    +

    All-Australian Data

    +

    The official AFL website has a table of all the players selected in All Australian Teams since 1953 here23.

  • 23 Before 1991 it was either the VFL team of the year or based on the interstate carnival, hence why Footy Grid only considers All Australian from 1991 onwards

  • +
    +
    "data-Dtbc3.csv"
    +
    +
    [1] "data-Dtbc3.csv"
    +
    +
    +

    Data Wrangling

    +

    Unlike the previous post, I will only use the tidyverse24.

  • 24 While the experiment of producing both tidyverse and data.table versions of the code was an interesting one, I have decided that it is probably too time-consuming going forward.

  • +
    +
    player_stats_fryzigg$player_team |> unique() -> teams_fryzigg
     player_details$Team |> unique() -> teams_afl_tables
     
     tribble(
    @@ -410,7 +479,7 @@ 

    Dissecting Footy Grid Combinations

    player_height = HT, player_weight = WT, ) -> - player_measurements + player_details_modified player_stats_fryzigg |> mutate( @@ -450,26 +519,10 @@

    Dissecting Footy Grid Combinations

    arrange(desc(Games)) |> View()
    -

    Scraping the Brownlow medal data

    -

    As with the Norm Smith medal in my previous post, the Brownlow medal winners are not available on

    -

    https://afltables.com/afl/brownlow/brownlow_idx.html

    -

    https://en.wikipedia.org/wiki/List_of_Brownlow_Medal_winners#Winners_by_season

    -
    -
    # get brownlow
    -brownlow_url <- "https://www.afl.com.au/brownlow-medal/history"
    -
    -brownlow_url |> 
    -  read_html() ->
    -  brownlow_html
    -
    -brownlow_html |> 
    -  html_table(header = TRUE) |> 
    -  _[[1]] |> 
    -  View()
    -
    -

    Final Output

    +

    Final Output

    Player Lookup Table

    Below is a lookup table. It is not intended for cheating; it was built more as an exercise, and perhaps for looking up who some of the obscure answers might have been after you have attempted the quiz on the day. I may continue to update this as new categories get introduced, but for now I present it with all the categories that have come about between the 21st of July and the 8th of August 2023.

    +

    Comments

    Back to top diff --git a/_site_rendered/blog/2023/more-afl-stats/index.html b/_site_rendered/blog/2023/more-afl-stats/index.html index 8be1de9..7ff8aa1 100644 --- a/_site_rendered/blog/2023/more-afl-stats/index.html +++ b/_site_rendered/blog/2023/more-afl-stats/index.html @@ -114,7 +114,7 @@ @@ -531,6 +531,22 @@

    More AFL Stats

    }); } } + var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//); + var filterRegex = new RegExp("https:\/\/t-gummer\.netlify\.app"); + var isInternal = (href) => { + return filterRegex.test(href) || localhostRegex.test(href); + } + // Inspect non-navigation links and adorn them if external + var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item)'); + for (var i=0; i diff --git a/_site_rendered/blog/2023/over-analysing-idle-footy-chat/index.html b/_site_rendered/blog/2023/over-analysing-idle-footy-chat/index.html index 0cd0850..a3ba4f3 100644 --- a/_site_rendered/blog/2023/over-analysing-idle-footy-chat/index.html +++ b/_site_rendered/blog/2023/over-analysing-idle-footy-chat/index.html @@ -132,7 +132,7 @@ Blog @@ -141,6 +141,11 @@ CV +
    diff --git a/_site_rendered/cv/index.html b/_site_rendered/cv/index.html index dc6eb61..421ad60 100644 --- a/_site_rendered/cv/index.html +++ b/_site_rendered/cv/index.html @@ -92,7 +92,7 @@ Blog @@ -101,6 +101,11 @@ CV +