Generalized Additive Models; a data-driven approach to estimating regression models

gavinsimpson/physalia-gam-course

Physalia-Courses

https://www.physalia-courses.org/

Gavin Simpson

20–24th January, 2025

Overview

Most of the statistical methods you are likely to have encountered will have specified fixed functional forms for the relationships between covariates and the response, either implicitly or explicitly. These might be linear effects or involve polynomials, such as x + x² + x³. Generalized additive models (GAMs) are different; they build upon the generalized linear model (GLM) by allowing the shapes of the relationships between response and covariates to be learned from the data using splines. Modern GAMs, it turns out, are a very general framework for data analysis, encompassing many models as special cases, including GLMs and GLMMs, and the variety of types of splines available to users allows GAMs to be used in a surprisingly large number of situations. In this course we’ll show you how to leverage the power and flexibility of splines to go beyond parametric modelling techniques like GLMs.
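As a rough illustration of that contrast, the sketch below fits a cubic polynomial and a GAM to the same data. The data are simulated purely for illustration (the true curve, sample size, and noise level are assumptions, not course material); `gam()` and `s()` come from the mgcv package.

```r
library(mgcv)  # provides gam() and s()

set.seed(1)
n <- 200
# simulated data (illustrative only): a wiggly signal plus Gaussian noise
df <- data.frame(x = runif(n))
df$y <- sin(2 * pi * df$x) + 0.5 * df$x + rnorm(n, sd = 0.3)

# fixed functional form: a cubic polynomial in x
m_poly <- lm(y ~ poly(x, 3), data = df)

# GAM: the shape of the effect of x is estimated with a penalised spline
m_gam <- gam(y ~ s(x), data = df, method = "REML")

summary(m_gam)      # the edf of s(x) indicates how wiggly the estimated smooth is
AIC(m_poly, m_gam)  # one way to compare the two fits
```

With the polynomial, the analyst chooses the degree up front; with `s(x)`, the wiggliness of the curve is estimated as part of model fitting.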

Target audience and assumed background

The course is aimed at graduate students and researchers with limited statistical knowledge; ideally you’d know something about generalized linear models, but we’ll recap what GLMs are, so if you’re a little rusty or not everything mentioned in a GLM course made sense, we have you covered.

Participants should be familiar with RStudio and have some fluency in programming R code, including being able to import, manipulate (e.g. modify variables) and visualise data. There will be a mix of lectures, in-class discussion, and hands-on practical exercises throughout the course. From running the course previously, knowing the difference between "fixed" and "random" effects, and what the terms "random intercepts" and "random slopes" mean, will be helpful for the Hierarchical GAM topic, but we don't expect you to be an expert in mixed effects or hierarchical models to take this course.

Learning outcomes

  1. Understand how GAMs work from a practical viewpoint to learn relationships between covariates and response from the data,

  2. Be able to fit GAMs in R using the mgcv package,

  3. Know the differences between the types of splines and when to use them in your models,

  4. Know how to visualise fitted GAMs and to check the assumptions of the model.

Pre-course preparation

Install an up-to-date version of R

Please be sure to have at least version 4.4.0 of R installed (the version of my gratia package we will be using depends on you having at least version 4.1.0 installed and some slides might contain code that requires version 4.4.x). Note that R and RStudio are two different things: it is not sufficient to just update RStudio, you also need to update R by installing new versions as they are released.

To download R, go to the CRAN Download page and follow the links to download R for your operating system.

To check what version of R you have installed, from within R, you can run

version

then look at the version.string entry (or the major and minor entries). For example, on my system I see:

# ... output not shown ...
major          4                           
minor          4.2 
# ... output not shown ...
version.string R version 4.4.2 (2024-10-31)
# ... output not shown ...
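If you would rather not read the version off by eye, a minimal sketch of a programmatic check is shown below. It uses base R's `getRversion()`; the object names `r_ver` and `r_ok` are just illustrative.

```r
# getRversion() returns the running R version as an object that can be
# compared directly with a version string
r_ver <- getRversion()
r_ok <- r_ver >= "4.4.0"  # 4.4.0 is the minimum recommended for the course

if (r_ok) {
  message("R ", as.character(r_ver), " is recent enough for the course")
} else {
  message("R ", as.character(r_ver), " is older than 4.4.0; please update R")
}
```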

Update your R packages, and install the required R packages

We will make use of several R packages that you'll need to have installed. Prior to the start of the course, please run the following code to update your installed packages and then install the required packages:

# update any installed R packages
update.packages(ask = FALSE, checkBuilt = TRUE)

# packages to install
pkgs <- c("mgcv",  "gamm4", "tidyverse", "readxl", "mgcViz", "DHARMa", "gratia",
  "ggforce", "marginaleffects")

# install those packages
install.packages(pkgs, Ncpus = 4) # set Ncpus to # of *physical* CPU cores you have

We might need the development version of gratia; you can install this with

# Install gratia in R
install.packages("gratia", repos = c(
  "https://gavinsimpson.r-universe.dev",
  "https://cloud.r-project.org"
))

Now we must check that we actually do have recent versions of the packages installed; if your R is not reasonably new (gratia requires R >= 4.1.0, but some of the tidyverse packages may need a newer R than this) then you may be stuck on outdated versions of the packages listed above. This is why I recommend that you install the latest version of R. If you choose to use a version of R older than 4.4.x (where x is 0, 1, or 2 currently) then you do so at your own risk and you cannot expect support with setup problems during the course.

vapply(pkgs, packageDescription, character(1), drop = TRUE, fields = "Version")

On my system I see:

> vapply(pkgs, packageDescription, character(1), drop = TRUE, fields = "Version")
     mgcv     gamm4 tidyverse    readxl    mgcViz    DHARMa    gratia
  "1.9-1"   "0.2-6"   "2.0.0"   "1.4.3"  "0.1.11"   "0.4.7"  "0.10.0.9001"

The key things to check are that gratia is version "0.10.0" or later, mgcv is at least "1.9-0" (preferably "1.9-1"), and tidyverse is "2.0.0".
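The same comparison can be scripted. The sketch below is one possible way to do it; `check_min_versions()` is a hypothetical helper written for this example (it is not part of any package), and the minimum versions are the ones quoted above.

```r
# minimum versions mentioned above; adjust as needed
min_versions <- c(gratia = "0.10.0", mgcv = "1.9-0", tidyverse = "2.0.0")

# hypothetical helper (not part of any package): returns TRUE if a package is
# installed at the minimum version or newer, FALSE if it is older, and NA if
# the package is not installed at all
check_min_versions <- function(min_versions) {
  vapply(names(min_versions), function(pkg) {
    # system.file() returns "" when the package cannot be found
    if (!nzchar(system.file(package = pkg))) {
      return(NA)
    }
    packageVersion(pkg) >= min_versions[[pkg]]
  }, logical(1))
}

check_min_versions(min_versions)
```

Any `FALSE` or `NA` in the result points to a package that needs updating or installing before the course starts.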

Programme

Sessions from 14:00 to 20:00 (Monday to Thursday), 14:00 to 19:00 on Friday (Berlin time). From Tuesday to Friday, the first hour will be dedicated to Q&A and working through practical exercises or students’ own analyses over Slack and Zoom. Sessions will interweave lectures, in-class discussion/Q&A, and practical exercises.

Monday

Slides

  • Brief overview of R and the Tidyverse packages we’ll encounter throughout the course
  • Recap generalised linear models
  • Fitting your first GAM

Tuesday

Slides

  • How do GAMs work?
  • What are splines?
  • How do GAMs learn from data without overfitting?

We’ll dig under the hood a bit to understand how GAMs work at a practical level and how to use the mgcv and gratia packages to estimate GAMs and visualise them.
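To give a flavour of what "under the hood" means here, the following sketch (simulated data, not taken from the course materials) builds the basis functions and wiggliness penalty for a smooth with mgcv's `smoothCon()`, then fits the same smooth with `gam()` to show that the penalty shrinks the effective degrees of freedom well below the basis dimension.

```r
library(mgcv)

set.seed(42)
n <- 300
df <- data.frame(x = runif(n))
df$y <- sin(2 * pi * df$x) + rnorm(n, sd = 0.4)  # simulated, illustrative only

# basis functions and wiggliness penalty for a thin plate spline with k = 20
sm <- smoothCon(s(x, k = 20), data = df)[[1]]
dim(sm$X)       # one row per observation, one column per basis function
dim(sm$S[[1]])  # penalty matrix: one row and column per basis function

# the same basis used in a GAM; REML chooses the smoothing parameter
m <- gam(y ~ s(x, k = 20), data = df, method = "REML")
m$sp            # estimated smoothing parameter
sum(m$edf[-1])  # effective degrees of freedom of s(x), excluding the intercept
```

Setting `k` generously and letting the penalty control the wiggliness is the usual way of avoiding overfitting without hand-picking the complexity of the curve.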

Wednesday

Slides

  • Model checking, selection, and visualisation.
  • How do we do inference with GAMs?
  • Go beyond simple GAMs to include smooth interactions and models with multiple smooths.
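As a hedged sketch of two of these topics, the code below fits a smooth interaction with a tensor product smooth, `te()`, and runs mgcv's basic checks. The data are simulated for illustration only; the course itself may use different examples and tools.

```r
library(mgcv)

set.seed(2)
n <- 400
# simulated data (illustrative only): the effect of x depends on z
df <- data.frame(x = runif(n), z = runif(n))
df$y <- sin(2 * pi * df$x) * df$z + rnorm(n, sd = 0.3)

# tensor product smooth: a smooth interaction between x and z
m_te <- gam(y ~ te(x, z), data = df, method = "REML")

# basis dimension check: a low p-value together with an edf close to k'
# suggests the basis dimension may be too small
k.check(m_te)

# residual diagnostics (QQ plot, residuals vs linear predictor, and so on)
gam.check(m_te)
```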

Thursday

Slides

  • Hierarchical GAMs; introducing random smooths and how to model data with both group and individual smooth effects.
  • Doing more with your models; introducing posterior simulation.
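The sketch below illustrates both ideas on simulated data; it is an assumption-laden example, not the course's own code. The model has a global smooth plus group-level smooths (the `"fs"` basis in mgcv), and posterior simulation is done by drawing coefficient vectors from the approximate multivariate normal posterior with `rmvn()`. mgcv may warn about repeated smooths of the same variable when fitting this kind of model.

```r
library(mgcv)

set.seed(3)
n_grp <- 5
n_per <- 60
df <- data.frame(
  x = runif(n_grp * n_per),
  g = factor(rep(seq_len(n_grp), each = n_per))
)
grp_shift <- rnorm(n_grp, sd = 0.5)  # simulated group-level offsets
df$y <- sin(2 * pi * df$x) + grp_shift[as.integer(df$g)] +
  rnorm(nrow(df), sd = 0.3)

# global smooth of x plus a smooth of x for each level of g
m_h <- gam(y ~ s(x) + s(x, g, bs = "fs", k = 5), data = df, method = "REML")

# posterior simulation for group 1 over a grid of x values
new_df <- data.frame(
  x = seq(0, 1, length.out = 50),
  g = factor(1, levels = levels(df$g))
)
Xp <- predict(m_h, newdata = new_df, type = "lpmatrix")  # maps coefs to fits
beta_sim <- rmvn(1000, coef(m_h), vcov(m_h))  # 1000 draws of the coefficients
fit_sim <- Xp %*% t(beta_sim)                 # 50 grid points x 1000 draws

# pointwise 95% interval for the fitted curve from the simulated draws
ci <- apply(fit_sim, 1, quantile, probs = c(0.025, 0.975))
```

Each column of `fit_sim` is one plausible fitted curve, so any summary of the curve (its maximum, where it peaks, differences between groups) can be computed per draw to get an uncertainty estimate.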

Friday

  • Going beyond the mean; fitting distributional models

  • Worked examples
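One way to go "beyond the mean" in mgcv is a Gaussian location-scale model via the `gaulss()` family, where both the mean and the scale get their own linear predictor. The sketch below uses simulated data and is only an illustration of the idea, not necessarily the example used in the course.

```r
library(mgcv)

set.seed(4)
n <- 400
df <- data.frame(x = runif(n))
# simulated data: both the mean and the standard deviation change with x
df$y <- sin(2 * pi * df$x) + rnorm(n, sd = 0.2 + 0.6 * df$x)

# a list of formulas: one linear predictor per distributional parameter
m_ls <- gam(
  list(
    y ~ s(x),  # linear predictor for the mean
    ~ s(x)     # linear predictor for the scale parameter (on the link scale)
  ),
  data = df,
  family = gaulss()
)

summary(m_ls)
```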
