lucidum is an open source R Shiny app to help users build and communicate GLMs and GBMs without writing code.
To see lucidum in action, visit the ShinyConf 2023 lucidum app showcase.
lucidum works with standard R data.frames and data.tables and is designed to make model building more interactive, visual and insightful.
I originally wrote lucidum to automate the repetitive tasks involved when building the types of regression models common to UK personal lines insurers. More recently, I have used it as a tool to help insurers move from GLMs to GBMs (specifically LightGBM), using SHAP values to understand and communicate model features and interaction effects.
lucidum's functionality includes:
- Collection of metadata to support a modelling exercise
  - define modelling KPIs (e.g. frequency by claim peril)
  - filters to apply to charts and maps (e.g. new business vs renewals)
  - base levels and bandings to apply when tabulating a GLM
  - set up feature scenarios for inclusion in a GBM
- Interactive charting
  - actual vs expected charting by rating factor, with easy access to filters and training vs test views
  - plot several models' predictions (GLMs and GBMs) simultaneously
  - user-defined banding for continuous features - no "pre-banding" required
  - overlay "single profile" lines for GLMs to understand the underlying model effect
  - overlay SHAP value ribbons for GBMs to understand the underlying model effect
- Interactive mapping of data at UK Postcode Area, Sector and Unit resolution
  - uses the leaflet library to draw choropleth maps for Postcode Area and Sector
  - uses open source shapefiles for Area and Sector
  - point plotting for postcode units based on latitude and longitude
- Support a GLM build
  - "formula helper" to make the job of building an R GLM formula much faster
  - convert GLMs to tabular format ("ratebooks") with user-defined bandings and base levels
  - export tabulated GLMs as Excel workbooks
- Support a GBM build
  - provide a simple user interface for GBM feature selection
  - provide a simple user interface to the most common LightGBM parameters
  - build GA2M models (1D+2D GBMs) to support interaction detection
  - use feature interaction constraints to build indices for high cardinality features like postcode
  - 1D SHAP plots to interpret the model's main effects
  - 2D SHAP plots to interpret interaction effects
  - convert GBMs to tabular format ("ratebooks") with user-defined bandings and base levels
You can install the development version of lucidum from GitHub with:
# install.packages("devtools")
devtools::install_github("SpeckledJim2/lucidum")
To launch lucidum on your own dataset, load the package and pass it a data.frame or data.table:
library(lucidum)
lucidum(your_dataframe_name)
To load the demo dataset into lucidum use:
library(lucidum)
lucidum(insurance)
To separate training and test rows in your dataset, include a numerical column called "train_test" with value 0 for training and 1 for test.
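As a rough illustration of the "train_test" convention above, the sketch below adds the column to a small data.frame using base R. The `policies` data, its column names and the 30% test share are all hypothetical choices made for this example; only the column name "train_test" and its 0/1 coding come from the description above.

```r
# Hypothetical example data; any data.frame or data.table works the same way
set.seed(42)  # make the random split reproducible
policies <- data.frame(
  policy_id   = 1:10,
  claim_count = c(0, 1, 0, 0, 2, 0, 0, 1, 0, 0)
)

# Assumption for this sketch: hold out roughly 30% of rows as test data
n_test <- round(0.3 * nrow(policies))

# 0 = training row, 1 = test row, stored as a numeric column
policies$train_test <- 0
policies$train_test[sample(nrow(policies), n_test)] <- 1

# Count of training (0) and test (1) rows
table(policies$train_test)
```

The resulting data.frame can then be passed to the app with lucidum(policies). A random split is only one option; a split by time period or by policy is often more appropriate for insurance data.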
Include columns in your dataset called PostcodeArea and PostcodeSector to enable UK mapping.
Include columns called PostcodeUnit, lat (latitude) and long (longitude) to enable postcode unit level plotting.
Search for the "National Statistics Postcode Lookup" at https://geoportal.statistics.gov.uk to download the latest postcode unit to latitude/longitude lookup; updated versions are released quarterly.
Ensure postcode sectors and units contain a single space between the outcode and incode, with all letters in capitals, e.g. PO15 7JT.
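The postcode formatting described above can be sketched in base R as follows. The `raw` vector is hypothetical example input, and the derivation of PostcodeArea (leading letters of the outcode) and PostcodeSector (outcode, a space, then the first character of the incode) follows the usual UK postcode structure; treat the exact sector format lucidum expects as an assumption inferred from the formatting rule above.

```r
# Hypothetical raw postcodes with inconsistent case and spacing
raw <- c("po157jt", "PO15  7JT", " sw1a 1aa ", "M1 1AE")

# Upper-case and remove all whitespace
compact <- gsub("[[:space:]]+", "", toupper(raw))

# Re-insert a single space before the three-character incode
# (one digit followed by two letters)
unit <- sub("^(.+)([0-9][A-Z]{2})$", "\\1 \\2", compact)

postcodes <- data.frame(
  PostcodeUnit   = unit,
  # Area: the leading letters of the outcode, e.g. "PO"
  PostcodeArea   = sub("^([A-Z]+).*$", "\\1", unit),
  # Sector: the unit without its final two letters, e.g. "PO15 7"
  PostcodeSector = sub("^(.+ [0-9])[A-Z]{2}$", "\\1", unit),
  stringsAsFactors = FALSE
)

print(postcodes)
```

Values that do not match the expected pattern are left unchanged by sub(), so it is worth checking the result for malformed postcodes before mapping.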
Specification files hold metadata about your dataset that makes model building in lucidum faster.
Specification files are .csv files which can be created within lucidum itself or in a text editor. You don’t have to use specification files, but they make life easier if you are going to be working with a dataset on a regular basis.
There are three types of specification files:
- KPI specification: the metrics you want to access quickly in the app’s sidebar
- Filter specification: formulae that define filters you want to apply to charts and maps
- Feature specification: quicker access to features in ChartaR and feature scenarios that you want to use in your models