title | tags | authors | affiliations | date | bibliography
---|---|---|---|---|---
xspline: A Python Package for Flexible Spline Modeling | | | | 02.22.2024 | paper.bib
Splines are a fundamental tool for describing and estimating nonlinear relationships [@de1978practical]. They allow nonlinear functions to be represented as linear combinations of spline basis elements. Researchers in physical, biological, and health sciences rely on spline models in conjunction with statistical software packages to fit and describe a vast range of nonlinear relationships.
A wide range of tools and packages exist to support modeling with splines. These tools include:
- Splipy [https://pypi.org/project/Splipy/] [@johannessen2020splipy]
- splines [https://pypi.org/project/splines/]
- spline support in scipy [https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BSpline.html]
- pyspline [https://mdolab-pyspline.readthedocs-hosted.com/en/latest/index.html]
- splinter [https://github.com/bgrimstad/splinter]
Several important gaps remain in Python packages for spline modeling. `xspline` is not a comprehensive tool that generalizes existing software. Instead, it provides key functionality that undergirds flexible interpolation and fitting, closing existing gaps in the available tools. `xspline` is currently widely used in global health applications [@murray2020global], undergirding the majority of spline modeling at the Institute for Health Metrics and Evaluation (IHME).
Current spline packages offer broad functionality in spline fitting, including:
- Manipulating and estimating curves (scipy, splines), surfaces and volumes (splipy, pySpline)
- Numerical derivatives (splipy, splines, scipy, pyspline, splinter)
- Interpolation (splipy, splines, scipy, pyspline, splinter)
- Spline derivatives, antiderivatives and numerical integrals (scipy)
- Extrapolation (scipy, limited)
From this list, it is apparent that `scipy` offers the most comprehensive features related to derivatives, integrals, and extrapolation. However, key limitations remain. First, while `scipy` provides derivative and antiderivative spline objects, it still evaluates definite integrals numerically. In addition, while the first and last segments of the B-spline in `scipy` can be extrapolated, there is no option for the user to extrapolate a simpler functional form, e.g. a quadratic polynomial given a cubic spline.
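The `scipy` features mentioned above can be sketched as follows. This is only an illustration using `scipy.interpolate.make_interp_spline` and the resulting `BSpline` object; the sample function `x**3` and the evaluation points are arbitrary choices made for this example.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Cubic B-spline through samples of f(x) = x**3 on [0, 2]. The default
# not-a-knot boundary conditions reproduce a cubic polynomial exactly.
x = np.linspace(0.0, 2.0, 9)
spl = make_interp_spline(x, x**3, k=3)

dspl = spl.derivative()       # new BSpline object of degree 2
ispl = spl.antiderivative()   # new BSpline object of degree 4

# Definite integral over [0, 1], returned as a number rather than a spline.
area = float(spl.integrate(0.0, 1.0))

# Outside [0, 2], extrapolation continues the end cubic piece; there is no
# built-in switch to continue with a lower-degree polynomial instead.
outside = float(spl(3.0, extrapolate=True))
```

Because the extrapolated value simply follows the last cubic segment, it can grow quickly away from the data, which is the behavior a lower-degree tail is meant to avoid.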
This functionality is essential to risk modeling. For example, data reported by all studies focusing on risk-outcome pairs are ratios of definite integrals across different exposure intervals. Prior packages do not offer a direct way to fit spline functions to these nonlinear data, because they do not provide definite integrals of splines as spline objects. Spline derivatives are also needed to impose shape constraints on risk curves of interest. Finally, extrapolation is often required into regions with little to no data, while maintaining high-fidelity fits in regions with dense data. Theoretically, it is straightforward to extrapolate with any polynomial of degree less than or equal to the degree of the end segments (for example, matching the slope for first order, the slope and curvature for second order, and so on). However, this functionality is not available in other packages.
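To make the "ratio of definite integrals" observation concrete, here is a minimal sketch under one assumed reading: each data point compares the average of a risk curve over an alternative exposure interval with its average over a reference interval. The helper `interval_mean_ratio`, the curve `1 + x**2`, and the intervals are hypothetical and chosen only for illustration; the sketch uses `scipy` and is not the `xspline` API.

```python
import numpy as np
from scipy.interpolate import make_interp_spline


def interval_mean_ratio(spl, alt, ref):
    """Ratio of the mean of `spl` over interval `alt` to its mean over `ref`."""
    antider = spl.antiderivative()

    def mean(lo, hi):
        # Average value on [lo, hi] from the antiderivative spline.
        return (antider(hi) - antider(lo)) / (hi - lo)

    return float(mean(*alt) / mean(*ref))


# Illustrative risk curve f(x) = 1 + x**2 sampled on [0, 4] (made-up data).
exposure = np.linspace(0.0, 4.0, 9)
risk_curve = make_interp_spline(exposure, 1.0 + exposure**2, k=3)

ratio = interval_mean_ratio(risk_curve, alt=(2.0, 4.0), ref=(0.0, 1.0))
```

Fitting a curve to such observations requires these interval integrals as functions of the spline coefficients, which is why having integrals available as spline objects matters.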
The main idea of `xspline` is to provide a Python class that allows users to interact more easily with basis splines, their derivatives and integrals, and extrapolation options.
The computation of splines is based on basis splines (B-splines); see [@de1978practical] for a canonical reference. Starting from the recursive B-spline relationships in this reference, we derived recursive relationships to compute both derivatives and definite integrals.
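For readers unfamiliar with the recursion, the following sketch shows the textbook Cox-de Boor recurrence for B-spline values and the standard recurrence for first derivatives. It is a plain illustration of the kind of relationship involved, not the implementation used in `xspline`; the function names and the clamped knot vector are assumptions made for this example.

```python
import numpy as np


def bspline_basis(t, i, k, x):
    """Value of the i-th B-spline of degree k on knots t (Cox-de Boor)."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        # Degree 0: indicator function of the half-open knot interval.
        return np.where((t[i] <= x) & (x < t[i + 1]), 1.0, 0.0)
    out = np.zeros_like(x)
    left = t[i + k] - t[i]
    right = t[i + k + 1] - t[i + 1]
    # Terms with a zero-width knot span are dropped (0/0 treated as 0).
    if left > 0:
        out += (x - t[i]) / left * bspline_basis(t, i, k - 1, x)
    if right > 0:
        out += (t[i + k + 1] - x) / right * bspline_basis(t, i + 1, k - 1, x)
    return out


def bspline_basis_der(t, i, k, x):
    """First derivative of the i-th B-spline of degree k on knots t."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    if k == 0:
        return out
    left = t[i + k] - t[i]
    right = t[i + k + 1] - t[i + 1]
    # Derivative is a scaled difference of two degree k-1 B-splines.
    if left > 0:
        out += k / left * bspline_basis(t, i, k - 1, x)
    if right > 0:
        out -= k / right * bspline_basis(t, i + 1, k - 1, x)
    return out


# Clamped cubic knot vector on [0, 3] with interior knots at 1 and 2.
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
degree = 3
n_basis = len(knots) - degree - 1
xs = np.linspace(0.1, 2.9, 15)
basis_sum = sum(bspline_basis(knots, i, degree, xs) for i in range(n_basis))
```

Inside the knot range the basis functions sum to one, which is a quick sanity check on any implementation of the recurrence.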
To support the spline basis computation, we also created modules that provide a convenient interface to indicator and polynomial functions, and their derivatives and definite integrals of any order. All of these functions are bundled into a main interface class called `XFunction`, which allows the user to call the function with a specified order, where a positive order represents derivatives and a negative order represents definite integrals.
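The signed-order convention can be illustrated with a small stand-in class built on `numpy.polynomial.Polynomial`. The class `OrderedPolynomial` below is hypothetical and is not the `XFunction` interface; in particular, the lower integration limit `lb` is an assumption made for this sketch.

```python
import numpy as np
from numpy.polynomial import Polynomial


class OrderedPolynomial:
    """Hypothetical polynomial callable with a signed `order` argument."""

    def __init__(self, coef, lb=0.0):
        self.poly = Polynomial(coef)  # coefficients in increasing degree
        self.lb = lb                  # lower limit used for integrals

    def __call__(self, x, order=0):
        x = np.asarray(x, dtype=float)
        if order == 0:
            return self.poly(x)
        if order > 0:
            # Positive order: derivative of that order.
            return self.poly.deriv(order)(x)
        # Negative order: repeated definite integral starting at lb.
        return self.poly.integ(-order, lbnd=self.lb)(x)


fun = OrderedPolynomial([0.0, 0.0, 1.0])  # f(x) = x**2
slope = float(fun(3.0, order=1))          # f'(3)
area = float(fun(3.0, order=-1))          # integral of f from 0 to 3
```

A single call signature for values, derivatives, and integrals keeps downstream model code uniform, since constraints and integral observations can be expressed with the same object.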
We also allow users to specify how they want to extrapolate by matching the smoothness at the end knots. This is achieved by a method of `XFunction` called `append` that splices two instances of `XFunction` together.
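The idea of matching smoothness at an end knot and then joining two functions can be sketched with `scipy` and `numpy`. The helpers `taylor_tail` and `splice` are hypothetical names introduced for this example and do not reproduce the `append` method; the tail is a Taylor polynomial of the chosen order at the end knot, so order 1 matches value and slope, and order 2 also matches curvature.

```python
from math import factorial

import numpy as np
from scipy.interpolate import make_interp_spline


def taylor_tail(spl, knot, order):
    """Polynomial matching spl's value and first `order` derivatives at knot."""
    coef = [float(spl(knot, nu=n)) / factorial(n) for n in range(order + 1)]

    def tail(x):
        dx = np.asarray(x, dtype=float) - knot
        return sum(c * dx**n for n, c in enumerate(coef))

    return tail


def splice(left, right, sep):
    """Piecewise function: left(x) for x <= sep, right(x) for x > sep."""

    def spliced(x):
        x = np.asarray(x, dtype=float)
        return np.where(x <= sep, left(x), right(x))

    return spliced


# Cubic spline through samples of x**3 on [0, 2] (illustrative data).
x = np.linspace(0.0, 2.0, 9)
spl = make_interp_spline(x, x**3, k=3)

linear_ext = splice(spl, taylor_tail(spl, 2.0, order=1), sep=2.0)
quadratic_ext = splice(spl, taylor_tail(spl, 2.0, order=2), sep=2.0)
```

At `x = 3` the linear tail gives roughly 20 and the quadratic tail roughly 26, compared with 27 for the cubic continuation, while all three agree on the fitted region.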
With all of the above features, we created an easy-to-use spline package for statistical model building, which has been widely used in global health statistical analysis; see the references below. For more examples please check here.
More information about the structure of the library can be found in the documentation, while the mathematical use cases are extensively discussed in [@zheng2021trimmed] and [@zheng2022burden] in the context of fitting nonlinear dose-response relationships.
The `xspline` package is widely used in spline modeling at IHME. In particular, the new functionality described above enabled a new set of dose-response analyses recently published by the institute, including analyses of chewing tobacco [@gil2024health], education [@balaj2024effects], second-hand smoke [@flor2024health], intimate partner violence [@spencer2023health], smoking [@dai2022health], blood pressure [@razo2022effects], vegetable consumption [@stanaway2022health], and red meat consumption [@lescinsky2022health]. The results of all of these analyses are now publicly available at https://vizhub.healthdata.org/burden-of-proof/.