MRFcov (described by Clark et al, published as a Statistical Report in Ecology) provides R functions for approximating interaction parameters of nodes in undirected Markov Random Field (MRF) graphical networks. Models can incorporate covariates (a class of models known as Conditional Random Fields, or CRFs, following methods developed by Cheng et al 2014 and Lindberg 2016), allowing users to estimate how interactions between nodes in the graph are predicted to change across covariate gradients.
In principle, MRFcov models that use species' occurrences or abundances as outcome variables are similar to Joint Species Distribution Models (JSDMs), in that variance can be partitioned among abiotic and biotic drivers. However, key differences are that MRFcov models can:
- Produce directly interpretable coefficients that allow users to determine the relative importance (i.e. effect sizes) of species' interactions and environmental covariates in driving abundances or occurrence probabilities
- Identify interaction strengths, rather than simply determining whether interactions are "significantly different from zero"
- Estimate how interactions are predicted to change across environmental gradients
MRF and CRF interaction parameters are approximated using separate regressions for individual species within a joint modelling framework. Because all combinations of covariates and additional species are included as predictor variables in node-specific regressions, variable selection is required to reduce overfitting and add sparsity. This is accomplished through LASSO penalization using functions in the penalized and glmnet packages.
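To make the node-wise idea concrete, the sketch below builds the predictor matrix that a single node's regression would use when covariates are present: the other nodes, the covariates, and every covariate-by-node product. Product columns are named covariate_node to mirror the underscore labels in the key_coefs output shown later. The helper build_node_design and the toy data are hypothetical and not part of MRFcov; the package's internal construction may differ, and no LASSO penalty is applied here.

```r
# Hypothetical helper (not an MRFcov function): predictor matrix for one node's
# regression, assuming the first n_nodes columns are nodes and the rest covariates
build_node_design <- function(data, n_nodes, node) {
  node_names <- colnames(data)[seq_len(n_nodes)]
  cov_names  <- setdiff(colnames(data), node_names)
  others     <- setdiff(node_names, node)

  # Main effects: the other nodes followed by the covariates
  main <- as.matrix(data[, c(others, cov_names), drop = FALSE])

  # Covariate-by-node products, one block per covariate, named "covariate_node"
  inter <- do.call(cbind, lapply(cov_names, function(cv) {
    m <- as.matrix(data[, others, drop = FALSE]) * data[[cv]]
    colnames(m) <- paste(cv, others, sep = "_")
    m
  }))

  list(y = data[[node]], X = cbind(main, inter))
}

# Toy data: three binary nodes and one scaled covariate (all hypothetical)
set.seed(1)
toy <- data.frame(
  sp_a = rbinom(20, 1, 0.5),
  sp_b = rbinom(20, 1, 0.5),
  sp_c = rbinom(20, 1, 0.5),
  cov1 = as.vector(scale(rnorm(20)))
)

d <- build_node_design(toy, n_nodes = 3, node = "sp_a")
colnames(d$X)
```

Conceptually, a LASSO-penalized regression of d$y on d$X would then be fit for each node in turn, shrinking many of these coefficients to exactly zero.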
You can install the MRFcov package into R directly from GitHub using:
# install.packages("devtools")
devtools::install_github("nicholasjclark/MRFcov")
We can explore the package's primary functions using a test dataset that is available with the package. Load the Bird.parasites dataset, which contains binary occurrences of four avian blood parasites in New Caledonian Zosterops species (available in its original form at Dryad; Clark et al 2016). A single continuous covariate is also included (scale.prop.zos), which reflects the relative abundance of Zosterops species among different sample sites:
library(MRFcov)
data("Bird.parasites")
Visualise the dataset to see how analysis data need to be structured. In short, when estimating co-occurrence probabilities, node variable (i.e. species) occurrences should be included as binary variables (1s and 0s) in the left-most columns of data. Any covariates can be included as the right-most columns. Ideally, covariates should be on a similar scale: use the scale function (or similar) on continuous covariates so that they generally have mean = 0 and sd = 1:
help("Bird.parasites")
View(Bird.parasites)
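As a minimal sketch of that layout in base R, the example below assembles a hypothetical data frame with binary node columns on the left and two continuous covariates on the right, then standardises the covariates with scale(). The column names and simulated values are invented for illustration only.

```r
# Hypothetical raw data: binary node (species) columns first, covariates last
set.seed(42)
raw <- data.frame(
  sp_a = rbinom(50, 1, 0.4),
  sp_b = rbinom(50, 1, 0.6),
  sp_c = rbinom(50, 1, 0.3),
  host_density = runif(50, 0, 200),     # hypothetical covariate, arbitrary units
  rainfall_mm  = rnorm(50, 1200, 300)   # hypothetical covariate on a different scale
)

# Standardise each continuous covariate to mean = 0, sd = 1; scale() returns a
# one-column matrix, so as.vector() turns it back into a plain numeric vector
cov_cols <- c("host_density", "rainfall_mm")
prepped <- raw
prepped[cov_cols] <- lapply(raw[cov_cols], function(x) as.vector(scale(x)))

# Node columns should contain only 0s and 1s
stopifnot(all(unlist(prepped[1:3]) %in% c(0, 1)))

# Means are ~0 and standard deviations are 1 after scaling
round(sapply(prepped[cov_cols], mean), 10)
sapply(prepped[cov_cols], sd)
```

A data frame structured like prepped would then be supplied as data, with n_nodes = 3 marking the first three columns as nodes, in the same way that n_nodes = 4 is used for Bird.parasites in the MRFcov call below.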
You can read more about specific data format requirements (for example, one-hot encoding of categorical covariates) in the supplied vignette:
vignette("CRF_data_prep")
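The vignette is the authoritative guide to MRFcov's format requirements. As a generic illustration only, one-hot encoding of a hypothetical categorical covariate can be done in base R with model.matrix(), dropping the intercept so that each level gets its own 0/1 column:

```r
# Hypothetical data: two binary nodes plus a three-level categorical covariate
sites <- data.frame(
  sp_a = c(1, 0, 1, 1, 0, 0),
  sp_b = c(0, 0, 1, 0, 1, 1),
  habitat = factor(c("forest", "scrub", "urban", "forest", "urban", "scrub"))
)

# One 0/1 column per factor level; "- 1" removes the intercept so no level is
# absorbed as a reference category
habitat_dummies <- model.matrix(~ habitat - 1, data = sites)

# Nodes stay on the left, dummy covariate columns are appended on the right
sites_onehot <- cbind(sites[, c("sp_a", "sp_b")], habitat_dummies)
colnames(sites_onehot)
```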
Run an MRF model using the provided continuous covariate (scale.prop.zos). Here we allow the species-specific regressions to be individually optimised through cross-validated LASSO regressions (the default option when no lambda1 regularization value is specified). This produces a warning, as reassurance that cross-validated optimisation is being used:
MRF_mod <- MRFcov(data = Bird.parasites, n_nodes = 4, family = 'binomial')
#> Warning in MRFcov(data = Bird.parasites, n_nodes = 4, family = "binomial"): fixed_lambda not provided. Cross-validated optimisation will commence by default,
#> ignoring lambda1
Visualise the estimated species interaction coefficients as a heatmap. These represent mean interactions and are very useful for identifying co-occurrence patterns, but they do not indicate how interactions change across gradients. For binary data such as these, we can also plot the observed occurrences and co-occurrences by setting plot_observed_vals = TRUE:
plotMRF_hm(MRF_mod, plot_observed_vals = TRUE, data = Bird.parasites)
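For binary data, observed occurrences and pairwise co-occurrences can be tallied directly with a cross-product of the occurrence matrix. The sketch below uses a small hypothetical matrix; it illustrates the quantities being summarised and is not necessarily how plotMRF_hm computes its observed values.

```r
# Hypothetical binary occurrence matrix: rows = samples, columns = species
Y <- matrix(c(1, 0, 1,
              1, 1, 0,
              0, 1, 1,
              1, 1, 1,
              0, 0, 1),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("sp_a", "sp_b", "sp_c")))

# crossprod(Y) equals t(Y) %*% Y: the diagonal counts samples in which each
# species occurs; off-diagonal cells count samples in which each pair co-occurs
cooc <- crossprod(Y)
cooc

# The same counts expressed as proportions of samples
cooc / nrow(Y)
```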
For more in-depth visualisation, we can plot how species interactions are predicted to change across covariate magnitudes:
plotMRF_hm_cont(MRF_mod = MRF_mod, covariate = 'scale.prop.zos', data = Bird.parasites,
main = 'Estimated interactions across host relative densities')
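Assuming the covariate-modified interaction form of Cheng et al (2014), in which the interaction between two nodes is a baseline coefficient plus a covariate coefficient multiplied by the covariate value, the kind of gradient plotted by plotMRF_hm_cont can be approximated by hand. The sketch below uses the Hzosteropis coefficients for Microfilaria (0.9195307) and scale.prop.zos_Microfilaria (-0.9903377) copied from the key_coefs output in this README; for a binomial model these are on the log-odds scale. It is a hand calculation under that assumption, not a call into MRFcov.

```r
# Coefficients copied from the MRF_mod$key_coefs$Hzosteropis output in this README
beta_pair <- 0.9195307   # Microfilaria (baseline Hzosteropis/Microfilaria interaction)
beta_cov  <- -0.9903377  # scale.prop.zos_Microfilaria (covariate-by-node term)

# Assumed linear form (after Cheng et al 2014): interaction(x) = beta_pair + beta_cov * x
interaction_at <- function(x) beta_pair + beta_cov * x

# Evaluate across the scaled covariate (mean = 0, sd = 1), on the log-odds scale
x_grid <- c(-2, -1, 0, 1, 2)
round(interaction_at(x_grid), 4)

# Covariate value at which the assumed interaction changes sign
-beta_pair / beta_cov
```

Under this assumption, the positive association between Hzosteropis and Microfilaria weakens as scale.prop.zos increases and changes sign at roughly 0.93 standard deviations above the mean.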
Finally, we can explore regression coefficients to get a better understanding of just how important interactions are for predicting species' occurrence probabilities (in comparison to other covariates). This is perhaps the strongest property of conditional MRFs, as competing methods (such as Joint Species Distribution Models) do not provide interpretable mechanisms for comparing the relative importance of interactions and fixed covariates. MRF functions conveniently return a matrix of important coefficients for each node in the graph, as well as their relative importances (calculated using the formula B^2 / sum(B^2), where B is the vector of regression coefficients for the node's predictor variables). Variables containing an underscore (_) indicate an interaction between a covariate and another node, suggesting that the conditional dependency between the two nodes varies across environmental gradients:
MRF_mod$key_coefs$Hzosteropis
#> Variable Rel_importance Standardised_coef Raw_coef
#> 1 Hkillangoi 0.66531864 -2.3064326 -2.3064326
#> 5 scale.prop.zos_Microfilaria 0.12266333 -0.9903377 -0.9903377
#> 3 Microfilaria 0.10575006 0.9195307 0.9195307
#> 4 scale.prop.zos 0.09101689 -0.8530744 -0.8530744
#> 2 Plas 0.01244536 -0.3154493 -0.3154493
MRF_mod$key_coefs$Hkillangoi
#> Variable Rel_importance Standardised_coef Raw_coef
#> 1 Hzosteropis 0.76638826 -2.3064326 -2.3064326
#> 2 Microfilaria 0.13876154 -0.9814109 -0.9814109
#> 3 scale.prop.zos 0.09482683 -0.8113009 -0.8113009
MRF_mod$key_coefs$Plas
#> Variable Rel_importance Standardised_coef Raw_coef
#> 2 Microfilaria 0.64897980 1.5278142 1.5278142
#> 3 scale.prop.zos 0.26991953 -0.9853082 -0.9853082
#> 4 scale.prop.zos_Microfilaria 0.04715223 0.4118187 0.4118187
#> 1 Hzosteropis 0.02766618 -0.3154493 -0.3154493
MRF_mod$key_coefs$Microfilaria
#> Variable Rel_importance Standardised_coef Raw_coef
#> 3 Plas 0.35668971 1.5278142 1.5278142
#> 4 scale.prop.zos 0.19113755 -1.1184028 -1.1184028
#> 5 scale.prop.zos_Hzosteropis 0.14987048 -0.9903377 -0.9903377
#> 2 Hkillangoi 0.14718085 -0.9814109 -0.9814109
#> 1 Hzosteropis 0.12920579 0.9195307 0.9195307
#> 6 scale.prop.zos_Plas 0.02591562 0.4118187 0.4118187
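The relative-importance formula can be checked by hand. This sketch squares the three raw coefficients printed for Hkillangoi above and divides each by their sum of squares; the results agree with the printed Rel_importance values once both are rounded to four decimal places. The printed values sum to slightly less than 1, so the small residual difference may reflect coefficients that enter the denominator but are not listed in key_coefs (an assumption, not something stated in this README).

```r
# Raw coefficients copied from the MRF_mod$key_coefs$Hkillangoi output above
B <- c(Hzosteropis    = -2.3064326,
       Microfilaria   = -0.9814109,
       scale.prop.zos = -0.8113009)

# Relative importance: each squared coefficient divided by the sum of squares
rel_importance <- B^2 / sum(B^2)
round(rel_importance, 4)
```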
To work through more in-depth tutorials and examples, see the vignettes in the package and check out papers that have been published using the method:
vignette("Bird_Parasite_CRF")
Clark et al 2018 Ecology
Cheng, J., Levina, E., Wang, P. & Zhu, J. (2014). A sparse Ising model with covariates. Biometrics 70:943-953.
Clark, N.J., Wells, K. & Lindberg, O. (2018). Unravelling changing interspecific interactions across environmental gradients using Markov random fields. Ecology. DOI: https://doi.org/10.1002/ecy.2221
Clark, N.J., Wells, K., Dimitrov, D. & Clegg, S.M. (2016). Co-infections and environmental conditions drive the distributions of blood parasites in wild birds. Journal of Animal Ecology 85:1461-1470.
Lindberg, O. (2016). Markov Random Fields in Cancer Mutation Dependencies. Master of Science thesis. University of Turku, Turku, Finland.
This project is licensed under the terms of the GNU General Public License (GNU GPLv3)