From cc97d3cdc26ca71f88e2f468cd7a623485e5052a Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Sun, 1 Dec 2024 21:53:07 +0000 Subject: [PATCH] build based on 7b45856 --- dev/.documenter-siteinfo.json | 2 +- dev/index.html | 2 +- dev/lib/internal_methods/index.html | 4 +- dev/lib/public_methods/index.html | 4 +- .../example/{10550602.svg => 54ee09a6.svg} | 124 +++++++++--------- dev/man/example/index.html | 2 +- dev/man/installation/index.html | 2 +- 7 files changed, 70 insertions(+), 70 deletions(-) rename dev/man/example/{10550602.svg => 54ee09a6.svg} (84%) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 2cb6ee2..8dafc92 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-11-20T21:43:37","documenter_version":"1.7.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.9.4","generation_timestamp":"2024-12-01T21:53:03","documenter_version":"1.7.0"}} \ No newline at end of file diff --git a/dev/index.html b/dev/index.html index 49d4dae..a21cd04 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@ -Home · HighDimMixedModels.jl

HighDimMixedModels.jl

HighDimMixedModels.jl is a package for fitting regularized linear mixed-effect models to high dimensional, clustered data. It is a Julia implementation of the estimation approach in

Schelldorfer, J., Bühlmann, P., & DE GEER, S. V. (2011). Estimation for high‐dimensional linear mixed‐effects models using ℓ1‐penalization. Scandinavian Journal of Statistics, 38(2), 197-214.

Two options for penalties are provided, the original LASSO and the smoothly clipped absolute deviation (SCAD) penalty described in

Ghosh, A., & Thoresen, M. (2018). Non-concave penalization in linear mixed-effect models and regularized selection of fixed effects. AStA Advances in Statistical Analysis, 102, 179-210.

+Home · HighDimMixedModels.jl

HighDimMixedModels.jl

HighDimMixedModels.jl is a package for fitting regularized linear mixed-effect models to high dimensional, clustered data. It is a Julia implementation of the estimation approach in

Schelldorfer, J., Bühlmann, P., & DE GEER, S. V. (2011). Estimation for high‐dimensional linear mixed‐effects models using ℓ1‐penalization. Scandinavian Journal of Statistics, 38(2), 197-214.

Two options for penalties are provided, the original LASSO and the smoothly clipped absolute deviation (SCAD) penalty described in

Ghosh, A., & Thoresen, M. (2018). Non-concave penalization in linear mixed-effect models and regularized selection of fixed effects. AStA Advances in Statistical Analysis, 102, 179-210.

diff --git a/dev/lib/internal_methods/index.html b/dev/lib/internal_methods/index.html index d74d1b7..8e0cb3b 100644 --- a/dev/lib/internal_methods/index.html +++ b/dev/lib/internal_methods/index.html @@ -1,5 +1,5 @@ -Internal Methods and Types · HighDimMixedModels.jl
HighDimMixedModels.L_update!Method
L_update!(L::Matrix, XGgrp, ygrp, Zgrp, β, σ², coords, control)

Update of entry (coords[1], coords[2]) of matrix L, the lower traiangular Choelsky factor of the random effects covariance matrix

source
HighDimMixedModels.L_update!Method
L_update!(L::AbstractVector, XGgrp, ygrp, Zgrp, β, σ², s, control)

Update of coordinate s of L for diagonal covariance structure

source
HighDimMixedModels.armijo!Method
armijo!(
+Internal Methods and Types · HighDimMixedModels.jl
HighDimMixedModels.L_update!Method
L_update!(L::Matrix, XGgrp, ygrp, Zgrp, β, σ², coords, control)

Update of entry (coords[1], coords[2]) of matrix L, the lower traiangular Choelsky factor of the random effects covariance matrix

source
HighDimMixedModels.L_update!Method
L_update!(L::AbstractVector, XGgrp, ygrp, Zgrp, β, σ², s, control)

Update of coordinate s of L for diagonal covariance structure

source
HighDimMixedModels.armijo!Method
armijo!(
     XGgrp,
     ygrp,
     invVgrp,
@@ -15,4 +15,4 @@
     fct_old,
     arm_con,
     control,
-)

Performs Armijo line search to update jth component of β, i.e. β[j]

source
HighDimMixedModels.get_negllMethod
get_negll(invVgrp, ygrp, XGgrp, β)

Calculate the negative log-likelihod -l(ϕ̃) = -l(β, θ, σ²) = .5(N*log(2π) + log|V| + (y-xβ)'V⁻¹(y-Xβ))

Arguments

  • invVgrp :: Vector of length the number of groups, each of whose elements is the precision matrix of the responses within a group
  • ygrp :: Vector of vector of responses for each group
  • X :: Vector of fixed effect design matrices for each group
  • β :: Fixed effects
source
HighDimMixedModels.hessian_diag!Method
hessian_diag!(XGgrp, invVgrp, active_set, hess, mat_act)

Calculate active_set entries of the diagonal of Hessian matrix for fixed effect parameters

source
HighDimMixedModels.invV!Method
invV!(invVgrp, Zgrp, L, σ²)

Update precision matrices of the responses, by group, by modifying invVgrp in place.

Arguments

  • invVgrp :: Container for precision matrices of the responses, by group
  • Zgrp :: Container of random effects design matrices, by group
  • L :: Parameters for random effect covariance matrix (can be scalar, vector, or lower triangular matrix)
  • σ² :: Variance of error
source
HighDimMixedModels.special_quadMethod
special_quad(XG, y, β, j, invVgrp, XGgrp, grp)

Calculate (y-ỹ)' * invV * X[:,j], where ỹ are the fitted values if we ignored the jth column i.e. XG[:,Not(j)] * β[Not(j)] To improve perforamce, we calculate ỹ at once with the entire dataset. We then split into groups and calculate (y-ỹ)' * invV * X[:,j] for each group

source
+)

Performs Armijo line search to update jth component of β, i.e. β[j]

source
HighDimMixedModels.get_negllMethod
get_negll(invVgrp, ygrp, XGgrp, β)

Calculate the negative log-likelihod -l(ϕ̃) = -l(β, θ, σ²) = .5(N*log(2π) + log|V| + (y-xβ)'V⁻¹(y-Xβ))

Arguments

  • invVgrp :: Vector of length the number of groups, each of whose elements is the precision matrix of the responses within a group
  • ygrp :: Vector of vector of responses for each group
  • X :: Vector of fixed effect design matrices for each group
  • β :: Fixed effects
source
HighDimMixedModels.hessian_diag!Method
hessian_diag!(XGgrp, invVgrp, active_set, hess, mat_act)

Calculate active_set entries of the diagonal of Hessian matrix for fixed effect parameters

source
HighDimMixedModels.invV!Method
invV!(invVgrp, Zgrp, L, σ²)

Update precision matrices of the responses, by group, by modifying invVgrp in place.

Arguments

  • invVgrp :: Container for precision matrices of the responses, by group
  • Zgrp :: Container of random effects design matrices, by group
  • L :: Parameters for random effect covariance matrix (can be scalar, vector, or lower triangular matrix)
  • σ² :: Variance of error
source
HighDimMixedModels.special_quadMethod
special_quad(XG, y, β, j, invVgrp, XGgrp, grp)

Calculate (y-ỹ)' * invV * X[:,j], where ỹ are the fitted values if we ignored the jth column i.e. XG[:,Not(j)] * β[Not(j)] To improve perforamce, we calculate ỹ at once with the entire dataset. We then split into groups and calculate (y-ỹ)' * invV * X[:,j] for each group

source
diff --git a/dev/lib/public_methods/index.html b/dev/lib/public_methods/index.html index 72c32fb..e1eb0c4 100644 --- a/dev/lib/public_methods/index.html +++ b/dev/lib/public_methods/index.html @@ -1,4 +1,4 @@ -Public Methods and Types · HighDimMixedModels.jl
HighDimMixedModels.ControlType
Control

Hyperparameters for the coordinate descent algorithm

Fields

  • tol: Small positive number, default is 1e-4, providing convergence tolerance
  • seed: Random seed, default 770. Note that the only randomness in the algorithm is during the initialization of fixed effect parameters (for the data splits in the cross validation)
  • trace: Bool, default false. If true, prints cost and size of active set over the course of the algorithm.
  • max_iter: Integer, default 1000, giving maximum number of iterations in the coordinate gradient descent.
  • max_armijo: Integer, default 20, giving the maximum number of steps in the Armijo algorithm. If the maximum is reached, algorithm doesn't update current coordinate and proceeds to the next coordinate
  • act_num: Integer, default 5. We will only update all of the fixed effect parameters every act_num iterations. Otherwise, we update only the parameters in the current active set.
  • a₀: a₀ in the Armijo step, default 1.0. See Schelldorfer et al. (2010) for details about this and the next five fields.
  • δ: δ in the Armijo step, default 0.1.
  • ρ: ρ in the Armijo step, default 0.001.
  • γ: γ in the Armijo step, default 0.0.
  • lower: Lower bound for the Hessian, default 1e-6.
  • upper: Upper bound for the Hessian, default 1e8.
  • var_int: Tuple with bounds of interval on which to optimize when updating a diagonal entry of L, default (0, 100). See Optim.jl in section "minimizing a univariate function on a bounded interval"
  • cov_int: Tuple with bounds of interval on which to optimize the when updating a non-diagonal entry of L, default (-50, 50). See Optim.jl in section "minimizing a univariate function on a bounded interval"
  • optimize_method: Symbol denoting method for performing the univariate optimization, either :Brent or :GoldenSection, default is :Brent
  • thres: If an update to a diagonal entry of L is smaller than thres, the parameter is set to 0
source
HighDimMixedModels.HDMModelType
HDMModel

Results of a fitted model

Fields

  • data: NamedTuple containing the input data used for fitting the model
  • weights: Vector of penalty weights used in the model
  • init_coef: NamedTuple containing the initial coefficient values
  • init_log_like: Initial log-likelihood value
  • init_objective: Initial objective function value
  • init_nz: Number of non-zero components in the initial estimate of fixed effects
  • penalty: String indicating the type of penalty used in the model
  • standardize: Boolean indicating whether the input data was standardized
  • λ: Regularization hyperparameter
  • scada: Hyperparameter relevant to the scad penalty
  • σ²: Estimated variance parameter
  • L: Lower triangular matrix representing the Cholesky factor of the random effect covariance matrix
  • fixef: Vector of estimated fixed effects
  • ranef: vector of g vectors, each of length m, holding random effects BLUPs for each group
  • fitted: Vector of fitted values, including random effects
  • resid: Vector of residuals, including random effects
  • log_like: Log-likelihood value at convergence
  • objective: Objective function value at convergence
  • npar: Total number of parameters in the model
  • nz: Number of non-zero fixed effects
  • deviance: Deviance value
  • num_arm: Number of times armijo! needed to be called
  • arm_con: Number of times the Armijo algorithm failed to converge
  • aic: Akaike Information Criterion
  • bic: Bayesian Information Criterion
  • iterations: Number of iterations performed
  • ψstr: Assumed structure of the random effect covariance matrix
  • ψ: Estimated random effect covariance matrix, i.e. L * L'
  • control: Control object containing hyperparameters that were used for the coordinate descent algorithm
source
HighDimMixedModels.hdmmFunction
hdmm(X::Matrix{<:Real}, G::Matrix{<:Real}, y::Vector{<:Real}, 
+Public Methods and Types · HighDimMixedModels.jl
HighDimMixedModels.ControlType
Control

Hyperparameters for the coordinate descent algorithm

Fields

  • tol: Small positive number, default is 1e-4, providing convergence tolerance
  • seed: Random seed, default 770. Note that the only randomness in the algorithm is during the initialization of fixed effect parameters (for the data splits in the cross validation)
  • trace: Bool, default false. If true, prints cost and size of active set over the course of the algorithm.
  • max_iter: Integer, default 1000, giving maximum number of iterations in the coordinate gradient descent.
  • max_armijo: Integer, default 20, giving the maximum number of steps in the Armijo algorithm. If the maximum is reached, algorithm doesn't update current coordinate and proceeds to the next coordinate
  • act_num: Integer, default 5. We will only update all of the fixed effect parameters every act_num iterations. Otherwise, we update only the parameters in the current active set.
  • a₀: a₀ in the Armijo step, default 1.0. See Schelldorfer et al. (2010) for details about this and the next five fields.
  • δ: δ in the Armijo step, default 0.1.
  • ρ: ρ in the Armijo step, default 0.001.
  • γ: γ in the Armijo step, default 0.0.
  • lower: Lower bound for the Hessian, default 1e-6.
  • upper: Upper bound for the Hessian, default 1e8.
  • var_int: Tuple with bounds of interval on which to optimize when updating a diagonal entry of L, default (0, 100). See Optim.jl in section "minimizing a univariate function on a bounded interval"
  • cov_int: Tuple with bounds of interval on which to optimize the when updating a non-diagonal entry of L, default (-50, 50). See Optim.jl in section "minimizing a univariate function on a bounded interval"
  • optimize_method: Symbol denoting method for performing the univariate optimization, either :Brent or :GoldenSection, default is :Brent
  • thres: If an update to a diagonal entry of L is smaller than thres, the parameter is set to 0
source
HighDimMixedModels.HDMModelType
HDMModel

Results of a fitted model

Fields

  • data: NamedTuple containing the input data used for fitting the model
  • weights: Vector of penalty weights used in the model
  • init_coef: NamedTuple containing the initial coefficient values
  • init_log_like: Initial log-likelihood value
  • init_objective: Initial objective function value
  • init_nz: Number of non-zero components in the initial estimate of fixed effects
  • penalty: String indicating the type of penalty used in the model
  • standardize: Boolean indicating whether the input data was standardized
  • λ: Regularization hyperparameter
  • scada: Hyperparameter relevant to the scad penalty
  • σ²: Estimated variance parameter
  • L: Lower triangular matrix representing the Cholesky factor of the random effect covariance matrix
  • fixef: Vector of estimated fixed effects
  • ranef: vector of g vectors, each of length m, holding random effects BLUPs for each group
  • fitted: Vector of fitted values, including random effects
  • resid: Vector of residuals, including random effects
  • log_like: Log-likelihood value at convergence
  • objective: Objective function value at convergence
  • npar: Total number of parameters in the model
  • nz: Number of non-zero fixed effects
  • deviance: Deviance value
  • num_arm: Number of times armijo! needed to be called
  • arm_con: Number of times the Armijo algorithm failed to converge
  • aic: Akaike Information Criterion
  • bic: Bayesian Information Criterion
  • iterations: Number of iterations performed
  • ψstr: Assumed structure of the random effect covariance matrix
  • ψ: Estimated random effect covariance matrix, i.e. L * L'
  • control: Control object containing hyperparameters that were used for the coordinate descent algorithm
source
HighDimMixedModels.hdmmFunction
hdmm(X::Matrix{<:Real}, G::Matrix{<:Real}, y::Vector{<:Real}, 
      grp::Vector{<:Union{String, Int64}}, Z::Matrix{<:Real}=X; 
-     <keyword arguments>)

Fit a penalized linear mixed effect model using the coordinate gradient descent (CGD) algorithm and return a fitted model of type HDMModel.

Arguments

  • X: Low dimensional (N by q) design matrix for unpenalized fixed effects (first column must be all 1's to fit intercept)
  • G: High dimensional (N by p) design matrix for penalized fixed effects (should not include column of 1's)
  • y: Length N response vector
  • grp: Length N vector with group assignments of each observation
  • Z=X: Random effects design matrix (N by m), should contain some subset of the columns of X (defaults to equal X)

Keyword:

  • penalty::String="scad": Either "scad" or "lasso"
  • standardize::Bool=true: Whether to standardize the columns of all design matrices before performing coordinate descent. The value of λ (and wts) should be chosen accordingly. Estimates will be returned to the original scale at the end.
  • λ::Real=10.0: Positive number providing the regularization parameter for the penalty
  • wts::Union{Vector,Nothing}=nothing: If specified, the penalty on covariate j will be λ/wⱼ, so this argument is useful if you want to penalize some covariates more than others.
  • scada::Real=3.7: Positive number providing the extra tuning parameter for the scad penalty (ignored for lasso)
  • max_active::Real=length(y)/2: Maximum number of fixed effects estimated non-zero (defaults to half the total sample size)
  • ψstr::String="diag": One of "ident", "diag", or "sym", specifying the structure of the random effects' covariance matrix
  • init_coef::Union{Tuple,Nothing} = nothing: If specified, provides the initialization to the algorithm. See notes below for more details
  • control::Control = Control(): Custom struct with hyperparameters of the CGD algorithm, defaults are in documentation of Control struct

Notes

The initialization to the descent algorithm can be specified in the init_coef argument as a tuple of the form (β, L, σ²), where

  • β is a vector of length p + q providing an initial estimate of the fixed effect coefficients
  • L is the Cholesky factor of the random effect covariance matrix, and is represented as
    • a scalar if ψstr="ident"
    • a vector of length m if ψstr="diag"
    • a lower triangular matrix of size m by m if ψstr="sym"
  • σ² is a scalar providing an initial estimate of the noise variance

If the init_coef argument is not specified, we obtain initial parameter estimates in the following manner:

  1. A LASSO that ignores random effects (with λ chosen using cross validation) is performed to estimate the fixed effect parameters.
  2. L, assumed temporarilly to be a scalar, and σ² are estimated to maximize the likelihood given these estimated fixed effect parameters.
  3. If ψstr is "diag" or "sym", the scalar L is converted to a vector or matrix by repeating the scalar or filling the diagonal of a matrix with the scalar, respectively.
source
StatsAPI.coeftableFunction
coeftable(fit::HDMModel, names::Vector{String}=string.(1:length(fit.fixef)))

Return a table of the selected coefficients, i.e. those not set to 0, from the model.

Arguments

  • fit::HDMModel: A fitted model.
  • names::Vector{String}: Names of the all the coefficients in the model (not just those selected), defaults to integer names

Returns

A StatsBase.CoefTable object.

source
StatsAPI.fittedMethod
fitted(fit::HDMModel)

Accounts for the random effects in generating predictions

source
+ <keyword arguments>)

Fit a penalized linear mixed effect model using the coordinate gradient descent (CGD) algorithm and return a fitted model of type HDMModel.

Arguments

  • X: Low dimensional (N by q) design matrix for unpenalized fixed effects (first column must be all 1's to fit intercept)
  • G: High dimensional (N by p) design matrix for penalized fixed effects (should not include column of 1's)
  • y: Length N response vector
  • grp: Length N vector with group assignments of each observation
  • Z=X: Random effects design matrix (N by m), should contain some subset of the columns of X (defaults to equal X)

Keyword:

  • penalty::String="scad": Either "scad" or "lasso"
  • standardize::Bool=true: Whether to standardize the columns of all design matrices before performing coordinate descent. The value of λ (and wts) should be chosen accordingly. Estimates will be returned to the original scale at the end.
  • λ::Real=10.0: Positive number providing the regularization parameter for the penalty
  • wts::Union{Vector,Nothing}=nothing: If specified, the penalty on covariate j will be λ/wⱼ, so this argument is useful if you want to penalize some covariates more than others.
  • scada::Real=3.7: Positive number providing the extra tuning parameter for the scad penalty (ignored for lasso)
  • max_active::Real=length(y)/2: Maximum number of fixed effects estimated non-zero (defaults to half the total sample size)
  • ψstr::String="diag": One of "ident", "diag", or "sym", specifying the structure of the random effects' covariance matrix
  • init_coef::Union{Tuple,Nothing} = nothing: If specified, provides the initialization to the algorithm. See notes below for more details
  • control::Control = Control(): Custom struct with hyperparameters of the CGD algorithm, defaults are in documentation of Control struct

Notes

The initialization to the descent algorithm can be specified in the init_coef argument as a tuple of the form (β, L, σ²), where

  • β is a vector of length p + q providing an initial estimate of the fixed effect coefficients
  • L is the Cholesky factor of the random effect covariance matrix, and is represented as
    • a scalar if ψstr="ident"
    • a vector of length m if ψstr="diag"
    • a lower triangular matrix of size m by m if ψstr="sym"
  • σ² is a scalar providing an initial estimate of the noise variance

If the init_coef argument is not specified, we obtain initial parameter estimates in the following manner:

  1. A LASSO that ignores random effects (with λ chosen using cross validation) is performed to estimate the fixed effect parameters.
  2. L, assumed temporarilly to be a scalar, and σ² are estimated to maximize the likelihood given these estimated fixed effect parameters.
  3. If ψstr is "diag" or "sym", the scalar L is converted to a vector or matrix by repeating the scalar or filling the diagonal of a matrix with the scalar, respectively.
source
StatsAPI.coeftableFunction
coeftable(fit::HDMModel, names::Vector{String}=string.(1:length(fit.fixef)))

Return a table of the selected coefficients, i.e. those not set to 0, from the model.

Arguments

  • fit::HDMModel: A fitted model.
  • names::Vector{String}: Names of the all the coefficients in the model (not just those selected), defaults to integer names

Returns

A StatsBase.CoefTable object.

source
StatsAPI.fittedMethod
fitted(fit::HDMModel)

Accounts for the random effects in generating predictions

source
diff --git a/dev/man/example/10550602.svg b/dev/man/example/54ee09a6.svg similarity index 84% rename from dev/man/example/10550602.svg rename to dev/man/example/54ee09a6.svg index a9c3751..039b70a 100644 --- a/dev/man/example/10550602.svg +++ b/dev/man/example/54ee09a6.svg @@ -1,76 +1,76 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/man/example/index.html b/dev/man/example/index.html index 003c3e8..8a90977 100644 --- a/dev/man/example/index.html +++ b/dev/man/example/index.html @@ -127,4 +127,4 @@ plot!(cog_df.year[mask], cog_df.ravens[mask], seriestype = :scatter, label = "student $i", color = i) plot!(cog_df.year[mask], fitted(fit)[mask], seriestype = :line, color = i, linestyle = :dash, linewidth = 3, label = "") end -plot!(legend=:outerbottom, legendcolumns=3, xlabel = "Year", ylabel = "Ravens Score")Example block output +plot!(legend=:outerbottom, legendcolumns=3, xlabel = "Year", ylabel = "Ravens Score")Example block output diff --git a/dev/man/installation/index.html b/dev/man/installation/index.html index d6524cc..c3b7dec 100644 --- a/dev/man/installation/index.html +++ b/dev/man/installation/index.html @@ -1,3 +1,3 @@ Installation · HighDimMixedModels.jl

Installation

Installation of Julia

Julia is a high-level and interactive programming language (like R or Matlab), but it is also high-performance (like C). To install Julia, follow instructions here. For a quick & basic tutorial on Julia, see learn x in y minutes.

Editors:

  • Visual Studio Code provides an editor and an integrated development environment (IDE) for Julia: highly recommended!
  • You can also run Julia within a Jupyter notebook (formerly IPython notebook).

IMPORTANT: Julia code is just-in-time compiled. This means that the first time you run a function, it will be compiled at that moment. So, please be patient! Future calls to the function will be much much faster. Trying out toy examples for the first calls is a good idea.

Installation of the package

To install HighDimMixedModels, type in the Julia REPL

]
-add HighDimMixedModels

The installation can take a few minutes, be patient. The package has dependencies such as Optim.jl and Lasso.jl (see the Project.toml file for the full list), but everything is installed automatically.

+add HighDimMixedModels

The installation can take a few minutes, be patient. The package has dependencies such as Optim.jl and Lasso.jl (see the Project.toml file for the full list), but everything is installed automatically.