From 011951468c3deb97a6a6204b3f55245807e15127 Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 15:58:47 -0500 Subject: [PATCH 01/11] update project & readme --- Project.toml | 2 +- README.md | 170 ++++++++++++++++++++++++--------------------------- 2 files changed, 81 insertions(+), 91 deletions(-) diff --git a/Project.toml b/Project.toml index fefb0a8..d243f5f 100644 --- a/Project.toml +++ b/Project.toml @@ -47,7 +47,7 @@ SpecialFunctions = "2.4.0" StatsBase = "0.34.3" StatsFuns = "1.3.2" Test = "1.11" -julia = ">=1.11.0" +julia = "1.11.0" [extras] Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40" diff --git a/README.md b/README.md index bd7bd7d..a0f530b 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # StateSpaceAnalysis.jl -[![Build Status](https://github.com/harrisonritz/StateSpaceAnalysis.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/harrisonritz/StateSpaceAnalysis.jl/actions/workflows/CI.yml?query=branch%3Amain) +[![Build Status](https://github.com/harrisonritz/StateSpaceAnalysis.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/harrisonritz/StateSpaceAnalysis.jl/actions/workflows/CI.yml?query=branch%3Amain) [![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl) ## Overview @@ -105,97 +105,87 @@ This will install all the necessary dependencies and set up the StateSpaceAnalys -## Running the Example -To run the example fitting script, follow these steps: +## Walkthrough of the `example/fit_example.jl` script -1. Set the paths in `example/fit_example.jl`: - ```julia - run_cluster = length(ARGS)!=0; - if run_cluster - src_path = "/home/hr0283/HallM_StateSpaceAnalysis/src" - save_path = "/scratch/gpfs/hr0283/HallM_StateSpaceAnalysis/src"; - else - src_path = "/Users/hr0283/Projects/StateSpaceAnalysis.jl/src" - save_path = "/Users/hr0283/Projects/StateSpaceAnalysis.jl/example"; - end - ``` - -2. Load the necessary packages and configure the system: - ```julia - using StateSpaceAnalysis - using Accessors - using Random - using LinearAlgebra - using Dates - using Revise - ``` - -3. Set the parameters and data structure: - ```julia - S = core_struct( +### Set up `S`, the core structure which carries the parameters and data structure +```julia +S = core_struct( prm=param_struct( - seed = rand_seed, - model_name = "test", - changelog = "run test", - load_name = "HallMcMaster2019_ITI100-Cue200-ISI400-Trial200_srate@125_filt@0-30", - load_path = "/Users/hr0283/Projects/StateSpaceAnalysis.jl/example/example-data", - pt_list = 1:1, - max_iter_em = 500, - ssid_fit = "fit", - ssid_save = false, - ssid_type = :CVA, - ssid_lag = 24, - ), + ... # high-level parameters + ), dat=data_struct( - sel_event = 2:4, - pt = 1, - x_dim = 24, - basis_name = "bspline", - spline_gap = 5, + ... # data and data description + ), + res=results_struct( + ... # fit metrics and model derivates ), - res=results_struct(), - est=estimates_struct(), - mdl=model_struct(), - ); - ``` - -4. Run the fitting process: - ```julia - @reset S.res.startTime_all = Dates.format(now(), "mm/dd/yyyy HH:MM:SS"); - println("Starting fit at $(S.res.startTime_all)") - - @reset S = StateSpaceAnalysis.preprocess_fit(S); - - if S.prm.ssid_fit == "fit" - @reset S = StateSpaceAnalysis.launch_SSID(S); - elseif S.prm.ssid_fit == "load" - @reset S = StateSpaceAnalysis.load_SSID(S); - end - - @reset S = StateSpaceAnalysis.launch_EM(S); - - @reset S.res.endTime_all = Dates.format(now(), "mm/dd/yyyy HH:MM:SS"); - println("Finished fit at $(Dates.format(now(), "mm/dd/yyyy HH:MM:SS"))") - ``` - -5. Optionally, plot diagnostics and save the fit: - ```julia - do_plots = false - if do_plots - try - StateSpaceAnalysis.plot_loglik_traces(S) - StateSpaceAnalysis.plot_avg_pred(S) - StateSpaceAnalysis.plot_params(S) - catch - end - end - - if S.prm.do_save - println("\n========== SAVING FIT ==========") - StateSpaceAnalysis.save_results(S) - end - ``` - -This will run the example fitting script, performing SSID and EM fitting on the provided data. - + est=estimates_struct( + ... # scratch space + ), + mdl=model_struct( + ... # estimated model parameters + ), + ); +``` +This structure is used throughout the script, which allows for effective memory management (i.e., the complier can know the size of the data tensors). + +### Preprocess the data: +```julia +@reset S = StateSpaceAnalysis.preprocess_fit(S); +``` + +within preprocess_fit(S): + +```julia +# read in arguements, helpful for running on a cluster +S = deepcopy(StateSpaceAnalysis.read_args(S, ARGS)); +# set up the paths +StateSpaceAnalysis.setup_path(S) +# load and format the data; split for cross-validation +S = deepcopy(StateSpaceAnalysis.load_data(S)); +# build the input matrices +S = deepcopy(StateSpaceAnalysis.build_inputs(S)); +# transform the observed data +S = deepcopy(StateSpaceAnalysis.whiten(S)); +# fit baseline models to the data +StateSpaceAnalysis.null_loglik!(S); +# initialize the expectations and parameters +@reset S.est = deepcopy(set_estimates(S)); +@reset S = deepcopy(gen_rand_params(S)); +``` + +### get the initial conditions with Subspace Identification (SSID): +```julia +if S.prm.ssid_fit == "fit" # if fitting the SSID + @reset S = StateSpaceAnalysis.launch_SSID(S); +elseif S.prm.ssid_fit == "load" # if loading a previously-fit SSID + @reset S = StateSpaceAnalysis.load_SSID(S); +end +``` + +### get the initial conditions with Subspace Identification (SSID): +```julia +@reset S = StateSpaceAnalysis.launch_EM(S); +``` +The basic structure of the EM script: +```julia +for em_iter = 1:S.prm.max_iter_em + + # ==== E-STEP ================================================================ + @inline StateSpaceAnalysis.ESTEP!(S); # estimate the sufficient statistics + + # ==== M-STEP ================================================================ + @reset S.mdl = deepcopy(StateSpaceAnalysis.MSTEP(S)); # use the sufficient statistics to update the parameters + + # ==== TOTAL LOGLIK ========================================================== + StateSpaceAnalysis.total_loglik!(S) # compute the total likelihood + + # quality checks & convergence checks +end +``` + +### Save the fit: +```julia +StateSpaceAnalysis.save_results(S) +``` From 18bdc00f8bed90afad528ef2e591f7a0ab7ce718 Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 16:00:25 -0500 Subject: [PATCH 02/11] spacing --- README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.md b/README.md index a0f530b..ac47774 100644 --- a/README.md +++ b/README.md @@ -139,16 +139,22 @@ within preprocess_fit(S): ```julia # read in arguements, helpful for running on a cluster S = deepcopy(StateSpaceAnalysis.read_args(S, ARGS)); + # set up the paths StateSpaceAnalysis.setup_path(S) + # load and format the data; split for cross-validation S = deepcopy(StateSpaceAnalysis.load_data(S)); + # build the input matrices S = deepcopy(StateSpaceAnalysis.build_inputs(S)); + # transform the observed data S = deepcopy(StateSpaceAnalysis.whiten(S)); + # fit baseline models to the data StateSpaceAnalysis.null_loglik!(S); + # initialize the expectations and parameters @reset S.est = deepcopy(set_estimates(S)); @reset S = deepcopy(gen_rand_params(S)); From 9fd8f6d306fb14168234235914fd1a15be652221 Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 16:03:18 -0500 Subject: [PATCH 03/11] add to overview --- README.md | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index ac47774..2ab1fb2 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,15 @@ ## Overview -StateSpaceAnalysis.jl is a Julia package designed for fitting linear Gaussian state space models (lg-SSMs) using Subspace System Identification (SSID) and Expectation Maximization (EM) algorithms. This package provides tools for preprocessing data, fitting models, and evaluating model performance. +StateSpaceAnalysis.jl is a Julia package designed for fitting linear-Gaussian state space models (SSMs) using Subspace System Identification (SSID) and Expectation Maximization (EM) algorithms. + +This package provides tools for preprocessing data, fitting models, and evaluating model performance, with methods especially geared towards neuroimaging analysis: + +* event-related structure of Neuroimaging data. EEG/MEG often has batched sequences (e.g., states x timesteps x trials). We are custom-built for that case by (A) including spline bases for inputs and (B) re-using the filtered/smoothed covariance across batches to massive reduce compute time. + +* high-dimensional systems. We are designed around scaling through efficient memory allocation, robust covariance handling (via PDMats.jl), and regularization. This has allowed this packaged to work well for high-dimensional state space models (e.g., factor dim > observation dim). + +* data-driven initialization. Because using SSMs for task-based neuroimaging is relatively new, it is difficult to provide good initializations for state space models. My packages includes modified SSID functions from ControlSystemsAnalysis.jl (appropriately credited and consistent with their license -- incredibly grateful!). By use SSID, we get really good initializations, which are critical for high-dimensional SSMs. ## Installation @@ -160,7 +168,7 @@ StateSpaceAnalysis.null_loglik!(S); @reset S = deepcopy(gen_rand_params(S)); ``` -### get the initial conditions with Subspace Identification (SSID): +### Warm-start the EM with initial parameters from Subspace Identification (SSID): ```julia if S.prm.ssid_fit == "fit" # if fitting the SSID @reset S = StateSpaceAnalysis.launch_SSID(S); @@ -169,7 +177,7 @@ elseif S.prm.ssid_fit == "load" # if loading a previously-fit SSID end ``` -### get the initial conditions with Subspace Identification (SSID): +### Fit the parameters use EM: ```julia @reset S = StateSpaceAnalysis.launch_EM(S); ``` From b3d90d4a0f4de61e4de11d388ecabaa73aed97b0 Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 17:05:28 -0500 Subject: [PATCH 04/11] formating readme --- README.md | 161 +++++++++++++++++++++++++++--------------------------- 1 file changed, 79 insertions(+), 82 deletions(-) diff --git a/README.md b/README.md index 2ab1fb2..39663ee 100644 --- a/README.md +++ b/README.md @@ -5,20 +5,23 @@ ## Overview -StateSpaceAnalysis.jl is a Julia package designed for fitting linear-Gaussian state space models (SSMs) using Subspace System Identification (SSID) and Expectation Maximization (EM) algorithms. +*StateSpaceAnalysis.jl* is a Julia package designed for fitting linear-Gaussian state space models (SSMs) using Subspace System Identification (SSID) and Expectation Maximization (EM) algorithms. -This package provides tools for preprocessing data, fitting models, and evaluating model performance, with methods especially geared towards neuroimaging analysis: +This package provides tools for preprocessing data, fitting models, and evaluating model performance, with methods especially tailored towards neuroimaging analysis: -* event-related structure of Neuroimaging data. EEG/MEG often has batched sequences (e.g., states x timesteps x trials). We are custom-built for that case by (A) including spline bases for inputs and (B) re-using the filtered/smoothed covariance across batches to massive reduce compute time. +**Event-related designs**: Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials). +*StateSpaceAnalysis.jl* handles epoched data by re-using computations across batches, and it includes spline bases for flexible input modeling over the epoch. -* high-dimensional systems. We are designed around scaling through efficient memory allocation, robust covariance handling (via PDMats.jl), and regularization. This has allowed this packaged to work well for high-dimensional state space models (e.g., factor dim > observation dim). +**High-dimensional Systems**: Whole-brain modelling may require a large number of latent factors. +*StateSpaceAnalysis.jl* handles scaling through efficient memory allocation, robust covariance formats (via *PDMats.jl*), and regularization. -* data-driven initialization. Because using SSMs for task-based neuroimaging is relatively new, it is difficult to provide good initializations for state space models. My packages includes modified SSID functions from ControlSystemsAnalysis.jl (appropriately credited and consistent with their license -- incredibly grateful!). By use SSID, we get really good initializations, which are critical for high-dimensional SSMs. +**Data-driven Initialization**: We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors!). +*StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from *ControlSystemsAnalysis.jl*. ## Installation -To install the StateSpaceAnalysis.jl package, follow these steps: +To install the *StateSpaceAnalysis.jl* package, follow these steps: 1. **Clone the repository:** ```sh @@ -41,78 +44,6 @@ To install the StateSpaceAnalysis.jl package, follow these steps: This will install all the necessary dependencies and set up the StateSpaceAnalysis.jl package for use. - -## Functions Overview - -### `setup/custom.jl` -**This needs to be set by the user for the project-specific parameters** -- `assign_arguments`: Assigns command-line arguments to the structure. -- `select_trials`: Selects trials based on custom criteria. -- `scale_input`: Scales the input data. -- `create_input_basis`: Formats inputs with basis functions. -- `transform_observations`: Transforms observations, typically using PCA. -- `format_B_preSSID`: Formats the B matrix for SSID. -- `format_B_postSSID`: Assigns the estimated B columns to the rest of the matrix. - -### `fit/launch.jl` -- `preprocess_fit`: Preprocesses the data and sets up the fitting environment. -- `launch_SSID`: Launches the SSID fitting process. -- `launch_EM`: Launches the EM fitting process. -- `load_SSID`: Loads a previously saved SSID model. -- `save_SSID`: Saves the SSID model. -- `save_results`: Saves the fitting results. - -### `fit/SSID.jl` -**These function are modifed from the excellent ControlSystemsIdentification.jl package** -- `fit_SSID`: Performs subspace identification for state space analysis. -- `subspaceid_SSA`: modified ControlSystemsIdentification.jl for SSID - -### `fit/EM.jl` -- `fit_EM`: Runs the EM algorithm for individual participants. -- `ESTEP!`: Executes the E-step of the EM algorithm. -- `MSTEP`: Executes the M-step of the EM algorithm. -- `estimate_cov!`: Estimates the latent covariance. -- `estimate_mean!`: Estimates the latent mean. -- `estimate_moments!`: update the sufficient statistics. - - -### `fit/posteriors.jl` -- `posterior_all`: Generates all posterior estimates (mean and covariance). -- `posterior_mean`: Generates only the posterior means. -- `posterior_sse`: Computes the sum of squared errors for the posteriors. - - -### `setup/setup.jl` -- `read_args`: process command line arguements (for running on the cluster) -- `setup_path`: Sets up the directory paths for saving results. -- `load_data`: Loads the data from files. -- `build_inputs`: Builds the input matrices for the model. -- `whiten`: Whitens the observations (PCA). - -### `setup/structs.jl` -- `param_struct`: Defines the parameters structure. -- `data_struct`: Defines the data structure. -- `results_struct`: Defines the results structure. -- `estimates_struct`: Defines the estimates structure. -- `model_struct`: Defines the model structure. -- `core_struct`: Combines all the structures into a core structure. -- `post_all`: Defines the structure for all posterior estimates. -- `post_mean`: Defines the structure for posterior means. -- `post_sse`: Defines the structure for posterior sum of squared errors. - -### `utils/utils.jl` -- `tol_PD`: Ensures a matrix is positive definite with a tolerance. -- `tol_PSD`: Ensures a matrix is positive semi-definite with a tolerance. -- `demix`: Demixes the observations using the saved PCA transformation. -- `remix`: Remixes the observations using the saved PCA transformation. - - - - - - - - ## Walkthrough of the `example/fit_example.jl` script ### Set up `S`, the core structure which carries the parameters and data structure @@ -141,9 +72,7 @@ This structure is used throughout the script, which allows for effective memory ```julia @reset S = StateSpaceAnalysis.preprocess_fit(S); ``` - -within preprocess_fit(S): - +Preprocessing steps within `preprocess_fit(S)`: ```julia # read in arguements, helpful for running on a cluster S = deepcopy(StateSpaceAnalysis.read_args(S, ARGS)); @@ -194,7 +123,7 @@ for em_iter = 1:S.prm.max_iter_em # ==== TOTAL LOGLIK ========================================================== StateSpaceAnalysis.total_loglik!(S) # compute the total likelihood - # quality checks & convergence checks + # [...] quality & convergence checks end ``` @@ -203,3 +132,71 @@ end StateSpaceAnalysis.save_results(S) ``` + + + +## Functions Overview + +### `setup/custom.jl` +**This needs to be set by the user for the project-specific parameters** +- `assign_arguments`: Assigns command-line arguments to the structure. +- `select_trials`: Selects trials based on custom criteria. +- `scale_input`: Scales the input data. +- `create_input_basis`: Formats inputs with basis functions. +- `transform_observations`: Transforms observations, typically using PCA. +- `format_B_preSSID`: Formats the B matrix for SSID. +- `format_B_postSSID`: Assigns the estimated B columns to the rest of the matrix. + +### `fit/launch.jl` +- `preprocess_fit`: Preprocesses the data and sets up the fitting environment. +- `launch_SSID`: Launches the SSID fitting process. +- `launch_EM`: Launches the EM fitting process. +- `load_SSID`: Loads a previously saved SSID model. +- `save_SSID`: Saves the SSID model. +- `save_results`: Saves the fitting results. + +### `fit/SSID.jl` +**These function are modifed from *ControlSystemsIdentification.jl*** +- `fit_SSID`: Performs subspace identification for state space analysis. +- `subspaceid_SSA`: modified ControlSystemsIdentification.jl for SSID + +### `fit/EM.jl` +- `fit_EM`: Runs the EM algorithm for individual participants. +- `ESTEP!`: Executes the E-step of the EM algorithm. +- `MSTEP`: Executes the M-step of the EM algorithm. +- `estimate_cov!`: Estimates the latent covariance. +- `estimate_mean!`: Estimates the latent mean. +- `estimate_moments!`: update the sufficient statistics. + + +### `fit/posteriors.jl` +- `posterior_all`: Generates all posterior estimates (mean and covariance). +- `posterior_mean`: Generates only the posterior means. +- `posterior_sse`: Computes the sum of squared errors for the posteriors. + + +### `setup/setup.jl` +- `read_args`: process command line arguements (for running on the cluster) +- `setup_path`: Sets up the directory paths for saving results. +- `load_data`: Loads the data from files. +- `build_inputs`: Builds the input matrices for the model. +- `whiten`: Whitens the observations (PCA). + +### `setup/structs.jl` +- `param_struct`: Defines the parameters structure. +- `data_struct`: Defines the data structure. +- `results_struct`: Defines the results structure. +- `estimates_struct`: Defines the estimates structure. +- `model_struct`: Defines the model structure. +- `core_struct`: Combines all the structures into a core structure. +- `post_all`: Defines the structure for all posterior estimates. +- `post_mean`: Defines the structure for posterior means. +- `post_sse`: Defines the structure for posterior sum of squared errors. + +### `utils/utils.jl` +- `tol_PD`: Ensures a matrix is positive definite with a tolerance. +- `tol_PSD`: Ensures a matrix is positive semi-definite with a tolerance. +- `demix`: Demixes the observations using the saved PCA transformation. +- `remix`: Remixes the observations using the saved PCA transformation. + + From 9f7189b2825436412e1ec808b8e10d13b3a32f39 Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 17:36:27 -0500 Subject: [PATCH 05/11] hyperlinks --- README.md | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 39663ee..c1952ee 100644 --- a/README.md +++ b/README.md @@ -9,14 +9,20 @@ This package provides tools for preprocessing data, fitting models, and evaluating model performance, with methods especially tailored towards neuroimaging analysis: -**Event-related designs**: Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials). + +**Event-related designs**: Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials). + *StateSpaceAnalysis.jl* handles epoched data by re-using computations across batches, and it includes spline bases for flexible input modeling over the epoch. + **High-dimensional Systems**: Whole-brain modelling may require a large number of latent factors. -*StateSpaceAnalysis.jl* handles scaling through efficient memory allocation, robust covariance formats (via *PDMats.jl*), and regularization. + +*StateSpaceAnalysis.jl* handles scaling through efficient memory allocation, robust covariance formats (via [*PDMats.jl*](https://github.com/JuliaStats/PDMats.jl)), and regularization. + **Data-driven Initialization**: We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors!). -*StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from *ControlSystemsAnalysis.jl*. + +*StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). ## Installation @@ -156,7 +162,7 @@ StateSpaceAnalysis.save_results(S) - `save_results`: Saves the fitting results. ### `fit/SSID.jl` -**These function are modifed from *ControlSystemsIdentification.jl*** +**These function are modifed from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl)** - `fit_SSID`: Performs subspace identification for state space analysis. - `subspaceid_SSA`: modified ControlSystemsIdentification.jl for SSID From 6817d8f33de4842c4b5f53b86cbea1c27bface5e Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 17:46:12 -0500 Subject: [PATCH 06/11] formatting & spacing readme --- README.md | 34 ++++++++++++++++++++++++++++++---- src/StateSpaceAnalysis.jl | 2 +- src/fit/wrapper.jl | 2 +- src/setup/generate.jl | 2 +- 4 files changed, 33 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index c1952ee..271b933 100644 --- a/README.md +++ b/README.md @@ -10,20 +10,28 @@ This package provides tools for preprocessing data, fitting models, and evaluating model performance, with methods especially tailored towards neuroimaging analysis: -**Event-related designs**: Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials). +### Event-related designs + +Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials). *StateSpaceAnalysis.jl* handles epoched data by re-using computations across batches, and it includes spline bases for flexible input modeling over the epoch. + + +### High-dimensional Systems -**High-dimensional Systems**: Whole-brain modelling may require a large number of latent factors. +Whole-brain modelling may require a large number of latent factors. *StateSpaceAnalysis.jl* handles scaling through efficient memory allocation, robust covariance formats (via [*PDMats.jl*](https://github.com/JuliaStats/PDMats.jl)), and regularization. + -**Data-driven Initialization**: We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors!). +### Data-driven Initialization -*StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). +We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors!). +*StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). + ## Installation @@ -53,6 +61,7 @@ This will install all the necessary dependencies and set up the StateSpaceAnalys ## Walkthrough of the `example/fit_example.jl` script ### Set up `S`, the core structure which carries the parameters and data structure + ```julia S = core_struct( prm=param_struct( @@ -75,6 +84,7 @@ S = core_struct( This structure is used throughout the script, which allows for effective memory management (i.e., the complier can know the size of the data tensors). ### Preprocess the data: + ```julia @reset S = StateSpaceAnalysis.preprocess_fit(S); ``` @@ -104,6 +114,7 @@ StateSpaceAnalysis.null_loglik!(S); ``` ### Warm-start the EM with initial parameters from Subspace Identification (SSID): + ```julia if S.prm.ssid_fit == "fit" # if fitting the SSID @reset S = StateSpaceAnalysis.launch_SSID(S); @@ -113,6 +124,7 @@ end ``` ### Fit the parameters use EM: + ```julia @reset S = StateSpaceAnalysis.launch_EM(S); ``` @@ -134,6 +146,7 @@ end ``` ### Save the fit: + ```julia StateSpaceAnalysis.save_results(S) ``` @@ -144,6 +157,7 @@ StateSpaceAnalysis.save_results(S) ## Functions Overview ### `setup/custom.jl` + **This needs to be set by the user for the project-specific parameters** - `assign_arguments`: Assigns command-line arguments to the structure. - `select_trials`: Selects trials based on custom criteria. @@ -154,6 +168,7 @@ StateSpaceAnalysis.save_results(S) - `format_B_postSSID`: Assigns the estimated B columns to the rest of the matrix. ### `fit/launch.jl` + - `preprocess_fit`: Preprocesses the data and sets up the fitting environment. - `launch_SSID`: Launches the SSID fitting process. - `launch_EM`: Launches the EM fitting process. @@ -162,11 +177,13 @@ StateSpaceAnalysis.save_results(S) - `save_results`: Saves the fitting results. ### `fit/SSID.jl` + **These function are modifed from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl)** - `fit_SSID`: Performs subspace identification for state space analysis. - `subspaceid_SSA`: modified ControlSystemsIdentification.jl for SSID ### `fit/EM.jl` + - `fit_EM`: Runs the EM algorithm for individual participants. - `ESTEP!`: Executes the E-step of the EM algorithm. - `MSTEP`: Executes the M-step of the EM algorithm. @@ -176,19 +193,27 @@ StateSpaceAnalysis.save_results(S) ### `fit/posteriors.jl` + - `posterior_all`: Generates all posterior estimates (mean and covariance). - `posterior_mean`: Generates only the posterior means. - `posterior_sse`: Computes the sum of squared errors for the posteriors. ### `setup/setup.jl` + - `read_args`: process command line arguements (for running on the cluster) - `setup_path`: Sets up the directory paths for saving results. - `load_data`: Loads the data from files. - `build_inputs`: Builds the input matrices for the model. - `whiten`: Whitens the observations (PCA). +### `setup/generate.jl` + +- `gen_rand_params`: generate random SSM parameters +- `generate_ssm_trials`: simulate trials from a set of SSM parameters + ### `setup/structs.jl` + - `param_struct`: Defines the parameters structure. - `data_struct`: Defines the data structure. - `results_struct`: Defines the results structure. @@ -200,6 +225,7 @@ StateSpaceAnalysis.save_results(S) - `post_sse`: Defines the structure for posterior sum of squared errors. ### `utils/utils.jl` + - `tol_PD`: Ensures a matrix is positive definite with a tolerance. - `tol_PSD`: Ensures a matrix is positive semi-definite with a tolerance. - `demix`: Demixes the observations using the saved PCA transformation. diff --git a/src/StateSpaceAnalysis.jl b/src/StateSpaceAnalysis.jl index ec66584..365cca4 100644 --- a/src/StateSpaceAnalysis.jl +++ b/src/StateSpaceAnalysis.jl @@ -63,7 +63,7 @@ export core_struct, param_struct, data_struct, results_struct, estimates_struct set_model, transform_model include("setup/generate.jl") -export gen_rand_params, generate_ssm_trials +export generate_rand_params, generate_ssm_trials # utility functions diff --git a/src/fit/wrapper.jl b/src/fit/wrapper.jl index 5ca2709..3b8ffcc 100644 --- a/src/fit/wrapper.jl +++ b/src/fit/wrapper.jl @@ -36,7 +36,7 @@ function preprocess_fit(S) @reset S.est = deepcopy(set_estimates(S)); # init model - @reset S = deepcopy(gen_rand_params(S)); + @reset S = deepcopy(generate_rand_params(S)); # ======================================================================= diff --git a/src/setup/generate.jl b/src/setup/generate.jl index d853e72..b51c80a 100644 --- a/src/setup/generate.jl +++ b/src/setup/generate.jl @@ -2,7 +2,7 @@ # init random parameters -function gen_rand_params(S) +function generate_rand_params(S) A = Matrix(Diagonal(rand(S.dat.x_dim))); B = randn(S.dat.x_dim, S.dat.u_dim); From 19021ffceca47d7cdb2b1fc62382ae800c6176ff Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 17:48:54 -0500 Subject: [PATCH 07/11] more spacing --- README.md | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index 271b933..a71e0cd 100644 --- a/README.md +++ b/README.md @@ -12,27 +12,25 @@ This package provides tools for preprocessing data, fitting models, and evaluati ### Event-related designs -Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials). +Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials).*StateSpaceAnalysis.jl* handles epoched data by re-using computations across batches, and it includes spline bases for flexible input modeling over the epoch. -*StateSpaceAnalysis.jl* handles epoched data by re-using computations across batches, and it includes spline bases for flexible input modeling over the epoch. ### High-dimensional Systems -Whole-brain modelling may require a large number of latent factors. +Whole-brain modelling may require a large number of latent factors. *StateSpaceAnalysis.jl* handles scaling through efficient memory allocation, robust covariance formats (via [*PDMats.jl*](https://github.com/JuliaStats/PDMats.jl)), and regularization. -*StateSpaceAnalysis.jl* handles scaling through efficient memory allocation, robust covariance formats (via [*PDMats.jl*](https://github.com/JuliaStats/PDMats.jl)), and regularization. ### Data-driven Initialization -We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors!). +We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors! *StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). -*StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). + ## Installation To install the *StateSpaceAnalysis.jl* package, follow these steps: @@ -99,10 +97,10 @@ StateSpaceAnalysis.setup_path(S) # load and format the data; split for cross-validation S = deepcopy(StateSpaceAnalysis.load_data(S)); -# build the input matrices +# build the input tenors (e.g., z-score and convolve with basis) S = deepcopy(StateSpaceAnalysis.build_inputs(S)); -# transform the observed data +# transform the observed data (PCA) S = deepcopy(StateSpaceAnalysis.whiten(S)); # fit baseline models to the data @@ -110,7 +108,7 @@ StateSpaceAnalysis.null_loglik!(S); # initialize the expectations and parameters @reset S.est = deepcopy(set_estimates(S)); -@reset S = deepcopy(gen_rand_params(S)); +@reset S = deepcopy(generate_rand_params(S)); ``` ### Warm-start the EM with initial parameters from Subspace Identification (SSID): From 04cbe61cc2d053a55a4d6e2533ecf7f7e116dfd3 Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 17:49:55 -0500 Subject: [PATCH 08/11] ok now less spacing --- README.md | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index a71e0cd..d03f1c0 100644 --- a/README.md +++ b/README.md @@ -12,24 +12,18 @@ This package provides tools for preprocessing data, fitting models, and evaluati ### Event-related designs -Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials).*StateSpaceAnalysis.jl* handles epoched data by re-using computations across batches, and it includes spline bases for flexible input modeling over the epoch. - - +Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials). *StateSpaceAnalysis.jl* handles epoched data by re-using computations across batches, and it includes spline bases for flexible input modeling over the epoch. ### High-dimensional Systems -Whole-brain modelling may require a large number of latent factors. *StateSpaceAnalysis.jl* handles scaling through efficient memory allocation, robust covariance formats (via [*PDMats.jl*](https://github.com/JuliaStats/PDMats.jl)), and regularization. - - +Whole-brain modelling may require a large number of latent factors. *StateSpaceAnalysis.jl* handles scaling through efficient memory allocation, robust covariance formats (via [*PDMats.jl*](https://github.com/JuliaStats/PDMats.jl)), and regularization. ### Data-driven Initialization We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors! *StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). - - ## Installation From 49acfc1eabcc1493dd0df97a82fa292ad069fed6 Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 17:51:07 -0500 Subject: [PATCH 09/11] typo --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index d03f1c0..0b96305 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ This package provides tools for preprocessing data, fitting models, and evaluati ### Event-related designs -Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials). *StateSpaceAnalysis.jl* handles epoched data by re-using computations across batches, and it includes spline bases for flexible input modeling over the epoch. +Neuroimaging data often has epoched/batched sequences (e.g., states x timesteps x trials). *StateSpaceAnalysis.jl* handles epoched data by re-using computations across batches, and it includes spline temporal bases for flexible input modeling over the epoch. ### High-dimensional Systems @@ -22,7 +22,7 @@ Whole-brain modelling may require a large number of latent factors. *StateSpaceA ### Data-driven Initialization -We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors! *StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). +We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors). *StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). ## Installation From c9b9f3bed78973901996c4d31bea648d512d0c09 Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 17:51:44 -0500 Subject: [PATCH 10/11] typo --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0b96305..fbdce62 100644 --- a/README.md +++ b/README.md @@ -22,7 +22,7 @@ Whole-brain modelling may require a large number of latent factors. *StateSpaceA ### Data-driven Initialization -We need good initialization for systems where we don't have great domain knowledge (especially when there are many latent factors). *StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). +We need good initialization for systems for which we don't have great domain knowledge (especially when there are many latent factors). *StateSpaceAnalysis.jl* handles parameter initialization through subspace identification methods from [*ControlSystemsIdentification.jl*](https://github.com/baggepinnen/ControlSystemIdentification.jl). ## Installation From f7251bdab9246db75c7de47a5d86fabb3c51672f Mon Sep 17 00:00:00 2001 From: Harrison Ritz <harrison.ritz@gmail.com> Date: Wed, 4 Dec 2024 17:52:28 -0500 Subject: [PATCH 11/11] using --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index fbdce62..7e6c3fc 100644 --- a/README.md +++ b/README.md @@ -45,6 +45,7 @@ To install the *StateSpaceAnalysis.jl* package, follow these steps: 3. **Add the package to your Julia environment:** ```julia Pkg.add(path=".") + using StateSpaceAnalysis ``` This will install all the necessary dependencies and set up the StateSpaceAnalysis.jl package for use.