Adding robustica option to ICA decomposition to achieve consistent results #1013

Merged
47 commits merged on Sep 23, 2024

Commits:
f4eaa3e
Add robustica method
BahmanTahayori Jul 31, 2023
b0cac3a
Incorporation of major comments regarding robustica addition
BahmanTahayori Dec 5, 2023
55c2ae4
Add robustica 0.1.3 to dependency list
BahmanTahayori Nov 1, 2023
cd55a3f
Multiple fixes to RobustICA addition from code review
BahmanTahayori Dec 5, 2023
2d9b007
Specify magic number fixed seed of 42 as a constant
BahmanTahayori Nov 29, 2023
09e565e
Merge remote-tracking branch 'upstream/main' into add_robustica_rsclean
BahmanTahayori Dec 5, 2023
fc5f9ea
Updated
BahmanTahayori Dec 5, 2023
4fc3043
Robustica Updates
BahmanTahayori Dec 6, 2023
a20ff57
Incorporating the third round of Robert E. Smith's comments
BahmanTahayori Dec 20, 2023
cc5e05d
Merge pull request #3 from BahmanTahayori/add_robustica_rsclean
BahmanTahayori Dec 20, 2023
a449fec
Merge branch 'ME-ICA:main' into main
BahmanTahayori Feb 9, 2024
78c8140
Enhance the "ica_method" description suggested by D. Handwerker
BahmanTahayori Feb 9, 2024
ac85e6a
Enhancing the "n_robust_runs" description suggested by D. Handwerkerd
BahmanTahayori Feb 9, 2024
979d026
RobustICA: Restructure code loop over robust methods (#4)
Lestropie Feb 11, 2024
71d8d4a
merging recent changes
BahmanTahayori Feb 21, 2024
cac38cd
Applied suggested changes
BahmanTahayori Feb 29, 2024
5fcf148
Fixing the conflict
BahmanTahayori Feb 29, 2024
b7d08e9
Merge branch 'ME-ICA:main' into main
BahmanTahayori Feb 29, 2024
a113423
Incorporating more comments
BahmanTahayori Mar 4, 2024
b60e9a6
Merge remote-tracking branch 'upstream/main'
Lestropie Apr 12, 2024
45c95ce
aligning adding-robustica with Main
handwerkerd Aug 13, 2024
8e6878f
Adding already requested changes
handwerkerd Aug 13, 2024
88fd148
fixed failing tests
handwerkerd Aug 16, 2024
a221e72
updated documentation in faq.rst
handwerkerd Aug 27, 2024
8622a9b
more documentation changes
handwerkerd Aug 29, 2024
419e9d4
Update docs/faq.rst
handwerkerd Aug 30, 2024
d29a91b
Update docs/faq.rst
handwerkerd Aug 30, 2024
efb712e
Aligning robustICA with current Main + (#5)
handwerkerd Aug 30, 2024
dee68ec
align with main
handwerkerd Sep 3, 2024
171a835
Merge branch 'ME-ICA:main' into adding-robustica
handwerkerd Sep 3, 2024
4b86fe2
fixed ica.py docstring error
handwerkerd Sep 3, 2024
c01ec51
added scikit-learn-extra to pyproject and changed ref name
handwerkerd Sep 3, 2024
cd50037
increment circleci version keys
handwerkerd Sep 3, 2024
54a4ac3
Merge branch 'main' into adding-robustica
BahmanTahayori Sep 10, 2024
9bb021a
Merge pull request #6 from handwerkerd/adding-robustica
BahmanTahayori Sep 10, 2024
cd29060
Merge branch 'ME-ICA:main' into main
BahmanTahayori Sep 10, 2024
2a868e3
Removing the scikit-learn-extra dependency
BahmanTahayori Sep 11, 2024
ed10ade
Updating pyproject.toml file
BahmanTahayori Sep 11, 2024
8a14277
Minor changes to make the help more readable
BahmanTahayori Sep 12, 2024
b793d83
Minor changes
BahmanTahayori Sep 12, 2024
1b1eb38
upgrading to robustica 0.1.4
BahmanTahayori Sep 17, 2024
87965f4
Update docs
BahmanTahayori Sep 17, 2024
8743aca
updating utils.py, toml file and the docs
BahmanTahayori Sep 18, 2024
42372e3
minor change to utils.py
BahmanTahayori Sep 18, 2024
3d88eb4
Merge branch 'main' into main
handwerkerd Sep 18, 2024
1aac94a
Incorporating Eneko's comments
BahmanTahayori Sep 20, 2024
2b0ef67
Added a warning when the clustering method is changed
BahmanTahayori Sep 20, 2024
36 changes: 18 additions & 18 deletions .circleci/config.yml
@@ -13,7 +13,7 @@ jobs:
steps:
- checkout
- restore_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
- run:
name: Generate environment
command: |
@@ -23,7 +23,7 @@
pip install -e .[tests]
fi
- save_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
paths:
- /opt/conda/envs/tedana_py38

@@ -34,7 +34,7 @@
steps:
- checkout
- restore_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
- run:
name: Running unit tests
command: |
@@ -56,7 +56,7 @@
steps:
- checkout
- restore_cache:
key: conda-py39-v2-{{ checksum "pyproject.toml" }}
key: conda-py39-v3-{{ checksum "pyproject.toml" }}
- run:
name: Generate environment
command: |
@@ -75,7 +75,7 @@
mkdir /tmp/src/coverage
mv /tmp/src/tedana/.coverage /tmp/src/coverage/.coverage.py39
- save_cache:
key: conda-py39-v2-{{ checksum "pyproject.toml" }}
key: conda-py39-v3-{{ checksum "pyproject.toml" }}
paths:
- /opt/conda/envs/tedana_py39
- persist_to_workspace:
@@ -90,7 +90,7 @@
steps:
- checkout
- restore_cache:
key: conda-py310-v1-{{ checksum "pyproject.toml" }}
key: conda-py310-v3-{{ checksum "pyproject.toml" }}
- run:
name: Generate environment
command: |
@@ -109,7 +109,7 @@
mkdir /tmp/src/coverage
mv /tmp/src/tedana/.coverage /tmp/src/coverage/.coverage.py310
- save_cache:
key: conda-py310-v1-{{ checksum "pyproject.toml" }}
key: conda-py310-v3-{{ checksum "pyproject.toml" }}
paths:
- /opt/conda/envs/tedana_py310
- persist_to_workspace:
@@ -124,7 +124,7 @@
steps:
- checkout
- restore_cache:
key: conda-py311-v1-{{ checksum "pyproject.toml" }}
key: conda-py311-v3-{{ checksum "pyproject.toml" }}
- run:
name: Generate environment
command: |
@@ -143,7 +143,7 @@
mkdir /tmp/src/coverage
mv /tmp/src/tedana/.coverage /tmp/src/coverage/.coverage.py311
- save_cache:
key: conda-py311-v1-{{ checksum "pyproject.toml" }}
key: conda-py311-v3-{{ checksum "pyproject.toml" }}
paths:
- /opt/conda/envs/tedana_py311
- persist_to_workspace:
@@ -158,7 +158,7 @@
steps:
- checkout
- restore_cache:
key: conda-py312-v1-{{ checksum "pyproject.toml" }}
key: conda-py312-v3-{{ checksum "pyproject.toml" }}
- run:
name: Generate environment
command: |
@@ -177,7 +177,7 @@
mkdir /tmp/src/coverage
mv /tmp/src/tedana/.coverage /tmp/src/coverage/.coverage.py312
- save_cache:
key: conda-py312-v1-{{ checksum "pyproject.toml" }}
key: conda-py312-v3-{{ checksum "pyproject.toml" }}
paths:
- /opt/conda/envs/tedana_py312
- persist_to_workspace:
@@ -192,7 +192,7 @@
steps:
- checkout
- restore_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
- run:
name: Style check
command: |
@@ -208,7 +208,7 @@
steps:
- checkout
- restore_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
- run:
name: Run integration tests
no_output_timeout: 40m
@@ -233,7 +233,7 @@
steps:
- checkout
- restore_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
- run:
name: Run integration tests
no_output_timeout: 40m
@@ -258,7 +258,7 @@
steps:
- checkout
- restore_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
- run:
name: Run integration tests
no_output_timeout: 40m
@@ -283,7 +283,7 @@
steps:
- checkout
- restore_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
- run:
name: Run integration tests
no_output_timeout: 40m
@@ -308,7 +308,7 @@
steps:
- checkout
- restore_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
- run:
name: Run integration tests
no_output_timeout: 40m
@@ -335,7 +335,7 @@
at: /tmp
- checkout
- restore_cache:
key: conda-py38-v2-{{ checksum "pyproject.toml" }}
key: conda-py38-v3-{{ checksum "pyproject.toml" }}
- run:
name: Merge coverage files
command: |
3 changes: 3 additions & 0 deletions docs/approach.rst
@@ -311,6 +311,8 @@ Next, ``tedana`` applies TE-dependent independent component analysis (ICA) in
order to identify and remove TE-independent (i.e., non-BOLD noise) components.
The dimensionally reduced optimally combined data are first subjected to ICA in
order to fit a mixing matrix to the whitened data.
``tedana`` can use a single iteration of FastICA or multiple iterations of robustICA,
with an explanation of those approaches `in our FAQ`_.
This generates a number of independent timeseries (saved as **desc-ICA_mixing.tsv**),
as well as parameter estimate maps which show the spatial loading of these components on the
brain (**desc-ICA_components.nii.gz**).
@@ -356,6 +358,7 @@ yielding a denoised timeseries, which is saved as **desc-denoised_bold.nii.gz**.

.. image:: /_static/a15_denoised_data_timeseries.png

.. _in our FAQ: faq.html#tedana-what-is-the-right-number-of-ica-components-what-options-let-me-get-it
.. _These decision trees are detailed here: included_decision_trees.html

*******************************
68 changes: 68 additions & 0 deletions docs/faq.rst
@@ -93,11 +93,79 @@ The TEDICA step may fail to converge if TEDPCA is either too strict
With updates to the ``tedana`` code, this issue is now rare, but it may happen
when preprocessing has not been applied to the data, or when improper steps have
been applied to the data (e.g. rescaling, nuisance regression).
It can also still happen when everything is seemingly correct
(see the answer to the next question).
If you are confident that your data have been preprocessed correctly prior to
applying tedana, and you encounter this problem, please submit a question to `NeuroStars`_.

.. _NeuroStars: https://neurostars.org

*********************************************************************************
[tedana] What is the right number of ICA components & what options let me get it?
*********************************************************************************

Part of the PCA step in ``tedana`` processing involves identifying the number of
components that contain meaningful signal.
The PCA components are then used to calculate the same number of ICA components.
The ``--tedpca`` option includes several options to identify the "correct" number
of PCA components.
``kundu`` and ``kundu-stabilize`` use several echo-based criteria to exclude PCA
components that are unlikely to contain T2* or S0 signal.
``mdl`` (conservative & fewest components), ``kic``,
& ``aic`` (liberal & more components) use `MAPCA`_.
Within the same general method, each uses a cost function to find a minimum
where more components no longer model meaningful variance.
For some datasets we see all methods fail and result in too few or too many components.
There is no consistent number of components or % variance explained to define the correct number.
The correct number of components will depend on the noise levels of the data.
For example, smaller voxels will result in more thermal noise and less total variance explained.
A dataset with more head motion artifacts will have more variance explained,
since more structured signal is within the head motion artifacts.
The clear failure cases are extreme: that is, getting fewer than 1/5 the number of components
compared to time points, or having nearly as many components as time points.
We are working on identifying why this happens and developing better solutions.
Our current guess is that most of the above methods assume data are
independent and identically distributed (IID),
and signal leakage from in-slice and multi-slice acceleration may violate this assumption.
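The "clear failure cases" heuristic above can be made concrete with a small check. This is an illustrative sketch only; the 1/5 lower bound comes from the text, while the 95% upper cutoff for "nearly as many components as time points" is an assumption:

```python
def component_count_suspicious(n_components: int, n_timepoints: int) -> bool:
    """Flag the extreme failure cases described in the FAQ text:
    far too few components (fewer than 1/5 of the time points) or
    nearly as many components as time points (>= 95%, assumed cutoff)."""
    return n_components < n_timepoints / 5 or n_components >= 0.95 * n_timepoints


print(component_count_suspicious(46, 200))   # False: plausible count
print(component_count_suspicious(20, 200))   # True: far too few
print(component_count_suspicious(195, 200))  # True: nearly one per time point
```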

We have one option that is generally useful and is also a partial solution.
``--ica_method robustica`` will run `robustica`_.
This is a method that, for a given number of PCA components,
will repeatedly run ICA and identify components that are stable across iterations.
Contributor Author:
will "repeatedly " run

Contributor (Lestropie):
@BahmanTahayori: This particular fix has already been addressed in 8743aca, but for the sake of future interactions, you can propose such fixes on GitHub in a way that maintainers can accept with a single button press: https://haacked.com/archive/2019/06/03/suggested-changes/ (you may have previously interacted with this mechanism when I was proposing modifications on your fork).

Contributor Author:
Thanks @Lestropie. Will consider it for the next time.
While running ICA multiple times will slow processing, as a general benefit,
this means that the ICA results are less sensitive to the initialization parameters,
computer hardware, and software versions.
This will result in better stability and replicability of ICA results.
Additionally, `robustica`_ almost always results in fewer components than initially prescribed,
since there are fewer stable components across iterations than the total number of components.
This means, even if the initial PCA component estimate is a bit off,
the number of resulting robust ICA components will represent stable information in the data.
For a dataset where the PCA component estimation methods are failing,
one could use ``--tedpca`` with a fixed integer for a constant number of components
that is on the high end of typical for a study,
and then `robustica`_ will reduce the number of components to only find stable information.
That said, if the fixed PCA component number is too high,
then the method will have too many unstable components,
and if the fixed PCA component number is too low, then there will be even fewer ICA components.
The number of ICA components is more consistent,
but is still sensitive to the initial number of PCA components.
For example, for a single dataset, 60 PCA components might result in 46 stable ICA components,
while 55 PCA components might result in 43 stable ICA components.
We are still testing how these interact in order to give better recommendations and even more stable results.
While the TEDANA developers expect that ``--ica_method robustica`` may become
the default configuration in future TEDANA versions,
it is first being released to the public as a non-default option
in the hope of gaining insight into its behaviour
across a broader range of multi-echo fMRI data.
If users are having trouble with PCA component estimation failing on a dataset,
we recommend using RobustICA;
we also invite user feedback on its behaviour and efficacy.
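As a usage sketch (not part of this PR's diff): with these options merged, a tedana call selecting RobustICA might look like the following, where the input file names, echo times, and PCA component count are placeholders:

```shell
tedana -d echo1.nii.gz echo2.nii.gz echo3.nii.gz \
    -e 14.5 38.5 62.5 \
    --ica_method robustica \
    --n_robust_runs 30 \
    --tedpca 60
```

Here ``--tedpca 60`` fixes the PCA dimensionality on the high end, as the text above suggests, and RobustICA then retains only the stable components.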


.. _MAPCA: https://github.com/ME-ICA/mapca
.. _robustica: https://github.com/CRG-CNAG/robustica
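For intuition about the repeated-run-and-cluster idea, here is a toy sketch using scikit-learn's ``FastICA``. It matches components across runs by correlation and averages them, which is a deliberate simplification of `robustica`_'s actual clustering and is not tedana's implementation:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Toy data: 3 independent non-Gaussian sources mixed into 10 observed channels.
sources = rng.laplace(size=(500, 3))
data = sources @ rng.standard_normal((3, 10))

n_runs, n_comp = 8, 3
runs = []
for seed in range(n_runs):
    ica = FastICA(n_components=n_comp, random_state=seed, max_iter=1000)
    ica.fit(data)
    comps = ica.components_                    # shape (n_comp, 10)
    comps = comps / np.linalg.norm(comps, axis=1, keepdims=True)
    runs.append(comps)

# Match each run's components to the first run by absolute correlation,
# fix the sign, and average the matches -> "stable" centroid estimates.
reference = runs[0]
centroids = np.zeros_like(reference)
for comps in runs:
    sim = np.abs(comps @ reference.T)          # (n_comp, n_comp) similarity
    order = sim.argmax(axis=1)                 # best reference match per component
    signs = np.sign((comps * reference[order]).sum(axis=1))
    for i, j in enumerate(order):
        centroids[j] += signs[i] * comps[i]
centroids /= n_runs
print(centroids.shape)                         # (3, 10)
```

Components that reappear (up to sign and order) in every run average to a tight centroid; unstable components would not match consistently, which is the signal robustica uses to discard them.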

.. _manual classification:

********************************************************************************
1 change: 1 addition & 0 deletions pyproject.toml
@@ -30,6 +30,7 @@ dependencies = [
"pandas>=2.0,<=2.2.2",
"pybtex",
"pybtex-apa-style",
"robustica>=0.1.4",
Member:
@tsalo I'm still a bit confused about the dependabot. How do we set this up so that it requires a minimum version of 0.1.4, but will also automatically open a pull request when new versions of robustica are released?

Member:
I think something like "robustica>=0.1.4,<=0.1.4" should work.

Contributor Author:
Thanks @handwerkerd and @tsalo for your comments. I applied all the required changes. Regarding the clustering results, the code is ready and I generate a figure to visualise the clustering in 2D (going from high dimension to low dimension) using TSNE, https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html. I can show the results with or without the noise cluster. I have attached an example for these two cases (the red dots are clustered as noise). In my local code, the result is saved in the "figure" directory but I have not added it to the HTML report. What are your thoughts on adding them to the HTML report? Which figure should be added?

I think now that this PR is ready, it is better to finalise it and then we will add these results and the iq variable in a separate PR.

(Attached figures: Projection_Clusters, Projection_Clusters_noise)

Member:
Let me see if I correctly understand these figures: each small circle is a component from one iteration of FastICA. The black circles are the components that were deemed part of stable clusters and the red circles are the components that didn't end up in any cluster. The blue lines connect the components in a cluster. I'm assuming the final stable component is some type of centroid for each cluster? As an evaluation tool, the ideal would be every cluster in a small, tight circle, but we can see some are in longer lines and some are a bit farther apart. With the plot with the red dots, in the top center, I see a grouping of 3 clusters with red in-between, which would tell me that the algorithm picked 3 stable components from this cluster, but, with a few more/different iterations, the exact placement of those stable components in this space might have been slightly different.

Is this description correct?

I do think this is useful and could be added to the html report, but let's definitely do that as a separate PR.

Contributor Author:
Yes, your description is correct. I used a convex hull to draw the blue line around each cluster. The centroid is calculated in robustica (_compute_centroids). Your explanation about the 3 clusters at the top centre is excellent. With a few more or fewer iterations those three could be combined into 1 cluster.

I will do it in a separate PR.

Collaborator:
I think it should be a different PR. We want to merge this and have RobustICA available for users as soon as possible, and I see the figure as an extra really but not part of the core functionality.

Collaborator:
Hey @BahmanTahayori, could you please share the code you used to generate these figures? Or direct me to a branch or PR where you have this?

I cannot find the separate PR you mentioned in your comment. Thanks!

Collaborator (eurunuela, Nov 27, 2024):
Actually, I think I found it looking into your GitHub profile.

It is this function here, right?
https://github.com/BahmanTahayori/tedana_florey/blob/70a5d1a5d9730bd99ad98bf711d7f3de00d990cf/tedana/reporting/static_figures.py#L805

Contributor (Lestropie):
There's a PR on @BahmanTahayori's fork; we have used fork PRs for internal assistance / revision / discussion prior to proposals to ME-ICA.
BahmanTahayori#7

Contributor Author:
That is the function that has been added, @eurunuela, and the figure is saved as a PNG file in the \figure folder. However, as @Lestropie mentioned, we go through an internal review first. I have an updated version of the code, but it has not yet been finalised. Will work on it and keep you posted.

"scikit-learn>=0.21, <=1.5.1",
"scipy>=1.2.0, <=1.14.1",
"threadpoolctl",
19 changes: 19 additions & 0 deletions tedana/config.py
Collaborator:
We probably want to open an issue so we can discuss where we move these defaults.

Contributor Author:
I agree.

Collaborator:
I opened #1132.
@@ -0,0 +1,19 @@
"""Setting default values for ICA decomposition."""

DEFAULT_ICA_METHOD = "fastica"
DEFAULT_N_MAX_ITER = 500
DEFAULT_N_MAX_RESTART = 10
DEFAULT_SEED = 42


"""Setting values for number of robust runs."""

DEFAULT_N_ROBUST_RUNS = 30
MIN_N_ROBUST_RUNS = 5
MAX_N_ROBUST_RUNS = 500
WARN_N_ROBUST_RUNS = 200


"""Setting the warning threshold for the index quality."""

WARN_IQ = 0.6
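A minimal sketch of how such bounds might be enforced follows. The helper function is hypothetical (tedana's actual validation code is not shown in this diff); the constant values are copied from the diff above:

```python
import warnings

# Values copied from the tedana/config.py diff above.
DEFAULT_N_ROBUST_RUNS = 30
MIN_N_ROBUST_RUNS = 5
MAX_N_ROBUST_RUNS = 500
WARN_N_ROBUST_RUNS = 200


def check_n_robust_runs(n_robust_runs: int) -> int:
    """Hypothetical bounds check for an --n_robust_runs value."""
    if not MIN_N_ROBUST_RUNS <= n_robust_runs <= MAX_N_ROBUST_RUNS:
        raise ValueError(
            f"n_robust_runs must be in [{MIN_N_ROBUST_RUNS}, {MAX_N_ROBUST_RUNS}], "
            f"got {n_robust_runs}"
        )
    if n_robust_runs > WARN_N_ROBUST_RUNS:
        warnings.warn(
            f"n_robust_runs={n_robust_runs} is large; runtime grows with each run"
        )
    return n_robust_runs


print(check_n_robust_runs(DEFAULT_N_ROBUST_RUNS))  # 30
```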