-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling of missing data in Estimators #42
Comments
I think the current observed behavior is actually what we want. In principle the This does mean that users should never pass in values of I agree with your suggestion to raise an exception when v=0 and a warning when y=0. As far as ignoring studies when NaNs are passed, I have mixed feelings. I realize it would make your life easier in this particular case, because it would be nice to always pass in the same voxels and not have to worry about missing values. But I'm a bit apprehensive about making this the default behavior, as it would make it easy for users to overlook NaNs and conclude that their meta-analytic estimates are based on however many data points they passed in (including NaNs). I think a good compromise is to add an initialization flag to all estimators that specifies how NaNs are handled, where the default is |
I think that's reasonable.
That sounds good to me! In addition to the initialization flag, do you think it would useful to include |
Yes, degrees of freedom should definitely be somewhere prominent in the summary. I'm a bit apprehensive about DoF maps in NiMARE. Or rather, I definitely think there should be a DoF map, but I'm wary about allowing it to have non-constant values across voxels—i.e., having it show how many studies contributed to the estimate at each voxel. The concern is that users won't have an easy way to tell which studies contributed to which voxels, so different voxels won't be exactly commensurable even if they happen to have the same DoF. Also, I'm guessing most users won't ever actually look at the DoF, and hence won't realize that the estimates in their map don't all come from the same underlying sample. My fairly strong inclination would be to default to either only including voxels that have no missing data at all, or generating separate maps for every combination of studies (which admittedly would be a huge PITA, and clunky from a UX perspective too). At the very least I think users should explicitly have to take responsibility for allowing variable studies across voxels by setting some aptly named argument, e.g., Tagging @nicholst here in case he has thoughts on this issue (i.e., of whether/how to handle/display voxels in the same map where estimates reflect different sets of studies). |
I concur with your concerns, but I think as soon as anyone starts doing IMBA's with any heterogeneity we're going to get into the variable n (df) issue. I don't see it as being at the top of the feature to-do list, but I'm pretty adamant that we should engineer NiMARE to not rule out or make it impossible to have varying n per voxel. I leave it to you two on the Engineering details and where, in the grand scheme, the feature should be implemented. So, I like some awkward flag that they have to set... that works for me. I agree, once people routinely use |
That sounds reasonable to me. The current implementation already allows varying n per voxels, so that's not a problem. It's just a question of how to present that to the user, and what they have to do to enable it. I think having a fairly angry argument name should be sufficient to get people to at least consult the docstring the first time. (Though I agree with you that over time it's likely to become a stylized fact in the community that one should always enable this setting.) |
Per @tyarkoni in #40 (comment):
Using DerSimonianLaird as an example estimator, here is the behavior I've found:
y
--> tau2 = NaNy
--> tau2 = 0v
--> tau2 = NaNv
--> tau2 = NaNI think the behavior we want is to ignore any studies with NaNs in either
y
orv
, and maybe raise an error whenv
is 0 (with the recommendation to fill missing data with NaNs instead of zeros) and raise a warning wheny
is 0. And of course, if all studies have NaN, then all parameters estimated by the estimator should be NaN.It would then be the user's responsibility to ensure that missing data is represented with NaNs. On NiMARE's side, I can open an issue to do this automatically in
MetaEstimator.fit()
. I don't think masking is feasible for NiMARE, given that missing data may vary by both study and voxel, so we'd end up with varying numbers of studies contributing to each voxel's meta-analysis.WDYT?
The text was updated successfully, but these errors were encountered: