Not fully crossed study - BDG warning #185
I think I understand your application and question. It looks like you are using the Java GUI. This is a static piece of software that is no longer being maintained. I recommend that you use the R package moving forward. You can find information here: iMRMC: Software to do Multi-reader Multi-case Statistical Analysis of Reader Studies | Center for Devices and Radiological Health (fda.gov)

The warning about the degrees of freedom (DF) is not a problem. If the DF estimates fall below the lower bound, they are set to the lower bound. The DF estimates have uncertainty in them, especially when data are limited. In your case, the number of exams is 15+15+15+15 (2 case sets x 2 truths), which is small for ROC analysis. The lower bound of 29 (= 30 - 1) comes from the number of signal-present cases for sensitivity, the number of signal-absent cases for specificity, or the minimum of these for ROC.

It's a bit of a waste of effort to have 37 readers evaluate the same small number of cases. Please see this paper on split-plot studies:
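For anyone landing here, the switch to the R package is straightforward. A minimal sketch using the package's built-in Roe & Metz simulator to stand in for a real data set; the exact components of the returned object depend on the package version:

```r
# A minimal sketch, not the exact analysis from this thread: simulate an MRMC
# data set with the package's Roe & Metz model and run the analysis with
# doIMRMC(), which produces U-statistic (BDG) and MLE summaries.
library(iMRMC)

set.seed(1)
config <- sim.gRoeMetz.config()   # default simulation configuration
dfMRMC <- sim.gRoeMetz(config)    # data frame in the iMRMC input format
result <- doIMRMC(dfMRMC)         # multi-reader multi-case analysis

# Inspect the components returned (per-reader results, U-statistic/BDG
# summaries, MLE summaries, ...); exact names depend on the package version.
str(result, max.level = 1)
```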
There is certainly something funny about the specificity results. The DF_BDG for specificity is calculated as 0.93!!! That is not good … red flag. I wouldn't use any p-values from the software. Notice the DF_BDG is ~24 for AUC. That is healthy.

My guess is that many readers are making the exact same interpretations on the signal-absent cases … little to no reader variability. I'm curious to know if this is true. Your issue is making me think I should return a different error if DF_BDG is below 3, or even 5. Without any more of the output or input data, it is hard to give more of a response.

p-values are only one kind of output; they can be misinterpreted or be completely inappropriate. Point estimates and confidence intervals tell a much more complete story. I don't have a solution for you except to refer to the per-reader results.

Finally, I would avoid using the MLE results. They are not validated when the study design is not fully crossed, and I've observed weird results in such cases. Your data are not fully crossed. Your question is nudging me to remove the MLE results completely from the current software.
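One way to check that guess is to compare the spread of reader scores on signal-absent versus signal-present cases. A minimal sketch in base R on toy data; the data frame and column names here are illustrative, not the iMRMC input format:

```r
# Toy data only: 5 readers x 10 cases, cases C1-C5 signal-absent (truth = 0).
set.seed(2)
scores <- expand.grid(readerID = paste0("R", 1:5),
                      caseID   = paste0("C", 1:10),
                      stringsAsFactors = FALSE)
scores$truth <- ifelse(as.integer(sub("C", "", scores$caseID)) <= 5, 0, 1)
# Toy scores: readers agree perfectly on signal-absent cases, vary on the rest.
scores$score <- ifelse(scores$truth == 0, 0, rbinom(nrow(scores), 1, 0.7))

# Per-case spread of reader scores, split by truth state.
spreadByCase <- aggregate(score ~ caseID + truth, data = scores, FUN = sd)
aggregate(score ~ truth, data = spreadByCase, FUN = mean)
# A mean SD near 0 for truth == 0 indicates little to no reader variability on
# the signal-absent cases, which can push DF_BDG for specificity very low.
```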
Thank you so much for your reply, Brandon Gallas. Instead of using the Java GUI, we moved forward with the R package, and our results remain similar. It is good to know that the warning about the degrees of freedom is not a problem, as there is uncertainty in the estimates.

Given the extremely low DF_BDG for specificity, we checked whether readers are making more of the exact same binary interpretations in the signal-absent cases than in the signal-present cases. While there is notable reader agreement, this trend is present for both signal-absent and signal-present cases. For testing purposes, we slightly modified our data to increase agreement, resulting in marginally higher degrees of freedom (1.05) but triggering a negative-estimate warning. Conversely, we also adjusted the data to reduce agreement, which led to higher degrees of freedom of 10.38, though still relatively low.

To clarify, the two datasets consist of independent cases, with a total of 60 unique exams, each assigned a distinct case ID. The study design ensures that the same readers do not review the exact same exam during a single reading session, while still allowing them to participate in both study modalities. The readers were distributed across the following combinations:

We are now considering reporting only point estimates and confidence intervals, omitting p-values.

Thanks,
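As an aside on reporting point estimates and confidence intervals: given an estimate, its standard error, and the BDG degrees of freedom, a two-sided interval follows from the t distribution. A minimal sketch with made-up numbers, not values from this study:

```r
# Illustrative numbers only, not results from this thread.
estimate <- 0.85    # e.g., specificity under one condition
se       <- 0.03    # standard error of the estimate
dfBDG    <- 10.4    # BDG degrees of freedom reported by the software

alpha <- 0.05
ci <- estimate + c(-1, 1) * qt(1 - alpha / 2, df = dfBDG) * se
round(ci, 3)
# With very small df the t quantile is large and the interval is wide,
# which is the honest reflection of the limited information in the data.
```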
Good luck. BTW, 10 degrees of freedom is much better than 1. Think about it like this. Would you be more confident in a variance estimate from 10 independent observations or just one observation? Thanks for the feedback.
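To make that intuition concrete, the confidence interval for a variance tightens quickly as the degrees of freedom grow. A short illustration using the standard chi-square interval (generic statistics, not iMRMC output):

```r
# 95% confidence interval for a true variance sigma^2, given a sample variance
# s2 with df degrees of freedom: df * s2 / qchisq(c(1 - a/2, a/2), df).
varCI <- function(s2, df, level = 0.95) {
  alpha <- 1 - level
  df * s2 / qchisq(c(1 - alpha / 2, alpha / 2), df = df)
}

s2 <- 1  # suppose the point estimate of the variance is 1
varCI(s2, df = 1)   # roughly (0.2, 1018): almost no information
varCI(s2, df = 10)  # roughly (0.49, 3.08): far tighter
```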
We conducted a reader study with 2 different reading conditions using two datasets, each containing 30 exams with a 1:1 ratio of malignant to normal cases. Each of the 37 readers participated in a single reading session, reviewing both datasets: one with condition 1 and the other with condition 2. Due to logistical constraints, our study design is not fully crossed. We know that we pay a statistical price for this, but hope that using 37 readers mitigates the loss.
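A design like this can be written down explicitly as a reader-by-condition allocation table. A minimal sketch with a purely hypothetical split, since the actual reader/dataset combinations are not given in the thread:

```r
# Hypothetical allocation: each reader sees one dataset under condition 1 and
# the other dataset under condition 2, so no reader reads the same exam twice.
readers <- paste0("reader", sprintf("%02d", 1:37))
design <- data.frame(
  readerID          = readers,
  datasetCondition1 = ifelse(seq_along(readers) %% 2 == 1, "A", "B"),
  datasetCondition2 = ifelse(seq_along(readers) %% 2 == 1, "B", "A")
)
# Cross-tabulate to confirm the two reader groups and their dataset pairings.
table(design$datasetCondition1, design$datasetCondition2)
```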
We conducted iMRMC analyses using the Java iMRMC software for AUC, sensitivity, and specificity, but encountered warnings with the BDG method stating that the DF_BDG is below a minimum and has been set to 29.0.
e.g., for AUC and for specificity (warning output not shown).
This warning does not appear when we use the MLE analysis. We observed that the p-values of the BDG and MLE estimates differ, specifically for specificity, which turned out to be significantly different for the 2 conditions when using BDG (p=0.0003, with the warning) but not when using MLE (p=0.204).
We are uncertain which method would be more appropriate for our study. I understand MLE can avoid a negative total variance estimate. However, the total variance estimate with the BDG method does not seem to be negative. I would greatly appreciate your guidance on the best approach for our context.