Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bootVcov1 gives an error due to differences in the dimension of the bootstraps #131

Open
jaquol opened this issue Apr 22, 2020 · 6 comments

Comments

@jaquol
Copy link

jaquol commented Apr 22, 2020

I am trying to run competitive gene set enrichment analysis (GSEA) in MAST (hereafter referred to as "MAST cGSEA") as illustrated here.

I have several SingleCellAssay (sca) objects and a few designs (passed through the "formula" parameter). All the designs are of this kind:

zlmCond <- zlm(~ y + z, sca)

Where:

  • y = factor with two possible values (e.g. responders and non-responders)
  • z = factor with multiple possible values; specifically, patient ID (as I am aiming to perform the analysis accounting for patient ID)

The MAST cGSEA workflow works fine for most of the sca objects and designs. However, I noticed it fails in some cases, specifically at the step of creating the bootstraps with MAST's bootVcov1 function. Below is the code for such function:

bootVcov1 <- function (zlmfit, R = 99) {
    
    sca <- zlmfit@sca
    N <- ncol(sca)
    LMlike <- zlmfit@LMlike
    manyvc <- raply(R, {
        s <- sample(N, replace = TRUE)
        newsca <- sca[, s]
        LMlike <- update(LMlike, design = colData(newsca))
        zlm(sca = newsca, LMlike = LMlike, onlyCoef = TRUE)
    })
    
    manyvc
    
}

I narrowed the problem down to this line:

LMlike <- update(LMlike, design = colData(newsca))

In some, bootstraps I will get a warning like this:

Warning message:
In .nextMethod(object = object, value = value) :
  Coefficients zP6, zP8 are never estimible and will be dropped.

Of note:

  • P6 and P8 correspond to patients IDs
  • This message not always refers to exactly those two patient IDs
  • This message may include one or more patient IDs depending on the bootstrap

I looked at the distribution of the number of x and y values with:

k <- colData(newsca)
table(k$y, k$z)

But I fail to see an obvious patterns why those patient IDs fail.

Down the road, the problem is that the lengths of the rows of the manyvc object generated by the raply function have different size, and R complains about it and stops the execution.

Can anybody help with this?

Thank you in advance!

Javier

@gfinak
Copy link
Member

gfinak commented Apr 22, 2020

How many measurements do you have per subject? Particularly the subjects that fail in the cases where they fail?

@amcdavid
Copy link
Member

Thanks for the careful report. It looks to me like the dimension of the model matrix depends on the rows being subset. I guess we haven't run into this before because we haven't run bootstraps with "nearly" singular designs.

I can't immediately think of a good solution for this, which is a problem with the non-parametric bootstrap in a regression generally, I suppose. I guess we could drop those bootstrap replicates, or set their coefficients to NA. Though that will probably mean we underestimate the variability in some of the coefficients. At a minimum I'll see about triggering a more informative error...

@jaquol
Copy link
Author

jaquol commented May 5, 2020

How many measurements do you have per subject? Particularly the subjects that fail in the cases where they fail?

Thank you for your reply @gfinak. I checked if the subjects that failed had a lower number of observations compared to other subjects but they did not, and they had tens to hundreds of observations.

@gfinak
Copy link
Member

gfinak commented Nov 19, 2020

This may be related to a recent fix in #143. Would have been nice to have a reproducible example to test against.

@amcdavid
Copy link
Member

Sadly #143 didn't fix this. I am pushing a test that currently fails that isolates this behavior.

amcdavid added a commit that referenced this issue Nov 19, 2020
amcdavid added a commit that referenced this issue Nov 20, 2020
@amcdavid
Copy link
Member

@jaquol I know this is very delayed, but if you have a chance I'd love to hear if 314e4b9 fixed your issue and we can close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants