-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
standardized residuals return 'inf' #16
Comments
The reason is that the residuals are huge - about 8.4 to 10.5 - and are computed as quantile residuals. Due to numerical instabilities in the computation, these collapse to infinity here. As your model is essentially a standard linear regression model, the quantile residuals from
So we can get a grasp of what the infinite residuals should be:
And the way that
And this explains the warning in In this particular case we don't need any randomization in the quantile residuals and thus it would be possible to switch to the log-probability scale:
But I'm not sure how easy it would be to switch the computations inside the But maybe either Mikis @mstasinopoulos or Niki @freezenik have an idea to avoid this. |
This all makes sense to me, and I appreciate the explanation. From a software perspective, it would probably be good to have either an analytical 'solution' (log.p, potentially) or an informative failure and Warning. |
In
The reason that I'm only doing this in case of infinite residuals (rather than by default) is that I'm not sure that all Mikis @mstasinopoulos and Niki @freezenik, if |
Hi Achim
I am almost sure that all cdf’s in `gamlss.dist` have the log.p argument so we should use the p.log trick to calculated residuals
Thanks
Mikis
… On 13 Aug 2024, at 23:00, Achim Zeileis ***@***.***> wrote:
qnorm(pnorm(9, log.p = TRUE), log.p = TRUE)
|
Thank you! I just added Achim's @zeileis code snipped and it seems to work. |
Thank you both for the follow-up! I had a quick look at both
|
Yes, you are right, this is a bit puzzling! The reason why I use rqres() in residuals.gamlss2() is because how some families are implemented in gamlss.dist, e.g., the implementation of the binomial denominator. With the "new" gamlss2 family setup, you don't need it. At the moment, I am not sure what the best solution is?! |
Dear Niki and Achim
After our emails last yesterday I had a second thought about log.p in gamlss.dist.
I am almost sure that its implementation for the `p ` function is probably correct
I feel that its implementation in the `q ` function needs checking.
Mind you this will not effect
rqres <- qnorm(cdf(q = y, ..., log.p = TRUE), log.p = TRUE)
Since `qnorm` is OK
We need to think what is appropriate for `randomised` quantile residuals
Those day I call quantile residuals the `z-scores` but I am not sure if we should adopt the name or not.
The rqres() function is used also in other functions of `gamlss` related packages but it is used through the body() function
i.e.
body(rqres) <- eval(quote(body(rqres)), envir = getNamespace("gamlss”))
Which mean that we have to change `rqres()` only once
Here is some example of the `qnorm(cdf(q = y, ..., log.p = TRUE), log.p = TRUE)` paradigm
qnorm(pTF(q = 1000, log.p = TRUE), log.p = TRUE)
[1] 10.61788
qnorm(pBCT(q = 1000, log.p = TRUE), log.p = TRUE)
[1] 5.154898
qnorm(pBCPE(q = 1000, log.p = TRUE), log.p = TRUE)
[1] Inf
qnorm(pBCCG(q = 1000, log.p = TRUE), log.p = TRUE)
[1] Inf
qnorm(pPO(q = 1000, log.p = TRUE), log.p = TRUE)
[1] Inf
qnorm(pSI(q = 1000, log.p = TRUE), log.p = TRUE)
[1] 7.679371
It looks that occasionally will get `Inf`.
Is that OK for graphs? what happens to graphs if encounter `Inf`?
NA usually is OK because are omitted.
All this discussion about residuals remind me something else I would like for `gamlss2` which I think it will be very useful
resid( , newdata)
I think it will be very useful to have the residuals for out of bag data.
Thanks
Mikis
… On 15 Aug 2024, at 23:36, Achim Zeileis ***@***.***> wrote:
Thank you both for the follow-up! I had a quick look at both gamlss and gamlss2 but was confused how all the different computations exactly play together. Hence it would be great if you could both have another look at this.
First, I think that in the rqres() function in gamlss the line https://github.com/gamlss-dev/gamlss/blob/main/R/rqres.R#L19 could be changed to
rqres <- qnorm(cdf(q = y, ..., log.p = TRUE), log.p = TRUE)
if indeed all p... functions have a log.p argument.
Then this rqres() computation seems to be inherited in various places - but I didn't check whether changes are needed in other places as well.
Even gamlss2 seems to inherit this computation in https://github.com/gamlss-dev/gamlss2/blob/main/R/families.R#L526 which seems somewhat quick & dirty to me, given that gamlss2 is only a Suggests dependency.
It's good to have Niki's new fix in https://github.com/gamlss-dev/gamlss2/blob/main/R/residuals.R#L119-L121 as an additional fallback. But this is only applicable like this in the continuous case without randomization.
—
Reply to this email directly, view it on GitHub <#16 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AE7QN7SARUPYHEMWGS27UQLZRUUQHAVCNFSM6AAAAABMI3NP5OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJSGQYDKOJTGQ>.
You are receiving this because you were mentioned.
|
All good points, thanks. Some further comments:
|
Niki @freezenik let's talk in person about what the best steps are for |
sample_dat.csv
It is unclear what is going on but when using some of the attached sample data with a very basic linear model (as well as more complex models not listed here), the standardized residuals are returning 'inf'. As best I can determine, they are all observations where the raw residuals are greater than 0.5191506. Is this a software bug or some sort of expected behavior I've been unable to diagnose?
Here is the basic model:
g1_p1_adj <- gamlss(pace1 ~ sex + age + adj_tot_time, data = sample_dat, family = 'NO')
sum(is.infinity(resid(g1_p1_adj)))
This obviously makes for errors in both the worm plots and the plot.gamlss() method.
*Edit: note that this issue does not occur in the developmental gamlss2 but there is a warning about " non finite quantiles from probabilities, set to NA!" ... so I am guessing there is some sort of 'fix' already developed? but I'd still be interested to know the cause of the 'non finite quantiles'
The text was updated successfully, but these errors were encountered: