I'm not one of the package maintainers, but your question is closely related to one I was considering posting here this weekend.
Let's say I have a dataset that is quite unbalanced with regard to the explanatory variable, and I draw bootstrap samples from it. Many of those bootstrap samples could end up containing no cases from the minority class. If I then calculate a statistic such as a difference in proportions ("diff in props") from these samples, I end up with many NaN values. I can easily drop the NaN samples from my analysis; in fact, the `get_ci` and `visualise` functions do this automatically. But it makes me wonder whether an argument for stratified resampling would be useful in the `generate` function.
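To make the NaN problem concrete, here is a rough sketch in plain Python (not infer, and not a proposal for its API). It assumes a hypothetical two-group dataset with 29 cases in group "A" and a single case in group "B", and compares an ordinary bootstrap, which resamples from the pooled data, with a stratified bootstrap, which resamples within each group. The helper names `diff_in_props`, `bootstrap` and `stratified_bootstrap` are illustrative only.

```python
import math
import random


def diff_in_props(rows):
    """Difference in success proportions between groups "A" and "B".

    Returns NaN when either group is missing from the sample, which is
    what happens when a bootstrap sample contains no minority cases.
    """
    props = {}
    for g in ("A", "B"):
        outcomes = [y for grp, y in rows if grp == g]
        if not outcomes:
            return math.nan
        props[g] = sum(outcomes) / len(outcomes)
    return props["A"] - props["B"]


def bootstrap(rows, reps, rng):
    """Ordinary bootstrap: draw n rows with replacement from the pooled data."""
    return [rng.choices(rows, k=len(rows)) for _ in range(reps)]


def stratified_bootstrap(rows, reps, rng):
    """Resample within each group so every group keeps its original size."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    samples = []
    for _ in range(reps):
        sample = []
        for members in groups.values():
            sample.extend(rng.choices(members, k=len(members)))
        samples.append(sample)
    return samples


# Hypothetical unbalanced data: 29 cases in group "A", 1 case in group "B".
data = [("A", i % 2) for i in range(29)] + [("B", 1)]
rng = random.Random(42)

plain = [diff_in_props(s) for s in bootstrap(data, 1000, rng)]
strat = [diff_in_props(s) for s in stratified_bootstrap(data, 1000, rng)]

nan_plain = sum(math.isnan(x) for x in plain)
nan_strat = sum(math.isnan(x) for x in strat)
print(f"NaN statistics, ordinary bootstrap:   {nan_plain} / 1000")
print(f"NaN statistics, stratified bootstrap: {nan_strat} / 1000")
```

With one minority case among n = 30 rows, the chance that an ordinary bootstrap sample misses it is (1 - 1/30)^30, roughly 0.36, so about a third of the statistics come out NaN; the stratified version never produces one. The trade-off is that stratifying treats the group sizes as fixed, which may or may not match how the data were collected.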
I hope the package maintainers or authors can weigh in on the original question in this issue and on my related question.
Hello package maintainers!
I am building confidence intervals for groups with bootstrapped values and I'm having trouble creating multiple re-sampled datasets from which to build my confidence intervals.
Using the palmerpenguins library as an example:
There are 344 observations in total, and each species has a different number of them: 152 Adelie, 68 Chinstrap, and 124 Gentoo.
I want to group by species and, for each species, draw multiple bootstrap samples that each keep that species' original number of observations.
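The resampling described here can be sketched in plain Python, independent of any particular infer API: resample row indices with replacement *within* each species, so every replicate keeps 152 Adelie, 68 Chinstrap and 124 Gentoo rows. `stratified_replicates` is a hypothetical helper, and the `species` list is only a stand-in for the real penguins data.

```python
import random
from collections import Counter


def stratified_replicates(labels, reps, seed=None):
    """Bootstrap within each group, keeping every group at its original size.

    `labels` holds one group label per observation. The result is a list of
    (replicate, row_index) pairs, i.e. a stacked table with a replicate
    column whose indices point back into the original data.
    """
    rng = random.Random(seed)
    by_group = {}
    for idx, label in enumerate(labels):
        by_group.setdefault(label, []).append(idx)

    rows = []
    for rep in range(1, reps + 1):
        for indices in by_group.values():
            # Draw with replacement from this group's rows only.
            drawn = rng.choices(indices, k=len(indices))
            rows.extend((rep, idx) for idx in drawn)
    return rows


# Stand-in for the penguins data: one species label per observation.
species = ["Adelie"] * 152 + ["Chinstrap"] * 68 + ["Gentoo"] * 124

rows = stratified_replicates(species, reps=10, seed=1)
per_replicate = Counter((rep, species[idx]) for rep, idx in rows)

print(len(rows))  # 3440
print(per_replicate[(1, "Adelie")], per_replicate[(1, "Chinstrap")],
      per_replicate[(1, "Gentoo")])  # 152 68 124
```

Because each species is resampled separately, the per-species counts are identical in every replicate, and the total is still 344 rows per replicate.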
With 10 replicates, my code should return 344 * 10 = 3440 rows in the full resampled data set, and it does. But when you look at the data, you can see that each replicate has a different number of observations per species. In every replicate, n should be 152 for Adelie, 68 for Chinstrap, and 124 for Gentoo. Instead, we find this:
What am I missing?
Thanks for your insight.
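One possible explanation, offered as an assumption rather than a diagnosis of infer's internals: if each replicate is drawn as 344 rows from the pooled data instead of separately within each species, the total is always 3440 rows across 10 replicates, but the species counts in each replicate are random. The plain-Python sketch below reproduces that pattern using the species counts from the question; it does not call infer.

```python
import random
from collections import Counter

# Stand-in for the penguins data: one species label per observation.
species = ["Adelie"] * 152 + ["Chinstrap"] * 68 + ["Gentoo"] * 124
rng = random.Random(2024)

replicate_counts = []
for rep in range(1, 11):
    # Ordinary (unstratified) bootstrap: draw 344 rows with replacement
    # from all penguins at once, so each species' count is itself random.
    sample = rng.choices(species, k=len(species))
    counts = Counter(sample)
    replicate_counts.append(counts)
    print(rep, counts["Adelie"], counts["Chinstrap"], counts["Gentoo"])

total_rows = sum(sum(c.values()) for c in replicate_counts)
print(total_rows)  # 3440
```

Each replicate still sums to 344, which is why the overall row count looks right; only resampling within each species pins every species at its original size.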