[R] Strange results : bootrstrp CIs

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Sun Jan 14 01:54:01 CET 2024


Well, this would seem to work:

e <- data.frame(Score = Score
             , Country = factor(Country)
             , Time = Time)

ncountry <- nlevels(e$Country)
func= function(dat,idx) {
   if(length(unique(dat[idx,'Country'])) < ncountry) NA
   else coef(lm(Score~ Time + Country,data = dat[idx,]))
}
B <-  boot(e, func, R=1000)

boot.ci(B, index=2, type="perc")

Caveats:
1) boot.ci handles the NA's by omitting them, which of course gives a
smaller resample and longer CI's than the value of R specified in the call
to boot().

2) I do not know if the *nice* statistical properties of the nonparametric
bootstrap, e.g. asymptotic correctness, hold when bootstrap samples are
produced in this way.  I leave that to wiser heads than me.

Cheers,
Bert

On Sat, Jan 13, 2024 at 2:51 PM Ben Bolker <bbolker using gmail.com> wrote:

>    It took me a little while to figure this out, but: the problem is
> that if your resampling leaves out any countries (which is very likely),
> your model applied to the bootstrapped data will have fewer coefficients
> than your original model.
>
> I tried this:
>
> cc <- unique(e$Country)
> func <- function(data, idx) {
>     coef(lm(Score~ Time + factor(Country, levels =cc),data=data[idx,]))
> }
>
> but lm() automatically drops coefficients for missing countries (I
> didn't think about it too hard, but I thought they might get returned as
> NA and that boot() might be able to handle that ...)
>
>    If you want to do this I think you'll have to find a way to do a
> *stratified* bootstrap, restricting the bootstrap samples so that they
> always contain at least one sample from each country ... (I would have
> expected "strata = as.numeric(e$Country)" to do this, but it doesn't
> work the way I thought ... it tries to compute the statistics for *each*
> stratum ...)
>
>
>
> ====
>
>   Debugging attempts:
>
> set.seed(101)
> options(error=recover)
> B= boot(e, func, R=1000)
>
>
> Error in t.star[r, ] <- res[[r]] :
>    number of items to replace is not a multiple of replacement length
>
> Enter a frame number, or 0 to exit
>
> 1: boot(e, func, R = 1000)
>
> <enter 1>
>
> Selection: 1
> Called from: top level
> Browse[1]> print(r)
> [1] 2
> Browse[1]> t.star[r,]
> [1] NA NA NA NA NA NA NA NA NA
>
> i[2,]
>   [1] 14 15 22 22 21 14  8  2 12 22 10 15  9  7  9 13 12 23  1 20 15  7
> 5 10
>
>
>
>
> On 2024-01-13 5:22 p.m., varin sacha via R-help wrote:
> > Dear Duncan,
> > Dear Ivan,
> >
> > I really thank you a lot for your response.
> > So, if I correctly understand your answers the problem is coming from
> this line:
> >
> > coef(lm(Score~ Time + factor(Country)),data=data[idx,])
> >
> > This line should be:
> > coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> >
> > If yes, now I get an error message (code here below)! So, it still does
> not work.
> >
> > Error in t.star[r, ] <- res[[r]] :
> >    number of items to replace is not a multiple of replacement length
> >
> >
> > ##########################################
> >
> Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
> 345,456,543,501)
> >
> > Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey",
> "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal",
> "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria",
> "Austria", "Austria")
> >
> > Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
> >
> > e=data.frame(Score, Country, Time)
> >
> >
> > library(boot)
> > func= function(data, idx) {
> > coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> > }
> > B= boot(e, func, R=1000)
> >
> > boot.ci(B, index=2, type="perc")
> > #############################################################
> >
> >
> >
> >
> >
> >
> >
> >
> > Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov <
> ikrylov using disroot.org> a écrit :
> >
> >
> >
> >
> >
> > В Sat, 13 Jan 2024 20:33:47 +0000 (UTC)
> >
> > varin sacha via R-help <r-help using r-project.org> пишет:
> >
> >> coef(lm(Score~ Time + factor(Country)),data=data[idx,])
> >
> >
> > Wrong place for the data=... argument. You meant to give it to lm(...),
> > but in the end it went to coef(...). Without the data=... argument, the
> > formula passed to lm() picks up the global variables inherited by the
> > func() closure.
> >
> > Unfortunately, S3 methods really do have to ignore extra arguments they
> > don't understand if the class is to be extended, so coef.lm isn't
> > allowed to complain to you about it.
> >
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list