[R] Strange results : bootrstrp CIs

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Sat Jan 13 23:51:24 CET 2024


   It took me a little while to figure this out, but: the problem is 
that if your resampling leaves out any countries (which is very likely), 
your model applied to the bootstrapped data will have fewer coefficients 
than your original model.

I tried this:

cc <- unique(e$Country)
func <- function(data, idx) {
    coef(lm(Score~ Time + factor(Country, levels =cc),data=data[idx,]))
}

but lm() automatically drops coefficients for missing countries (I 
didn't think about it too hard, but I thought they might get returned as 
NA and that boot() might be able to handle that ...)

   If you want to do this I think you'll have to find a way to do a 
*stratified* bootstrap, restricting the bootstrap samples so that they 
always contain at least one sample from each country ... (I would have 
expected "strata = as.numeric(e$Country)" to do this, but it doesn't 
work the way I thought ... it tries to compute the statistics for *each* 
stratum ...)



====

  Debugging attempts:

set.seed(101)
options(error=recover)
B= boot(e, func, R=1000)


Error in t.star[r, ] <- res[[r]] :
   number of items to replace is not a multiple of replacement length

Enter a frame number, or 0 to exit

1: boot(e, func, R = 1000)

<enter 1>

Selection: 1
Called from: top level
Browse[1]> print(r)
[1] 2
Browse[1]> t.star[r,]
[1] NA NA NA NA NA NA NA NA NA

i[2,]
  [1] 14 15 22 22 21 14  8  2 12 22 10 15  9  7  9 13 12 23  1 20 15  7 
5 10




On 2024-01-13 5:22 p.m., varin sacha via R-help wrote:
> Dear Duncan,
> Dear Ivan,
> 
> I really thank you a lot for your response.
> So, if I correctly understand your answers the problem is coming from this line:
> 
> coef(lm(Score~ Time + factor(Country)),data=data[idx,])
> 
> This line should be:
> coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> 
> If yes, now I get an error message (code here below)! So, it still does not work.
> 
> Error in t.star[r, ] <- res[[r]] :
>    number of items to replace is not a multiple of replacement length
> 
> 
> ##########################################
> Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345, 345,456,543,501)
>   
> Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
>   
> Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
>   
> e=data.frame(Score, Country, Time)
>   
>   
> library(boot)
> func= function(data, idx) {
> coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> }
> B= boot(e, func, R=1000)
>   
> boot.ci(B, index=2, type="perc")
> #############################################################
> 
> 
> 
> 
> 
> 
> 
> 
> Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov <ikrylov using disroot.org> a écrit :
> 
> 
> 
> 
> 
> В Sat, 13 Jan 2024 20:33:47 +0000 (UTC)
> 
> varin sacha via R-help <r-help using r-project.org> пишет:
> 
>> coef(lm(Score~ Time + factor(Country)),data=data[idx,])
> 
> 
> Wrong place for the data=... argument. You meant to give it to lm(...),
> but in the end it went to coef(...). Without the data=... argument, the
> formula passed to lm() picks up the global variables inherited by the
> func() closure.
> 
> Unfortunately, S3 methods really do have to ignore extra arguments they
> don't understand if the class is to be extended, so coef.lm isn't
> allowed to complain to you about it.
>



More information about the R-help mailing list