# [R] boot with strata: strata argument ignored?

Bryan Hanson hanson at depauw.edu
Sat Jun 26 21:17:05 CEST 2010

Thanks, Chuck. I now understand much better what is going on in your example.
But I'm still unsure why the b2$t array does not have dimensions R
x (number of strata).

Any further insight would be appreciated.  Bryan
*************
Bryan Hanson
Acting Chair
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA
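As far as the boot documentation goes, `strata` only changes how the indices are resampled (within each level of `series`). The number of columns in `$t` is the length of whatever the statistic returns, so with `mm` returning a single median, `b2$t` is 1000 x 1. A minimal sketch, assuming one median per series is what is actually wanted: the statistic `per_series_median` below is a hypothetical name, not part of boot, and it returns a length-8 vector, so `$t` gets 8 columns.

```r
library(boot)  # the 'gravity' data set ships with the boot package

# Hypothetical statistic: one median per level of 'series'.
# boot() passes the resampled row indices as 'i'; with 'strata' set,
# rows are resampled within each series, so every level stays present.
per_series_median <- function(d, i) {
  tapply(d$g[i], d$series[i], median)
}

set.seed(1)
b3 <- boot(gravity, per_series_median, R = 1000, strata = gravity$series)
dim(b3$t)  # 1000 rows, one column per level of series
```

Each column of `b3$t` then holds the bootstrap replicates of one series' median.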

On 6/26/10 12:43 PM, "Charles C. Berry" <cberry at tajo.ucsd.edu> wrote:

> On Sat, 26 Jun 2010, Bryan Hanson wrote:
>
>> Hello All.  I must be missing something really obvious here:
>>
>> mm <- function(d, i) median(d[i])
>> b1 <- boot(gravity$g, mm, R = 1000)
>> b1
>> b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)
>> b2
>>
>> Both b1 and b2 seem to have done (almost) the same thing, but it looks like
>> the strata argument in b2 has been ignored.  However, str(b1) vs str(b2)
>> does show that the strata have been noted correctly.  But b2$t is a 1000 x 1
>> array, not a 1000 x 8 array (gravity\$series is a factor with 8 levels).
>>
>> There is a more complex example in ?boot using the same data set that gives
>> a result that seems to make sense (2 levels in the factor, so $t has 2
>> columns).
>>
>> I either misunderstand the expected behavior or I've missed some punctuation
>> or syntax detail.
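The same rule appears to explain the ?boot example: there the statistic (`diff.means`) returns two numbers per resample, so `$t` has two columns, while the two-level factor only defines the strata. A sketch with a hypothetical statistic `med_and_mean` and no strata at all, which still yields a two-column `$t`:

```r
library(boot)

# Hypothetical statistic returning two numbers per resample.
med_and_mean <- function(d, i) c(median = median(d[i]), mean = mean(d[i]))

set.seed(2)
b4 <- boot(gravity$g, med_and_mean, R = 500)  # no strata argument
ncol(b4$t)  # one column per element returned by med_and_mean
```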
>
> Your punctuation and syntax are OK.
>
> Note:
>
>> SISWR <- function(x) sample(x,length(x),repl=TRUE)
>> # no strata
>> var(replicate(1000,median(SISWR(gravity$g))))
> [1] 0.4588338
>> # now stratify on series
>> gsplit <- split(gravity$g,gravity$series)
>> var(replicate(1000,median(unlist(lapply(gsplit,SISWR)))))
> [1] 0.3882272
>>
>> sqrt(.45) # this agrees with b1
> [1] 0.6708204
>> sqrt(.39) # this agrees with b2
> [1] 0.6244998
>>
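The standard errors being compared here can also be read straight off the boot objects. A sketch, assuming (as print.boot appears to do) that the reported "std. error" is the standard deviation of each column of `$t`; the seed is arbitrary, so the values will differ somewhat from the simulations above.

```r
library(boot)

mm <- function(d, i) median(d[i])

set.seed(3)
b1 <- boot(gravity$g, mm, R = 1000)
b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)

# Standard deviation of the bootstrap replicates of the median,
# without and with resampling inside each series.
se <- c(unstratified = sd(b1$t[, 1]), stratified = sd(b2$t[, 1]))
se
```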
>
> The effect of stratification depends on the relative amount of variation
> within vs. between strata. This suggests there is not a lot of
> between-strata variation:
>
>> aov(g~series,gravity)
> Call:
>     aov(formula = g ~ series, data = gravity)
>
> Terms:
>                    series Residuals
> Sum of Squares  2818.624  8239.376
> Deg. of Freedom        7        73
>
> Residual standard error: 10.62394
> Estimated effects may be unbalanced
>>
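One rough way to put a number on "not a lot" is the share of the total sum of squares that lies between series. A sketch (the name `between_share` is only for illustration) that pulls the sums of squares out of the same aov fit:

```r
library(boot)

fit <- aov(g ~ series, data = gravity)
ss <- summary(fit)[[1]][["Sum Sq"]]  # c(between-series SS, residual SS)
between_share <- ss[1] / sum(ss)
between_share  # 2818.624 / (2818.624 + 8239.376), roughly a quarter
```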
>
>
> HTH,
>
> Chuck
>
>>
>> TIA, Bryan
>>
>>
>>> sessionInfo()
>> R version 2.11.0 (2010-04-22)
>> x86_64-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] datasets  tools     grid      graphics  grDevices utils     stats
>> [8] methods   base
>>
>> other attached packages:
>> [1] boot_1.2-42        brew_1.0-3         faraway_1.0.4
>> [4] GGally_0.2         xtable_1.5-6       mvbutils_2.5.1
>> [7] ggplot2_0.8.7      digest_0.4.2       reshape_0.8.3
>> [10] proto_0.3-8        ChemoSpec_1.43     R.utils_1.4.0
>> [13] R.oo_1.7.2         R.methodsS3_1.2.0  rgl_0.91
>> [16] lattice_0.18-5     mvoutlier_1.4      plyr_0.1.9
>> [19] RColorBrewer_1.0-2 chemometrics_0.8   som_0.3-5
>> [22] robustbase_0.5-0-1 rpart_3.1-46       pls_2.1-0
>> [25] pcaPP_1.8-1        mvtnorm_0.9-9      nnet_7.3-1
>> [28] mclust_3.4.4       MASS_7.3-5         lars_0.9-7
>> [31] e1071_1.5-23       class_7.3-2
>>
>>
>
> Charles C. Berry                            (858) 534-2098
>                                              Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu             UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>