[R] bootstrapping respecting subject level information

Fri Jul 5 02:48:19 CEST 2013

I think that in the case of a 2*2 balanced, replicated design such as
this one, interpreting the interaction should be safe.

Cheers

Andrew

On Fri, Jul 5, 2013 at 9:38 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Jul 3, 2013, at 7:19 PM, Sol Lago wrote:
>
>> Hi there,
>>
>> This is the first time I use this forum, and I want to say from the start I am not a skilled programmer. So please let me know if the question or code were unclear!
>>
>> I am trying to bootstrap an interaction (that is my test statistic) using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a dataframe similar to mine:
>>
>> Subject = rep(c("S1","S2","S3","S4"),4)
>> Num     = rep(c("singular","plural"),8)
>> Gram    = rep(c("gram","gram","ungram","ungram"),4)
>> RT      = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
>> data    = data.frame(Subject,Num,Gram,RT)
>>
>> This is the code I used to get the empirical interaction value:
>>
>> summary(lm(RT ~ Num*Gram, data=data))
>>
>> As you can see, the interaction between my two factors is -348.
>
> That depends on what you mean by "the interaction between my two factors". It is almost never a good idea to attempt interpretation of interaction coefficients, and is always preferable to check the predictions of hte model.
>
>> I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:
>>
>> #Function to create the statistic to be boostrapped
>> boot.huber <- function(data, indices) {
>> data <- data[indices, ] #select obs. in bootstrap sample
>> mod <- lm(RT ~ Num*Gram, data=data)
>> coefficients(mod)       #return coefficient vector
>> }
>>
>> #Generate bootstrap estimate
>> data.boot <- boot(data, boot.huber, 1999)
>>
>> #Get confidence interval
>> boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction
>>
>> My problem is that I think the resamples should be generated without mixing the individual subjects observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1,
>
> What does that mean?
>
>> not mixing them with the observations from subjects 2, etc... I don't know how "boot" is doing the resampling (I read the documentation but don't understand how the function is doing it)
>
> It's doing it by selecting randomly entire rows. It is not "shuffling within rows" for selected subjects.
>>
>> Does anyone know how I could make sure that the resampling procedure used by "boot" respects the subject level information?
>
> It would be doing so because that is the way you set up the indexing. The column ordering tof the data within subjects is not permuted.
>
> I do think you are beyond your understanding of the statstical principles that you are attempting to use and would be safer to consult with a statsitician.
>
>
> --
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Andrew Robinson
Deputy Director, CEBRA
Senior Lecturer in Applied Statistics                      Tel: +61-3-8344-6410
Department of Mathematics and Statistics            Fax: +61-3-8344 4599
University of Melbourne, VIC 3010 Australia
Email: a.robinson at ms.unimelb.edu.au    Website: http://www.ms.unimelb.edu.au

FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
SPuR: http://www.ms.unimelb.edu.au/spuRs/