[R] understanding behavior of "merge"

Greg Snow Greg.Snow at imail.org
Sat May 1 01:19:43 CEST 2010


So does each person have multiple rows and you want to sample the set of rows?

The usual approach that I take is to split them into a list, sample from the list, then put the list back together, for example:


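# split the rows into a list of data frames, one per group
tmp1 <- split(as.data.frame(state.x77), state.division)
# resample whole groups (not individual rows) with replacement
tmp2 <- sample(tmp1, replace=TRUE)
# stack the sampled groups back into one data frame
tmp3 <- do.call( 'rbind', tmp2 )
# give each draw its own id, so a group drawn twice stays distinct
tmp3$newid <- rep( seq_along(tmp2), sapply(tmp2, nrow) )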

Wrap that in a function and you have the bootstrap resampling.
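
As a rough sketch of what such a wrapper might look like, assuming the data frame has a subject column called "ID" as in the original question (the function name boot_subjects, its arguments, and the toy data are made up for illustration):

```r
# Hypothetical helper: resample whole subjects with replacement.
# `data` is a data frame; `id` names the column that identifies subjects.
boot_subjects <- function(data, id = "ID") {
  groups <- split(data, data[[id]])        # one data frame per subject
  draw   <- sample(groups, replace = TRUE) # resample subjects, not rows
  out    <- do.call("rbind", unname(draw)) # stack the sampled subjects
  # a subject drawn more than once gets a distinct newid for each draw
  out$newid <- rep(seq_along(draw), vapply(draw, nrow, integer(1)))
  rownames(out) <- NULL
  out
}

# Toy repeated-measures data (invented for this example)
mydata <- data.frame(ID = rep(1:4, times = c(2, 3, 2, 3)),
                     y  = seq_len(10))

set.seed(1)
boot1 <- boot_subjects(mydata)
```

Calling boot_subjects() repeatedly (for example inside replicate() or lapply()) would give one resampled data set per call; any model fitted to a resample should then use newid rather than ID as the subject identifier.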

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Chris__ Barker
> Sent: Thursday, April 29, 2010 5:26 PM
> To: r-help at r-project.org
> Subject: [R] understanding behavior of "merge"
> 
> I'm trying to bootstrap resample from a repeated measures dataset. I sample
> a vector of "ID"s from my dataframe with replacement.
> Then I merge this back with my dataframe.
> I'm re-sampling subjects in the dataset rather than rows of the data.
> 
> I thought I could use the left/right join features of merge to select
> the records I want from the dataframe (mydataframe), like this:
> 
> boot.sample <- merge(id.boot.draw, mydataframe, by=c("ID"), all.x=TRUE)
> 
> 
> When I do that, the correct records are selected from "mydataframe", but the
> values for all the variables other than the matching variable are now "NA".
> 
> My other option is to write a for loop, which I would hope to avoid.
> 
> Thanks in advance for any suggestions
> 
> --
>               Chris Barker,


