[R] sampling from data.frame

Charles C. Berry cberry at tajo.ucsd.edu
Wed Dec 3 02:19:12 CET 2008


On Wed, 3 Dec 2008, axionator wrote:

> Hi all,
> I have a data frame with "clustered" rows as follows:
> Cu1  x1 y1 z1 ...
> Cu1  x2 y2 z2 ...
> Cu1  x3 y3 z3 ... # end of first cluster Cu1
> Cu2  x4 y4 z4 ...
> Cu2  x5 y5 z5
> Cu2  ...               # end of second cluster Cu2
> Cu3 ...
> ...
> "cluster"-size is 3 in the example above (rows making up a cluster are
> always consecutive). Is there any faster way to sample n clusters
> (with replacement) from this dataframe and build up a new data frame
> out of these sampled clusters? I use the "sample" function and a
> for-loop.

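Something like this, where df$cluster holds the cluster labels and n.samps
is the number of clusters to draw:

# split df into a list of per-cluster data frames, then sample that list
cl.samps <- sample( split( df, df$cluster ), n.samps, replace=TRUE )

# stack the sampled clusters into a single data frame
do.call( rbind, cl.samps )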
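
For a quick check, here is a minimal sketch of the same idea on a made-up data frame. The toy values, the column names cluster and x, the result name boot.df, and n.samps = 5 are assumptions for illustration only, not taken from the original post:

```r
set.seed(1)                      # only so the draw is repeatable

# Toy data: 4 clusters of 3 consecutive rows each (hypothetical values)
df <- data.frame(cluster = rep(paste0("Cu", 1:4), each = 3),
                 x = 1:12)
n.samps <- 5                     # number of clusters to draw

# split() returns a list with one data frame per cluster;
# sample() then draws elements of that list with replacement
cl.samps <- sample(split(df, df$cluster), n.samps, replace = TRUE)

# Stack the sampled clusters into one data frame
boot.df <- do.call(rbind, cl.samps)
```

Each draw contributes a whole cluster, so boot.df has n.samps times the cluster size in rows, and a cluster drawn twice appears twice.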

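If you need to identify which draw each row came from (rather than just
its originating cluster), which matters once a cluster is drawn more than
once:

# tag every sampled cluster with the index of the draw that produced it
cl.samps2 <- lapply( seq_along( cl.samps ),
	function(x) cbind( cl.samps[[ x ]], new.cluster = x ) )

do.call( rbind, cl.samps2 )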
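
Since the question asks about speed, here is an alternative sketch (not the approach above) that samples cluster labels and indexes the data frame once, instead of building a list of data frames and rbind-ing them. It may be quicker on large data, but that is worth timing on your own case. The toy data and the names rows.by.cl, picked, idx, and boot.df are hypothetical:

```r
set.seed(2)                      # only so the draw is repeatable

# Toy data: 4 clusters of 3 consecutive rows each (hypothetical values)
df <- data.frame(cluster = rep(paste0("Cu", 1:4), each = 3),
                 x = 1:12)
n.samps <- 5                     # number of clusters to draw

# Row indices of each cluster, keyed by cluster label
rows.by.cl <- split(seq_len(nrow(df)), df$cluster)

# Draw cluster labels with replacement
picked <- sample(names(rows.by.cl), n.samps, replace = TRUE)

# Gather the row indices for every draw and subset the data frame once
idx <- unlist(rows.by.cl[picked], use.names = FALSE)
boot.df <- df[idx, , drop = FALSE]

# Tag each row with the draw it belongs to (1, 2, ..., n.samps)
boot.df$new.cluster <- rep(seq_along(picked), lengths(rows.by.cl[picked]))
```

The new.cluster column plays the same role as in cl.samps2: it distinguishes repeated draws of the same original cluster.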

HTH,

Chuck

>
> Thanks in advance
> Armin

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


