[R] Selecting subsamples

Jim Lemon bitwrit at ozemail.com.au
Fri Dec 5 12:27:45 CET 2003


christian_mora at vtr.net wrote:
> Hi all,
> I'm working with a dataset with 9 columns and 2000 rows. Each row
> represents an individual and one of the columns represents the volume
> of that individual (measured in cubic meters). I'd like to select a
> sample from this dataset (without considering any probability of the
> rows) in which the sum of the volume of the individuals in that sample
> >= 100 cubic m. I'll appreciate any suggestion. Thanks, CM
>
I think Petr has the right idea, but I'll suggest the following, which
allows you to draw samples without replacement until you run out of rows.
Assume your data frame is called mydata.df and the volume variable is called
"M3":

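# shuffle the row indices so that rows are drawn without replacement
n.rows<-nrow(mydata.df)
shuffled.rows<-sample.int(n.rows)
rowindex<-0
volume.sum<-0
# add rows until the volume reaches 100 or the rows run out
while(volume.sum < 100 && rowindex < n.rows) {
 rowindex<-rowindex+1
 volume.sum<-volume.sum+mydata.df$M3[shuffled.rows[rowindex]]
}
# if the total volume of all rows is below 100, this is the whole data frame
this.sample<-mydata.df[shuffled.rows[seq_len(rowindex)],]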

Add another loop to collect as many samples as you need.
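
One way that outer loop might look, as a rough sketch only: walk through the shuffled rows once and start a new sample each time the running volume reaches the target, so that no row ends up in more than one sample. The data frame built here is made up so the example runs on its own (the real one has 9 columns), and the names target, samples and current are my own choices for illustration.

```r
# Made-up stand-in for mydata.df: an id column and a volume column "M3"
set.seed(42)
mydata.df <- data.frame(id = 1:2000, M3 = runif(2000, min = 0.1, max = 5))

target <- 100                         # required total volume per sample
shuffled.rows <- sample.int(nrow(mydata.df))

samples <- list()                     # one data frame per completed sample
current <- integer(0)                 # row indices of the sample being built
volume.sum <- 0

for (r in shuffled.rows) {
  current <- c(current, r)
  volume.sum <- volume.sum + mydata.df$M3[r]
  if (volume.sum >= target) {
    # sample is complete: store it and start the next one
    samples[[length(samples) + 1]] <- mydata.df[current, ]
    current <- integer(0)
    volume.sum <- 0
  }
}
# rows left in 'current' never reached the target and are not used
```

Each element of samples is then a data frame whose M3 column sums to at least the target, and length(samples) tells you how many samples the data could supply. If the samples are allowed to overlap, reshuffling before each one would do instead.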

Jim



