[R] Sampling a dataframe based on the length of a subset of observations within

Eric Vander Wal eric.vanderwal at usask.ca
Thu Jul 9 03:19:23 CEST 2009

Thank you in advance for your consideration.

I have a dataframe of 2000+ observations with repeated measures across 
approximately 300 unique individuals.  An event either does or does not 
occur (1, 0), and there is a suite of independent variables associated 
with the event.  A simplified representation follows:

my.df<-data.frame("id"=c("A","A","A","B","B","C","C","C", "C", "C"), 
event=c(0,0,1,0,1,0,0,1,1, 0))

_id_  _event_
A     0
A     0
A     1
B     0
B     1
C     0
C     0
C     1
C     1
C     0

I need to sample my.df to select the same number of observations with 
event = 0 as event = 1 for each unique id.
I can reshape or tapply my.df to group by id and determine what sample 
size I need.  Using melt and cast from the reshape package, my.df.cast is:

library(reshape)  # provides melt() and cast()
my.df.melt<-melt(my.df, id="id")
my.df.cast<-cast(my.df.melt, id~value, length, fill=0)

_id_      _0_   _1_
A     2     *1*
B     1     *1*
C     3     *2*
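
The same per-id counts can also be obtained in base R without reshaping. The following is only a minimal sketch on the toy data above; the names counts and n.needed are illustrative and are not part of the original code.

```r
# Toy data from the post
my.df <- data.frame(id = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
                    event = c(0, 0, 1, 0, 1, 0, 0, 1, 1, 0))

# Cross-tabulate id by event: one row per id, one column per event value
counts <- table(my.df$id, my.df$event)

# Number of event == 1 rows per id, i.e. how many event == 0 rows
# would have to be drawn for that id (hypothetical helper name)
n.needed <- counts[, "1"]
```

Here n.needed is a vector named by id, so a value can be looked up with, for example, n.needed["C"].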

Given the above dataframe I need to randomly select (sample) from my.df 
*one* observation from my.df[my.df$id == "A" & my.df$event == 0, ], *one* 
from my.df[my.df$id == "B" & my.df$event == 0, ], and *two* from 
my.df[my.df$id == "C" & my.df$event == 0, ], and then rbind them to 
my.df[my.df$event == 1, ].  However, it is impractical to code each 
case individually.

Alternatively, if A in my.df matches A in my.df.cast, then something like 
sample(my.df[my.df$id == "A" & my.df$event == 0, ], size = my.df.cast[1, 3], 
replace = FALSE).  I think I am close to a solution but I'm not sure how 
to code it to run through the entire dataframe.
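
One possible way to run this over every id is to split the dataframe by id, sample row positions within each piece, and rbind the pieces back together. This is a hedged sketch in base R, not a tested solution for the full dataset. The helper name balance.one is hypothetical, set.seed is there only to make the draw repeatable, and the sketch assumes that an id with fewer event = 0 rows than event = 1 rows should simply keep all the zeros it has.

```r
# Toy data from the post
my.df <- data.frame(id = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C"),
                    event = c(0, 0, 1, 0, 1, 0, 0, 1, 1, 0))

set.seed(1)  # only so that the random draw is reproducible

# Hypothetical helper: balance one id's rows
balance.one <- function(d) {
  ones  <- which(d$event == 1)   # row positions of event == 1
  zeros <- which(d$event == 0)   # row positions of event == 0
  # Assumption: never ask for more zeros than are available
  n <- min(length(ones), length(zeros))
  # Sample positions within 'zeros' using sample.int(); calling
  # sample(zeros, n) would misbehave when 'zeros' has length one
  keep <- if (n > 0) zeros[sample.int(length(zeros), n)] else integer(0)
  d[c(keep, ones), ]             # sampled zeros first, then all ones
}

# Apply the helper to each id and stack the results
my.new.df <- do.call(rbind, lapply(split(my.df, my.df$id), balance.one))
rownames(my.new.df) <- NULL
```

Which event = 0 rows are kept depends on the random draw, but each id ends up with as many zeros as ones on this toy data.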

This is how my.new.df would look:

_id_  _event_
A     0
A     1
B     0
B     1
C     0
C     0
C     1
C     1

Thank you kindly for your help,


Eric Vander Wal
Ph.D. Candidate
University of Saskatchewan, 
Department of Biology,
112 Science Place, 
Saskatoon, SK., S7N 5E2

"Pluralitas non est ponenda sine necessitate"
