[R] Randomly interleaving data frames while preserving order

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Tue Mar 31 20:06:48 CEST 2015


> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Kevin E. Thorpe
> Sent: Tuesday, March 31, 2015 10:53 AM
> To: Duncan Murdoch
> Cc: R Help Mailing List
> Subject: Re: [R] Randomly interleaving data frames while preserving order
> 
> On 03/31/2015 01:44 PM, Duncan Murdoch wrote:
> > On 31/03/2015 1:05 PM, Kevin E. Thorpe wrote:
> >> Hello.
> >>
> >> I am trying to simulate recruitment in a randomized trial. Suppose I
> >> have three streams (strata) of patients represented by these data frames.
> >>
> >> df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
> >> df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
> >> df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
> >>
> >> What I need to do is construct a data frame with all of these combined
> >> where the order of selection from one of the three data frames is
> >> randomized but once a stratum is selected patients are selected
> >> sequentially from that data frame.
> >>
> >> To see what I'm looking to achieve, suppose the first five subjects were
> >> to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The
> >> expected result should look like this:
> >>
> >> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
> >>      strat id  pid
> >> 1      1  1 1001
> >> 2      2  1 2001
> >> 21     1  2 1002
> >> 4      3  1 3001
> >> 22     2  2 2002
> >>
> >> I hope what I'm trying to accomplish makes sense. Maybe I'm missing
> >> something obvious, but I really have no idea at the moment how to
> >> achieve this elegantly. Since I need to simulate many trial recruitments
> >> it needs to be general and compact.
> >>
> >> I appreciate any advice.
> >
> > How about something like this:
> >
> > # Permute an ordered vector of selections:
> > sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))
> >
> > # Create an empty dataframe to hold the results
> > df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]
> >
> > # Put the original dataframes into the appropriate slots:
> > df[sel == 1,] <- df1
> > df[sel == 2,] <- df2
> > df[sel == 3,] <- df3
> >
> > # Clean up the rownames
> > rownames(df) <- NULL
> >
> > Duncan Murdoch
> >
> 
> Thanks Duncan.
> 
> Once you see the solution it is indeed obvious.
> 
> Kevin
> 
> --
> Kevin E. Thorpe
> Head of Biostatistics,  Applied Health Research Centre (AHRC)
> Li Ka Shing Knowledge Institute of St. Michael's
> Assistant Professor, Dalla Lana School of Public Health
> University of Toronto
> email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
> 
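
Since the simulation has to be repeated many times, Duncan's slot-filling idea can be wrapped in a function that accepts any number of strata of any size. This is a minimal sketch, not code from the thread: `interleave_strata` is a hypothetical name, and it assumes every stratum has the same columns so they can be `rbind`-ed. It fills slots with row indices rather than data frames, which is the same trick as `df[sel == k, ] <- dfk`.

```r
# Hypothetical helper (not from the thread): randomly interleave a list of
# data frames while keeping each data frame's rows in their original order.
# Assumes all data frames share the same columns.
interleave_strata <- function(strata) {
  sizes <- vapply(strata, nrow, integer(1))
  stacked <- do.call(rbind, strata)
  # Which stratum each stacked row came from
  grp <- rep(seq_along(strata), sizes)
  # Random recruitment order: a permutation of the stratum labels.
  # sample.int() avoids sample()'s surprise when grp has length one.
  sel <- grp[sample.int(length(grp))]
  # Row indices of each stratum within 'stacked', in their original order;
  # explicit levels keep empty strata in place
  rows <- split(seq_len(nrow(stacked)), factor(grp, levels = seq_along(strata)))
  # Put stratum k's rows, in order, into the slots where sel == k
  idx <- integer(length(sel))
  for (k in seq_along(strata)) idx[sel == k] <- rows[[k]]
  out <- stacked[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Example with the three strata from the question
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

set.seed(2015)
recruited <- interleave_strata(list(df1, df2, df3))
head(recruited)
```

Passing a list keeps the call the same whether there are three strata or thirty.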

Another option would be to stack your strata and then sample from the combined data frame, something like this:

sample_size <- 10
population <- rbind(df1, df2, df3)
sim.sample <- population[sample(nrow(population), sample_size, replace = FALSE), ]
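
Sampling row numbers of the stacked frame directly also shuffles patients within a stratum, so, for example, patient 7 of stratum 2 could be drawn before patient 2. If the within-stratum sequence from the original question must hold, one possible variant keeps the stacking but permutes only the stratum labels. It then matches each draw's rank within its stratum to the stacked row with the same rank. This is a minimal sketch; `draw_rank`, `row_rank` and `idx` are illustrative names, not from the thread.

```r
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

sample_size <- 10
population <- rbind(df1, df2, df3)

# Random recruitment order: permute the stratum label of every stacked row
sel <- sample(population$strat)
# For each draw: is it the 1st, 2nd, ... patient taken from its stratum?
draw_rank <- ave(seq_along(sel), sel, FUN = seq_along)
# For each stacked row: its position within its own stratum
row_rank <- ave(seq_len(nrow(population)), population$strat, FUN = seq_along)
# Map each (stratum, rank) draw to the stacked row with the same (stratum, rank)
idx <- match(paste(sel, draw_rank), paste(population$strat, row_rank))
# Keep the first sample_size recruits
sim.sample <- population[idx[seq_len(sample_size)], ]
```

Because ranks are assigned in draw order, the first k patients taken from any stratum are always that stratum's first k rows.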

Hope this is helpful,

Dan

Daniel J. Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services



