[R] Randomly sampling subsets of dataframe variable

Phil Spector spector at stat.berkeley.edu
Fri Mar 12 21:14:58 CET 2010


Mike -
    Perhaps these suggestions will be helpful:

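# Example data: 26 weeks, 5 weekdays each
somedata = data.frame(week=rep(1:26,each=5),day=rep(1:5,26))

# Draw 2 rows (without replacement) within each week, then stack the pieces
res = by(somedata,somedata$week,function(x)x[sample(nrow(x),2),])
do.call(rbind,res)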

or

# Split into one data frame per week and sample 2 rows from each
do.call(rbind,lapply(split(somedata,somedata$week),
               function(x)x[sample(nrow(x),2),]))

or

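# Sample 2 of each week's row numbers; sample(x,2) relies on every week
# having more than one row (a single number x is treated as 1:x)
do.call(rbind,tapply(1:nrow(somedata),list(somedata$week),
                      function(x)somedata[sample(x,2),]))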
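
The sorting approach described in the original question can probably be
made to work as well. Here is a minimal sketch, assuming the same layout
as the example data (one row per weekday, a week column); the names key,
sorted, pos and picked are only illustrative:

```r
# Example data: 26 weeks, 5 weekdays each
somedata = data.frame(week=rep(1:26,each=5),day=rep(1:5,26))

# Attach a uniform random key and order rows by week, then by key
somedata$key = runif(nrow(somedata))
sorted = somedata[order(somedata$week,somedata$key),]

# Number the rows within each week and keep the first two of each
pos = ave(seq_len(nrow(sorted)),sorted$week,FUN=seq_along)
picked = sorted[pos <= 2,c("week","day")]
```

picked should then hold 52 rows, two distinct weekdays per week. Calling
set.seed() beforehand makes the draw reproducible.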


 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu

On Fri, 12 Mar 2010, Hosack, Michael wrote:

> Fellow R users,
>
> I am stumped on what would seem to be something fairly simple.
> I have a dataframe that has a variable named 'WEEK' that takes
> the numbers 1:26 (26 week time-period) with each number repeated
> five times consecutively (once for each weekday, Monday through
> Friday). Ex. 111112222233333.....2626262626. I would like to
> randomly extract two weekdays per five day week for each of
> 26 weeks and store this data as a separate dataframe. I have
> been unable to get the sample function to work properly.
> I have also tried using the runif function to assign random
> numbers to each row of my dataframe, sort the dataframe first
> by week number then by random number value, and finally select
> the first two elements from each week subset (26 weeks total,
> giving 52 randomly selected values).  I can't figure out how
> to select the first two elements. My goal is to randomly
> select two weekdays per week (without replacement) for each of
> 26 consecutive weeks. Any advice would be greatly appreciated.
>
> Thank you,
>
> Mike
>


