[R] simple randomization question: How to perform "sample" in chunks

David Winsemius dwinsemius at comcast.net
Thu Aug 20 18:58:37 CEST 2009


On Aug 20, 2009, at 11:22 AM, Tal Galili wrote:

> Hello dear R-help group.
>
> My task looks simple, but I can't seem to find a "smart" (e.g., non-loop)
> solution to it.
>
> Task: I wish to randomize a data.frame by one column, while keeping the
> inner order in the second column as is.
>
> So for example, let's say I have the following data.frame:
>
> xx <-data.frame(a=  c(1,2,2,3,3,3,4,4,4,4) ,
>                        b =  c(1,1,2,1,2,3,1,2,3,4) )
>
> I would like to shuffle it by column "a", while keeping the order in
> column "b".
>
> Here is my "not-smart" way of doing it:
>
> # R example
> xx <-data.frame(a=  c(1,2,2,3,3,3,4,4,4,4) ,
>                        b =  c(1,1,2,1,2,3,1,2,3,4) )
>
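> randomize.by.column.a <- function(xx)
> {
>   # Draw the distinct values of column a in random order
>   new.a.order <- sample(unique(xx$a))
>   new.xx <- NULL
>   for (i in new.a.order)
>   {
>     # Take all rows for this value of a, in their original order
>     xx.subset <- xx[xx$a %in% i, ]
>     # Append the block to the result
>     new.xx <- rbind(new.xx, xx.subset)
>   }
>
>   return(new.xx)
> }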
> randomize.by.column.a(xx)
> # END of - R example
>

It was a bit confusing to read that you wanted to keep the order in
column "b", but your code implies that you wanted to carry the
b-values along with the shuffled a-values. I think this achieves the same
goal:

xx[sample(1:nrow(xx)), ]
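
One caveat on the one-liner above: sample(1:nrow(xx)) permutes the rows
individually, so rows that share a value of a will generally not stay
next to each other, whereas the loop in the question moves each a-group
as a block. If the block behaviour is what is wanted, the following is a
minimal loop-free sketch, not something taken from the original thread.
The function name shuffle_blocks and its col argument are made up for
illustration; everything else is base R (unique, sample.int, match,
order).

```r
xx <- data.frame(a = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
                 b = c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4))

# Hypothetical helper: shuffle whole blocks of rows defined by one column,
# leaving the row order inside each block untouched.
shuffle_blocks <- function(df, col = "a") {
  u <- unique(df[[col]])
  # Index with sample.int() rather than calling sample(u) directly, which
  # would treat a single number u as the range 1:u.
  new_order <- u[sample.int(length(u))]
  # Rank each row by where its block falls in the shuffled sequence; the
  # original row index breaks ties, so within-block order is preserved.
  idx <- order(match(df[[col]], new_order), seq_len(nrow(df)))
  df[idx, , drop = FALSE]
}

set.seed(1)  # only so the illustration is repeatable
yy <- shuffle_blocks(xx)
yy
```

Each call without set.seed() gives a different block order; the b-values
within any one a-group come out in the same order as in xx.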

-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



