[R] vectorizing: selecting one record per group

Erik Iverson eriki at ccbr.umn.edu
Wed Oct 13 22:17:16 CEST 2010


Hello,

There are probably many ways to do this, but I think
it's easier if you use a data.frame as your object.

A simple solution for the matrix you provide escapes
me at the moment.

One solution, using the plyr package:


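library(plyr)
# 100 rows; column c is the group variable (20 rows in each of 5 groups)
A <- data.frame(a = rnorm(100), b = runif(100), c = rep(c(1, 2, 3, 4, 5), 20))
# split by c and keep one randomly chosen row from each group
ddply(A, .(c), function(x) x[sample(seq_len(nrow(x)), 1), ])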

             a         b c
1  0.02995847 0.4763819 1
2  0.72035194 0.2948611 2
3  1.34963917 0.2057488 3
4 -1.99427160 0.1147923 4
5 -0.73612703 0.5889539 5
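
For the matrix itself, something along these lines in base R should also
work. Take it as a minimal sketch rather than a tested solution: the names
rows_by_group, perm, INDEX2 and SEL2 and the set.seed() call are mine, added
only for illustration. The first approach still loops implicitly through
vapply(); the second avoids a loop by shuffling the rows and keeping the
first row met in each group, which gives every row in a group the same
chance of being picked.

```r
set.seed(42)  # assumption: fixed seed only so the sketch is reproducible
A <- cbind(rnorm(100), runif(100), rep(c(1, 2, 3, 4, 5), 20))

# Approach 1: split the row numbers by group, then draw one per group.
# Indexing with sample.int(length(idx), 1) stays correct even when a
# group has a single row, where sample(idx, 1) would draw from 1:idx.
rows_by_group <- split(seq_len(nrow(A)), A[, 3])
INDEX <- vapply(rows_by_group,
                function(idx) idx[sample.int(length(idx), 1)],
                integer(1))
SEL <- A[INDEX, , drop = FALSE]

# Approach 2: no explicit or implicit loop.
# Shuffle all row numbers, then keep the first shuffled row of each group.
perm <- sample(nrow(A))
INDEX2 <- perm[!duplicated(A[perm, 3])]
SEL2 <- A[INDEX2, , drop = FALSE]
```

SEL comes back ordered by group, while SEL2 is in whatever order the
shuffle met the groups; sort on the third column if the order matters.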


Mauricio Romero wrote:
> Hi,
>
> I want to select a subsample from my data, choosing one record from each
> group. I know how to do this with a for loop.
>
> For example, let's say I have the data:
>
> A=cbind(rnorm(100),runif(100),(rep(c(1,2,3,4,5),20)))
>
> where the third column is the group variable. What I want is to select
> 5 observations, one taken at random from each group.
>
> INDEX = NULL
> i = 1
> for (index_g in unique(A[, 3])) {
>   INDEX[i] = sample(which(A[, 3] == index_g), 1)
>   i = i + 1
> }
> SEL = A[INDEX, ]
>
> Is there a way to do this without a “for”? In other words, is there a way to
> “vectorize” this?
>
> Thank you,
>
> Mauricio Romero
> Quantil S.A.S.
> Bogotá, Colombia
> www.quantil.com.co
>
> "It is from the earth that we must find our substance; it is on the earth
> that we must find solutions to the problems that promise to destroy all life
> here"
