[R] Sampling with Constraints for testing and training data

Petr Savicky savicky at cs.cas.cz
Wed Jan 25 16:17:48 CET 2012


On Wed, Jan 25, 2012 at 04:00:27AM -0800, Eliano wrote:
> Hi People, 
> 
> Does anyone have a good solution for this problem: 
> 
> a database called DB. 
> 
> 
> index <- sample(1:nrow(DB), size=0.2*nrow(DB)) 
> test <- DB[index,] 
> train <- DB[-index,] 
> 
> One of the variables in this database is a target variable with two
> values, 0 and 1. 
> 
> Imagine now that I want to constrain the test data frame so that it has 20%
> of the rows of DB, half with DB$target=1 and half with DB$target=0. 
> 
> Imagine: n=100 
> DB$target = { 0=80 
>                            1=20} 
> 
> test=20 rows, containing 10 random rows with DB$target=1 and 10 random rows
> with DB$target=0. 

Hi.

One way is as follows.

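  # row numbers of each class
  t0 <- which(DB$target==0)
  t1 <- which(DB$target==1)
  # half of the 20% test set comes from each class
  m <- round(0.1*nrow(DB))
  # each class must have enough rows to sample m without replacement
  stopifnot(length(t0) >= m, length(t1) >= m)
  index <- c(sample(t0, size=m), sample(t1, size=m))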
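
For illustration, here is a sketch of the whole split on made-up data that
mirrors the n=100 example from the question. The data frame DB and its
column x are invented here, and set.seed() is only there to make the run
repeatable. The sketch indexes with sample.int() instead of calling sample()
on the row numbers directly, since sample(x, ...) treats a single number x as
1:x. That is an extra precaution on my side, not something the question
requires.

```r
# Toy data (an assumption for this sketch): 100 rows, 80 zeros and 20 ones.
set.seed(1)
DB <- data.frame(x = rnorm(100), target = rep(c(0, 1), times = c(80, 20)))

# Row numbers of each class.
t0 <- which(DB$target == 0)
t1 <- which(DB$target == 1)

# The test set is 20% of DB, so each class contributes 10% of nrow(DB).
m <- round(0.1 * nrow(DB))
stopifnot(length(t0) >= m, length(t1) >= m)

# Draw m positions within each class, then map them back to row numbers.
index <- c(t0[sample.int(length(t0), m)], t1[sample.int(length(t1), m)])

test <- DB[index, ]
train <- DB[-index, ]

table(test$target)    # 10 rows of each class
table(train$target)   # the remaining 70 zeros and 10 ones
```

Note that the training set is then even more unbalanced than DB (70 to 10
instead of 80 to 20), which may or may not matter for the model being fitted.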

Hope this helps.

Petr Savicky.


