[R] how to control the sampling to make each sample unique

Vladimir Eremeev wl2776 at gmail.com
Thu May 10 11:40:17 CEST 2007



Urania Sun wrote:
> 
> I have a dataset of 10000 records which I want to use to compare two
> prediction models.
> 
> I split the records into test dataset (size = ntest) and training dataset
> (size = ntrain). Then I run the two models.
> 
> Now I want to shuffle the data and rerun the models. I want many shuffles.
> 
> I know that the following command
> 
> sample ((1:10000), ntrain)
> 
> can pick ntrain numbers from 1 to 10000. Then I just use these rows as the
> training dataset.
> 
> But how can I make sure each run of sample() produces different results? I
> want the output to be unique each time.
> I tested sample() and found that it usually produces different combinations.
> But can I control it somehow? Is there a better way to write this?
> 
> Thank you,
> 
> 

You could keep the numbers that have not been picked yet in a vector, pass
that vector to sample(), and remove the picked numbers from it after each draw.
That way no row is used in more than one run.

Something like the following (not fully tested):

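index <- 1:10000   # row numbers not yet used by any run

# n.runs is a placeholder for the number of reshuffles; each run uses up
# ntrain + ntest rows, so n.runs * (ntrain + ntest) must not exceed 10000
for (i in seq_len(n.runs)) {
  train.index <- sample(index, ntrain)       # draw the training rows
  index <- index[!index %in% train.index]    # remove them from the pool
  test.index <- sample(index, ntest)         # draw the test rows from the rest
  index <- index[!index %in% test.index]     # remove them as well
  # fit and compare the two models on train.index / test.index here
}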
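
The same "never reuse a row" idea can also be sketched without shrinking a vector inside the loop: permute 1:10000 once and cut the permutation into consecutive blocks. This is a variant for illustration, not the loop above. The sizes ntrain = 1500 and ntest = 500, the names n.runs, perm and splits, and the seed value are assumptions; the thread does not give them.

```r
# Assumed sizes for illustration only; the thread does not state ntrain/ntest.
n      <- 10000
ntrain <- 1500
ntest  <- 500
n.runs <- n %/% (ntrain + ntest)   # how many disjoint runs fit into n rows

set.seed(1)         # optional: fixes the random stream so runs are repeatable
perm <- sample(n)   # one random permutation of 1:n

# Cut the permutation into consecutive blocks of ntrain + ntest rows;
# the first ntrain rows of each block are training rows, the rest test rows.
splits <- lapply(seq_len(n.runs), function(i) {
  start <- (i - 1) * (ntrain + ntest)
  block <- perm[start + seq_len(ntrain + ntest)]
  list(train = block[seq_len(ntrain)],
       test  = block[ntrain + seq_len(ntest)])
})
```

splits[[i]]$train and splits[[i]]$test then hold the row numbers for run i. Because every block comes from a single permutation, no row appears in two runs, and dropping the set.seed() call gives a different shuffle in each R session.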
