[R] how to control the sampling to make each sample unique

Vladimir Eremeev wl2776 at gmail.com
Thu May 10 11:40:17 CEST 2007

Urania Sun wrote:
> I have a dataset of 10000 records which I want to use to compare two
> prediction models.
> I split the records into a test dataset (size = ntest) and a training
> dataset (size = ntrain). Then I run the two models.
> Now I want to shuffle the data and rerun the models. I want many shuffles.
> I know that the following command
> sample ((1:10000), ntrain)
> can pick ntrain numbers from 1 to 10000. Then I just use these rows as the
> training dataset.
> But how can I make sure each run of sample() produces different results? I
> want the output to be unique each time.
> I tested sample() and found that it usually produces different combinations.
> But can I control it somehow? Is there a better way to write this?
> Thank you,

You could keep the numbers that have not been picked yet in a vector, pass
this vector to sample(), and remove the picked numbers from it after each draw.

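Something like the following (not fully tested), starting from index <- 1:10000:

for (i in seq_len(nruns)) {                  # nruns: number of shuffles (assumed name)
  train.index <- sample(index, ntrain)       # draw from the rows not picked yet
  index <- index[!index %in% train.index]    # drop the rows just used for training
  test.index <- sample(index, ntest)
  index <- index[!index %in% test.index]     # drop the rows just used for testing
}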
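
For illustration, the same idea can be written out as a self-contained script. This is only a sketch under assumptions: the values of n, ntrain, ntest and nruns and the list name splits are made up for the example, and set.seed() is added only so the draws can be reproduced. Because no row is ever reused, the approach can only work while nruns * (ntrain + ntest) does not exceed the number of records.

```r
# Sketch: successive train/test splits that never reuse a row.
# n, ntrain, ntest, nruns and `splits` are assumed names/values for illustration.
set.seed(42)                      # fixed seed so the same splits come out on every run

n      <- 10000                   # number of records
ntrain <- 700                     # training rows per run (assumed)
ntest  <- 300                     # test rows per run (assumed)
nruns  <- 5                       # number of shuffles (assumed)

# Without reuse, the pool of rows must be large enough for all runs.
stopifnot(nruns * (ntrain + ntest) <= n)

index  <- seq_len(n)              # rows not picked yet
splits <- vector("list", nruns)   # one list(train, test) per run

for (i in seq_len(nruns)) {
  # Sampling positions with sample.int() avoids the special case where
  # sample(x, k) treats a length-one numeric x as the range 1:x.
  train.index <- index[sample.int(length(index), ntrain)]
  index <- index[!index %in% train.index]   # remove rows used for training

  test.index <- index[sample.int(length(index), ntest)]
  index <- index[!index %in% test.index]    # remove rows used for testing

  splits[[i]] <- list(train = train.index, test = test.index)
}
```

The rows for run i are then splits[[i]]$train and splits[[i]]$test, which can be used to subset the data, for example dat[splits[[i]]$train, ] for a hypothetical data frame dat. If overlapping samples are acceptable and only identical repeats must be avoided, independent calls to sample() are probably enough, since the number of possible subsets of this size is so large that an exact repeat is extremely unlikely.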

