[R] how to control the sampling to make each sample unique

Rory Martin rory.martin at comcast.net
Thu May 10 15:09:43 CEST 2007


I think you're asking a design question about a Monte Carlo simulation.  You
have a "population" (size 10,000) from which you're defining an empirical
distribution, and you're sampling from this to create pairs of training and
test samples.

You need to ensure that each specific pair of training and test samples is
disjoint, meaning no observations in common.  Normally, you wouldn't want to
make the different training samples disjoint from one another, if that's what
you meant by them being "unique": that would cap the number of shuffles at
10000/ntrain.  Or did you only mean that no two training samples should be
identical?
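
As a rough sketch of the design I have in mind (not necessarily the best way
to write it), something like the following draws each training/test pair in
one call to sample(), so the pair is disjoint by construction.  The values of
ntrain, ntest, n_shuffles and the seed are made up for illustration, and
set.seed() is only there so the shuffles can be reproduced.

```r
# Illustrative values only; none of these come from the original question.
set.seed(2007)            # fixes the random stream so the runs are repeatable
n          <- 10000       # size of the "population"
ntrain     <- 7000
ntest      <- 3000
n_shuffles <- 100

# One draw without replacement per shuffle, then cut it into two pieces.
splits <- lapply(seq_len(n_shuffles), function(i) {
  idx <- sample(n, ntrain + ntest)
  list(train = idx[seq_len(ntrain)],
       test  = idx[ntrain + seq_len(ntest)])
})

# Within each pair, training and test rows have no observations in common.
disjoint <- vapply(splits,
                   function(s) length(intersect(s$train, s$test)) == 0,
                   logical(1))

# Optional check that no two training sets contain exactly the same rows.
train_keys <- vapply(splits,
                     function(s) paste(sort(s$train), collapse = ","),
                     character(1))
any_repeat <- anyDuplicated(train_keys) > 0
```

Different training samples will overlap heavily here, which is what you'd
normally want.  If any_repeat ever came back TRUE you could simply redraw that
shuffle, though with samples of this size an exact repeat is vanishingly
unlikely.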

Regards
Rory Martin


> From: HelponR <suncertain_at_gmail.com> Date: Wed, 09 May 2007 17:28:19
>
> I have a dataset of 10000 records which I want to use to compare two
> prediction models.
>
> I split the records into a test dataset (size = ntest) and a training
> dataset (size = ntrain). Then I run the two models.
>
> Now I want to shuffle the data and rerun the models. I want many shuffles.
>
> I know that the following command
>
> sample(1:10000, ntrain)
>
> can pick ntrain numbers from 1 to 10000. Then I just use these rows as the
> training dataset.
>
> But how can I make sure each run of sample() produces different results? I
> want the output to be unique each time. I tested sample() and found that it
> usually produces different combinations. But can I control it somehow? Is
> there a better way to write this?


