[R] Sampling problems

David Winsemius dwinsemius at comcast.net
Wed Mar 7 21:24:52 CET 2012


On Mar 7, 2012, at 11:41 AM, Oritteropus wrote:

> Hi,
> I need to randomly sample my dataset 1000 times. Each sample needs to
> be 80% of the data. I know how to do that; my problem is that I need
> not only the 80% but also the corresponding 20% each time. Is there
> any way to do that?
> Alternatively, I was thinking of something like the setdiff() function
> to compare my 80% sample to the original dataset and obtain the
> corresponding 20%. Unfortunately setdiff works just for vectors; do you
> know a similar function for dataframes?

Create an index vector with sample (or a logical one with runif), then
use it to get your sample and use negative indexing (or ! for a logical
index) to get the remainder.

# x is your data frame; draw 80% of its row positions without replacement
idx <- sample(nrow(x), floor(0.8 * nrow(x)))
x[ idx, ]  # 80%
x[ -idx, ] # the other 20%

(Integer indexing like this is positional, so it should work even if you
have mucked with the default rownames.)
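
Since you want this 1000 times, one way is to wrap the split in lapply
and keep both pieces of each draw together. This is only a rough sketch:
the data frame x below is made-up stand-in data, and the names n_rep,
n_train and splits are mine, not anything from your code.

```r
set.seed(42)  # only so the sketch is reproducible

# Made-up stand-in for your data frame
x <- data.frame(id = 1:50, y = rnorm(50))

n_rep   <- 1000                     # number of random splits
n_train <- floor(0.8 * nrow(x))     # size of the 80% part

# Each list element holds one 80% sample and its complementary 20%
splits <- lapply(seq_len(n_rep), function(i) {
  idx <- sample(nrow(x), n_train)   # row positions, without replacement
  list(train = x[idx, , drop = FALSE],
       test  = x[-idx, , drop = FALSE])
})
```

Then splits[[1]]$train and splits[[1]]$test give the two parts of the
first draw. Note that -idx only behaves as expected when idx is not
empty, which holds here as long as the data frame has at least 2 rows.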


-- 

David Winsemius, MD
West Hartford, CT


