[R] Error: cannot take a sample larger than the population

chao gai chaogai at duineveld.demon.nl
Sat Dec 30 17:57:21 CET 2006


Aldi,

Your concept of sample is different from mine. 
I would expect sampling with replacement to be equivalent to a for loop of
single draws without replacement (with size 1, replacement makes no difference):
samples <- numeric(400)  # preallocate the result vector
for (i in 1:400) samples[i] <- sample(c(0, 1, 2), 1, prob = c(0.02, 0.93, 0.05))
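A minimal sketch of that equivalence, assuming only base R (the seed and the names loop_draws and vec_draws are arbitrary choices for illustration). The loop makes 400 size-1 draws, and one call with replace = TRUE makes 400 draws. Both are 400 independent draws from the same distribution, so their proportions scatter around p in the same way.

```r
p <- c(0.02, 0.93, 0.05)
set.seed(2006)  # arbitrary seed, only for reproducibility

# 400 single draws; each call sees the full population c(0, 1, 2)
loop_draws <- vapply(1:400, function(i) sample(c(0, 1, 2), 1, prob = p),
                     numeric(1))

# One vectorized call with replacement
vec_draws <- sample(c(0, 1, 2), 400, replace = TRUE, prob = p)

# Proportions of 0, 1, 2 in each; both vary around p from run to run
rbind(loop = sapply(c(0, 1, 2), function(k) mean(loop_draws == k)),
      vec  = sapply(c(0, 1, 2), function(k) mean(vec_draws == k)))
```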
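Sampling without replacement:
first : sample(c(0,1,2), 1, prob=c(0.02, 0.93, 0.05))
second: depends on the first (suppose 2 was selected)
	 sample(c(0,1), 1, prob=c(0.02, 0.93)/.95)
third: whatever remains, with probability 1.
After the third draw the population is exhausted, so a sample of 400 without
replacement from c(0,1,2) is impossible. That is why R reports the error.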
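The sequential scheme above can be written out by hand. The helper draw_without_replacement below is hypothetical, not part of R. This sketch removes each chosen value and lets sample.int renormalize the remaining probabilities. Once the population is used up it stops, just as R's sample() does when replace = FALSE.

```r
# Hypothetical helper: draw k values one at a time without replacement
draw_without_replacement <- function(x, k, prob) {
  if (k > length(x)) stop("cannot take a sample larger than the population")
  out <- numeric(0)
  for (step in seq_len(k)) {
    # sample.int normalizes prob itself, so no division by the sum is needed
    j <- sample.int(length(x), 1, prob = prob)
    out <- c(out, x[j])
    x <- x[-j]        # remove the chosen value ...
    prob <- prob[-j]  # ... and its probability
  }
  out
}

p <- c(0.02, 0.93, 0.05)
set.seed(1)
draw_without_replacement(c(0, 1, 2), 3, p)  # some ordering of 0, 1, 2

# Asking for more than 3 values fails, in the helper and in R's sample()
tryCatch(sample(c(0, 1, 2), 4, prob = p), error = function(e) conditionMessage(e))
```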

n.b. the second is equivalent to sample(c(0,1), 1, prob=c(0.02, 0.93)), since
sample normalizes the probabilities itself.
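Here is a quick check of the normalization point, assuming base R only. The seed and sample size are arbitrary. With the same seed, the raw weights c(0.02, 0.93) and the same weights divided by 0.95 should give essentially the same proportion of zeros. That proportion should be close to 0.02/0.95, about 0.021.

```r
w <- c(0.02, 0.93)

set.seed(42)
a <- sample(c(0, 1), 1e5, replace = TRUE, prob = w)         # raw weights
set.seed(42)
b <- sample(c(0, 1), 1e5, replace = TRUE, prob = w / 0.95)  # pre-normalized

c(raw = mean(a == 0), normalized = mean(b == 0), target = 0.02 / 0.95)
```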

Concerning your result, a chi-square goodness-of-fit test:
observed <- c(0.0200, 0.9225, 0.0575)*400
expected <- c(0.02, 0.93, 0.05)*400
stat <- sum((observed-expected)^2/expected)
pchisq(stat, 2, lower.tail=FALSE)
[1] 0.788915
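The same goodness-of-fit test is available as chisq.test() in base R's stats package. This sketch uses the observed counts 8, 369 and 23 implied by the proportions above. The statistic and p-value should agree with the hand computation.

```r
observed <- c(8, 369, 23)            # 0.0200, 0.9225, 0.0575 times 400
p <- c(0.02, 0.93, 0.05)
res <- chisq.test(observed, p = p)   # test for given probabilities, df = 2
res$statistic  # X-squared, about 0.474
res$p.value    # about 0.789, matching pchisq() above
```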

A p-value of 0.79 gives no reason to doubt the stated probabilities, so this
seems OK to me.
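This concerns the expectation of exact proportions. A random sample, with or without replacement, will generally not reproduce prob exactly. If you want a vector with exactly those proportions, one option in R is to build the population with the required counts and then shuffle it. This is a sketch, not necessarily what Splus7 does internally. By contrast, drawing 400 values without replacement from a large finite population still gives proportions that vary around p. The population size of 1e6 is an arbitrary choice for illustration.

```r
p <- c(0.02, 0.93, 0.05)
n <- 400

# Exact proportions: build the population with counts n * p, then shuffle it
counts <- round(n * p)                        # 8, 372, 20
exact <- sample(rep(c(0, 1, 2), times = counts))
table(exact) / n                              # exactly 0.02, 0.93, 0.05

# "Very large population" idea: 1e6 values with proportions p,
# from which 400 are drawn without replacement
big <- rep(c(0, 1, 2), times = round(1e6 * p))
draw <- sample(big, n)
table(draw) / n                               # close to p, but not exact
```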

Cheers,
Kees



On Saturday 30 December 2006 16:55, Aldi Kraja wrote:
> Partial Summary and discussion:
> =====================
> Thank you to Chao Gai, Chuck Cleland, and Jim Lemon for their suggestion
> to use replace=T in R.
> There is a problem, though (see below).
>
> In Splus7, sample is defined as
> -------------
> sample(x, size = n, replace = F, prob = NULL, n = NULL, ...)
> so the default is replace=F.
> In Splus7, running
>
> xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
>
> and then
>
> table(xlrmN1)/400
>     0    1    2
>  0.02 0.93 0.05
> shows that "sample" works exactly as expected based on the prob vector.
>
> When "sample" is used in Splus7 with replacement, we see the following
> result:
>  > xlrmN1 <- sample(c(0,1,2),400 ,replace=T,prob=c(0.02 ,0.93 ,0.05 ))
>  > table(xlrmN1)/400
>
>       0     1      2
>  0.0125 0.925 0.0625
> which I think is working again as expected.
>
> In R, sample is defined as
> ---------
>
> sample(x, size, replace = FALSE, prob = NULL)
>
> So the above statement with replace=F did not work (it reported an error),
> but with replace=T it produced
>
> > table(xlrmN1)/400
>
> xlrmN1
>      0      1      2
> 0.0200 0.9225 0.0575
>
> which does not exactly match the probabilities provided
> (0.02, 0.93, 0.05).
>
> Now let's return to the concept of replace=F and replace=T.
> When I ask "sample" to select a sample of 400 from a vector of 3 with NO
> replacement, I think of it as follows: a) create a very large sample
> from 0, 1, and 2; b) from this large sample, select without replacement
> based on the prob vector; c) as a result, I expect the proportions in the
> selected sample to be exactly the same as the prob vector (as in Splus7).
>
> When I ask "sample" to select a sample of 400 from a vector of 3 with
> replacement, I think of it as follows: a) create a very large sample
> from 0, 1, and 2; b) from this large sample, select with replacement
> based on the prob vector, which means some of the previously selected
> 0, 1, 2 can be selected again; c) as a result, I expect the proportions
> in the selected sample NOT to be exactly the same as the prob vector
> (as in Splus7 and R).
>
> So there are two possible conclusions: either "sample" in R is not working
> correctly, OR I am missing some precision (a rounding error) needed to
> produce prob=c(0.02, 0.93, 0.05).
> Am I misunderstanding the "sample" function in R?
>
> Any suggestions are appreciated.
> TIA,
>
> Aldi
>
> Aldi Kraja wrote:
> >Hi,
> >In Splus7 this statement
> >xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
> >worked fine, but in R the interpreter reports that the length of the
> >vector to choose from, c(0,1,2), is smaller than the number of values I
> >want to select from it.
> >Any good reason?
> >See below the error.
> >
> > > xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
> >
> >Error in sample(length(x), size, replace, prob) :
> >        cannot take a sample larger than the population
> > when 'replace = FALSE'
> >Execution halted
> >
> >TIA,
> >
> >Aldi
> >
> >--
>
> --
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.
