[R] Random # generator accuracy

(Ted Harding) Ted.Harding at manchester.ac.uk
Thu Jul 23 22:15:55 CEST 2009


OOPS! The result of a calculation below somehow got omitted!

  (325820+326140+325289+325098+325475+325916)/
    (174873+175398+174196+174445+173240+174110)
  # [1] 1.867351

to be compared (as at the end) with the ratio 1.867471 of the
expected number of "weight=2" to expected number of "weight=1".
Sorry.
Ted.

On 23-Jul-09 20:05:09, Ted Harding wrote:
> On 23-Jul-09 17:59:56, Jim Bouldin wrote:
>> Dan Nordlund wrote:
>> "It would be necessary to see the code for your 'brief test'
>> before anyone could meaningfully comment on your results.
>> But your results for a single test could have been a valid
>> "random" result."
>> 
>> I've re-created what I did below.  The problem appears to be
>> with the weighting process: the unweighted sample came out much
>> closer to the actual than the weighted sample (>1% error) did. 
>> Comments?
>> Jim
>> 
>>> x
>>  [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>> weights
>>  [1] 1 1 1 1 1 1 2 2 2 2 2 2
>> 
>>> a = mean(replicate(1000000,(sample(x, 3, prob = weights))));a  # (1
>> million samples from x, of size 3, weighted by "weights"; the mean
>> should
>> be 7.50)
>> [1] 7.406977
>>> 7.406977/7.5
>> [1] 0.987597
>> 
>>> b = mean(replicate(1000000,(sample(x, 3))));b  # (1 million samples
>>> from
>> x, of size 3, not weighted this time; the mean should be 6.50)
>> [1] 6.501477
>>> 6.501477/6.5
>> [1] 1.000227
>> 
>> Jim Bouldin, PhD
> 
> To obtain the result you expected, you would need to explicitly
> specify "replace=TRUE", since the default for "replace" is FALSE.
> (though probably what you really intended was sampling without
> replacement).
> 
> Read carefully what is said about "prob" in '?sample' -- when
> replace=FALSE, the probability of inclusion of an element is not
> proportional to its weight in 'prob'.
> 
> The reason is that elements with higher weights are more likely
> to be chosen early on. This then knocks that higher weight out
> of the contest, making it more likely that elements with smaller
> weights will be sampled subsequently. Hence the mean of the sample
> will be biased slightly downwards, relative to the weighted mean
> of the values in x.
> 
>   table(replicate(1000000,(sample(x, 3))))
>   #      1      2      3      4      5      6
>   # 250235 250743 249603 250561 249828 249777
>   #      7      8      9     10     11     12
>   # 249780 250478 249591 249182 249625 250597
> 
> (so all nice equal frequencies)
> 
>   table(replicate(1000000,(sample(x, 3,prob=weights))))
>   #      1      2      3      4      5      6
>   # 174873 175398 174196 174445 173240 174110
>   #      7      8      9     10     11     12
>   # 325820 326140 325289 325098 325475 325916
> 
> Note that the frequencies of the values with weight=2 are a bit
> less than twice the frequencies of the values with weight=1:
> 
>   (325820+326140+325289+325098+325475+325916)/
>     (174873+175398+174196+174445+173240+174110)
>   # [1] 
> 
> 
> In fact this is fairly easily caluclated. The possible combinations
> (in order of sampling) of the two weights, with their probabilities,
> are:
>                                      1s  2s
> -------------------------------------------
> 1 1 1   P =  6/18 *  5/17 *  4/16    3   0
> 1 1 2   P =  6/18 *  5/17 * 12/16    2   1
> 1 2 1   P =  6/18 * 12/17 *  5/15    2   1
> 1 2 2   P =  6/18 * 12/17 * 10/15    1   2
> 2 1 1   P = 12/18 *  6/16 *  5/15    2   1
> 2 1 2   P = 12/18 *  6/16 * 10/15    1   2
> 2 2 1   P = 12/18 * 10/16 *  6/14    1   2
> 2 2 2   P = 12/18 * 10/16 *  8/14    0   3
> 
> So the expected number of "weight=1" in the sample is
> 
>   3*(6/18 *  5/17 *  4/16)  + 2*(6/18 *  5/17 * 12/16) +
>   2*(6/18 * 12/17 *  5/15)  + 1*(6/18 * 12/17 * 10/15) +
>   2*(12/18 *  6/16 *  5/15) + 1*(12/18 *  6/16 * 10/15) +
>   1*(12/18 * 10/16 *  6/14) + 0
>   = 1.046218
> 
> Hence the expected number of "weight=2" in the sample is
> 
>   3 - 1.046218 = 1.953782
> 
> and their ratio 1.953782/1.046218 = 1.867471
> 
> Compare this with the value 1.867351 (above) obtained by simulation!
> 
> Ted.
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 23-Jul-09                                       Time: 21:05:07
> ------------------------------ XFMail ------------------------------

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Jul-09                                       Time: 21:15:52
------------------------------ XFMail ------------------------------




More information about the R-help mailing list