[Rd] proposed pbirthday fix

Martin Maechler maechler at stat.math.ethz.ch
Mon Jan 23 19:01:25 CET 2006


>>>>> "MM" == Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Mon, 23 Jan 2006 11:52:55 +0100 writes:

>>>>> "ken" == ken knoblauch <knoblauch at lyon.inserm.fr>
>>>>>     on Mon, 23 Jan 2006 09:43:28 +0100 writes:

    ken> Actually, since NaN's are also detected in na.action
    ken> operations, a simpler fix might just be to use the
    ken> na.rm = TRUE option of min

    ken> upper <- min(n^k/(c^(k - 1)), 1, na.rm = TRUE)

    MM> Well, I liked your first fix better -- thank you for it! --
    MM> since it's always good practice to formulate such as to avoid
    MM> overflow when possible. 
    MM> All things considered, I think I'd go for

    MM> upper <- min( exp(k * log(n) - (k-1) * log(c)), 1, na.rm = TRUE)

    MM> Martin 

    Ken> Recent news articles concerning an article from The
    Ken> Lancet with fabricated data indicate that in the sample
    Ken> containing some 900 or so patients, more than 200 had the
    Ken> same birthday.  I was curious and tried out the p and q
    Ken> birthday functions but pbirthday could not handle 250
    Ken> coincidences with n = 1000.  The calculation of upper
    Ken> prior to using uniroot produces NaN,

    Ken> upper<-min(n^k/(c^(k-1)),1)

    Ken> I was able to get it to work by using logs, however, as
    Ken> in the following version

    >>> function(n, classes = 365, coincident = 2){
	..................

    >>> upper <- min( exp( k * log(n) - (k-1) * log(c) ), 1 )
    >>> nmin <- uniroot(f, c(lower, upper), tol = eps)
    >>> nmin$root
    >>> }

Well, now after inspection, I think ``get it to work''
is a bit of an exaggeration, at least for a purist like me
(some famous fortune teller once guessed it may be because I'm ... Swiss)
who doesn't like to lose precision in probability computations
unnecessarily. One can do much better:

The version of [pq]birthday() I've just committed to R-devel *) now gives

> sapply(c(20,50,100,200), function(k) pbirthday(1000, coincident= k))
[1]  8.596245e-08  9.252349e-41 2.395639e-112 1.758236e-285

whereas the 'na.rm=TRUE' fix  would simply give

[1] 8.596245e-08 0.000000e+00 0.000000e+00 0.000000e+00

--
Martin Maechler, ETH Zurich

*) peek at https://svn.r-project.org/R/trunk/src/library/stats/R/pbirthday.R



More information about the R-devel mailing list