[Rd] PR#2894

Mon Nov 24 19:04:27 MET 2003

I came across the message below and decided to respond. I don't use R but I do use other FSF products and I am a believer in making high quality software freely available, particularly for basic mathematical and statistical functions.

I can thorougly recommend TOMS Algorithm 708 but I believe there are a couple of problems from you point of view. Firstly there may be restrictions on how it may be used and secondly it doesn't come with a corresponding inverse function. You may, of course, feel your qbeta routine would be fine with a better pbeta.

CDFLIB includes this algorithm but comes with the warning "However, code from ACM publications is subject to the ACM policy (below)". The inversion function in CDFLIB is not built to the same high standards as A. H. Morris's code for Algorithm 708. As a consequence it is slower than it should be, doesn't work for probabilities samller than 1e-50 and if the UCLA calculators http://calculators.stat.ucla.edu/ use the CDFLIB code (as they claim) then it has some peculiar bugs {UCLA's qbeta(0.9000872417014739,0.00000631,0.0000595) returns 6.67492405793E-09. It should be approx 3e-308}. However, if you can get around the "licencing" problems then it would be a low risk solution to your problems. 

As an alternative, I offer software of mine. Code for a number of distributions, including the beta-distribution, is available at http://members.aol.com/iandjmsmith/EXAMPLES.HTM. I realise it would be a huge risk using "unknown" software but if you try the code for the beta distribution, you will find it reasonably fast, accurate and available to use any way you wish. 

Having downloaded the R source, I also noticed the comment /*___ FIXME ___:  This takes forever (or ends wrongly) when (one or) both p & q  are huge */. My code uses an asymptotic expansion for the incomplete beta integral and consequently the runtime has an upper limit. 

I could go on and on about the code but there's not much point unless you are interested in using it. The code is in Javascript which is relatively painless to translate to C, though it would obviously require additional effort to make it consistent with the rest of R. Should you be interested, I would be willing to help or sit back and leave you to it.

Ian Smith

Message-id: <200309221347.h8MDlUJH021443 at pubhealth.ku.dk>

>Date: Fri, 2 May 2003 10:03:23 -0400 (EDT) 
From: Morten Welinder <welinder at rentec.com <mailto:welinder at rentec.com?Subject=Re:%20[Rd]%20PR>> 
>To: p.dalgaard at biostat.ku.dk <mailto:p.dalgaard at biostat.ku.dk?Subject=Re:%20[Rd]%20PR> 
>CC: r-devel at stat.math.ethz.ch <mailto:r-devel at stat.math.ethz.ch?Subject=Re:%20[Rd]%20PR>, R-bugs at biostat.ku.dk <mailto:R-bugs at biostat.ku.dk?Subject=Re:%20[Rd]%20PR> 
>Subject: Re: [Rd] qbeta hang (PR#2894) 
> 
>Ok, I can confirm that it does not, in fact, loop forever. Just a close 
>approximation. 
... 
>There are lots of other places that worry me with respect to cancellation 
>errors, for example 
> 
> r = 1 - pp; 
> t = 1 - qq; 
These can also be removed by changing qbeta.c:156-157 (R-1.7.1) 
to 
y = (y-a) * xinbta * (1.0-xinbta) * 
exp (logbeta - pp * log(xinbta) - qq * log1p(-xinbta)); 
I don't understand why you have qbeta.c:167 
if (fabs(y)<=acu) goto L_converged 
which will fail for certain values of p & q, when x is close to zero as 
the beta density tends to infinity. Why not use an exit condition based on 
how far away from 'a' you are? 
qbeta.c:174 (prevent excess precision) 
Might this not just get optimized out? I doubt modern compilers respect 
the volatile keyword. You should probably replace this with a safe 
comparison. if(fabs(tx-xinbta)<=DBL_EPSILON*fabs(xinbta))... 
** This should be checked with someone familiar with floating point 
arithmetic; I'm not and may have got the correct expression wrong. 
The 'infinite loop' is due to the speed of the pbeta routine, rather than 
the initial approximation from the qbeta. There is another algorithm 
available for pbeta (Algorithm 708, CACM 18. 360--373, 1992) which may be 
of use here. This continued fraction approximation is recommended by 
Numerical Recipes over the power series approximation (make of that what 
you will!) 
TimM.