[Rd] phyper accuracy and efficiency (PR#6772)

terra at gnome.org terra at gnome.org
Thu Apr 15 18:06:37 CEST 2004


Full_Name: Morten Welinder
Version: snapshot
OS: 
Submission from: (NULL) (65.213.85.218)


Time to kick phyper's tires...

The current version has very serious cancellation issues.  For example, if you
ask
for a small right-tail you are likely to get total cancellation.  For example
phyper(59, 150, 150, 60, FALSE, FALSE) gives 6.372680161e-14.  The right answer
is
dhyper(0, 150, 150, 60, FALSE) which is 5.111204798e-22.

phyper is also really slow for large arguments.

Therefore, I suggest using the code below.  This is a sniplet from Gnumeric, so
you'll have to s/gnm_float/double/ and s/gnum//, etc.  The code isn't perfect.
In fact, if  i*(NR+NB)  is close to   n*NR, then this code can take a while.
Not longer than the old code, though.





/*
 * New phyper implementation.  Copyright 2004 Morten Welinder.
 * Distributed under the GNU General Public License.
 *
 * Thanks to Ian Smith for ideas.
 */
/*
 * Calculate
 *
 *          phyper (i, NR, NB, n, TRUE, FALSE)
 *   [log]  ----------------------------------
 *             dhyper (i, NR, NB, n, FALSE)
 *
 * without actually calling phyper.  This assumes that
 *
 *     i * (NR + NB) <= n * NR
 *
 */
static gnm_float
pdhyper (gnm_float i, gnm_float NR, gnm_float NB, gnm_float n, gboolean log_p)
{
  gnm_float sum = 0;
  gnm_float term = 1;

  while (i > 0 && term >= GNUM_EPSILON * sum) {
	  term *= i * (NB - n + i) / (n + 1 - i) / (NR + 1 - i);
	  sum += term;
	  i--;
  }

  return log_p ? log1pgnum (sum) : 1 + sum;
}


gnm_float
phyper (gnm_float i, gnm_float NR, gnm_float NB, gnm_float n, int lower_tail,
int log_p)
{
	gnm_float d, pd;

#ifdef IEEE_754
	if (isnangnum (i) || isnangnum (NR) || isnangnum (NB) || isnangnum (n))
		return i + NR + NB + n;
#endif

	i = floorgnum (i + 1e-7);
	NR = floorgnum (NR + 0.5);
	NB = floorgnum (NB + 0.5);
	n = floorgnum (n + 0.5);

	if (NR < 0 || NB < 0 || !finitegnum (NR + NB) || n < 0 || n > NR + NB)
		ML_ERR_return_NAN;

	if (i * (NR + NB) > n * NR) {
		/* Swap tails.  */
		gnm_float oldNB = NB;
		NB = NR;
		NR = oldNB;
		i = n - i - 1;
		lower_tail = !lower_tail;
	}

	if (i < 0)
		return R_DT_0;

	d = dhyper (i, NR, NB, n, log_p);
	pd = pdhyper (i, NR, NB, n, log_p);

	return log_p ? R_DT_log (d + pd) : R_D_Lval (d * pd);
}



More information about the R-devel mailing list