[R] how to calculate this natural logarithm

Petr Savicky savicky at praha1.ff.cuni.cz
Sat Jan 8 13:08:47 CET 2011


On Sat, Jan 08, 2011 at 12:20:59AM +0800, zhaoxing731 wrote:
> Hello
> 
> 	I want to calculate natural logarithm of sum of combinations as follow: (R code)
> 
> 	{
> 		com_sum=choose(2000000,482)*choose(1000000,118)+choose(2000000,483)*choose(1000000,117)+...+choose(2000000,i)*choose(1000000,600-i)+...+choose(2000000,600)*choose(1000000,0)                  #calculate the sum
> 	   result=log(com_sum)                          	   #calculate the log of the sum
> 	}

Let me present a more detailed calculation. The natural logarithms
of the products above are

  i <- 482:600
  logx <- lchoose(2000000,i) + lchoose(1000000,600-i)

The logarithm of the largest of the products is

  maxlog <- max(logx)
  maxlog # [1] 5675.315

The ratios of the products divided by the largest of them may be computed as

  exp(logx - maxlog)
  # [1] 1.000000e+00 4.885522e-01 2.361712e-01 1.129583e-01 5.345075e-02
  # ...

Only a few of the ratios are not negligible, so the sum of the
products differs from the largest product by a small factor. This
factor may be estimated as

  sum(exp(logx - maxlog))
  # [1] 1.937370

The required logarithm differs from maxlog by the logarithm of this
factor. So, the result is 

  options(digits=15)
  maxlog + log(sum(exp(logx - maxlog)))
  # [1] 5675.97681976982

Computing the sum of products exactly using long integer arithmetic
and computing its logarithm with large precision produced

  5675.9768197698224041850302683544102271115821956713424401649...

which matches well the result obtained using double precision in R.

The above approach may be derived from the numerical part of the algorithm
suggested by Ted Harding for computing extremely small p-values in the Fisher Exact
test. There, we are computing a sum of small numbers, not large ones. However,
the problem is similar, since we cannot represent the summands themselves, but
we can represent their logarithms. In my opinion, P(a, b, c, d) is the largest
of the summands or close to the largest, so we calculate the ratios of the summands
to the largest of them or to a close approximation of the largest.

Petr Savicky.



More information about the R-help mailing list