[Rd] [r-devel] integrate over an infinite region produces wrong results depending on scaling

Andreï V. Kostyrka @ndre|@ko@tyrk@ @end|ng |rom un|@|u
Fri Apr 12 16:52:11 CEST 2019


Dear all,

This is the first time I am posting to the r-devel list. On 
StackOverflow, they suggested that the strange behaviour of integrate() 
was more bug-like. I am providing a short version of the question (full 
one with plots: https://stackoverflow.com/q/55639401).

Suppose one wants integrate a function that is just a product of two 
density functions (like gamma). The support of the random variable is 
(-Inf, 0]. The scale parameter of the distribution is quite small 
(around 0.01), so often, the standard integration routine would fail to 
integrate a function that is non-zero on a very small section of the 
negative line (like [-0.02, -0.01], where it takes huge values, and 
almost 0 everywhere else). R’s integrate would often return the machine 
epsilon as a result. So I stretch the function around the zero by an 
inverse of the scale parameter, compute the integral, and then divide it 
by the scale. Sometimes, this re-scaling also failed, so I did both if 
the first result was very small.

Today when integration of the rescaled function suddenly yielded a value 
of 1.5 instead of 3.5 (not even zero). The MWE is below.

cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh, 
scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab

curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")

sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 #  Checking by summation: 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
str(integrate(f, -Inf, 0)) # Gives 3.575294
# $ value       : num 3.58
# $ abs.error   : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value       : num 1.5 # What?!
# $ abs.error   : num 0.000145 # What?!
# $ subdivisions: int 2

It stop at just two subdivisions! The problem is, I cannot try various 
stabilising multipliers for the function because I have to compute this 
integral thousands of times for thousands of parameter values on 
thousands of sample windows for hundreds on models, so even in the 
super-computer cluster, this takes weeks. Besides that, reducing the 
rel.tol just to 1e-5 or 1e-6, helped a bit, but I am not sure whether 
this guarantees success (and reducing it to 1e-7 slowed down the 
computations in some cases). And I have looked at the Fortran code of 
the quadrature just to see the integration rule, and was wondering.

How can I make sure that the integration routine will not produce such 
wrong results for such a function, and the integration will still be fast?

Yours sincerely,
Andreï V. Kostyrka



More information about the R-devel mailing list