[R] R: numerical integration problems

Duncan Murdoch murdoch at stats.uwo.ca
Fri Jan 2 13:51:54 CET 2009


On 02/01/2009 6:37 AM, Allan Clark wrote:
> hello all
>  
> happy new year and hope you r having a good holiday.
>  
>  
> i would like to calculate the expectation of a particular random variable and would like to approximate it using a number of the functions contained in R. decided to do some experimentation on a trivial example.
>  
> example
> ========
>  
> suppose x(i)~N(0,s2) where s2 = the variance
> the prior for s2 = p(s2)~IG(a,b)
> so the posterior is IG(.5*n+a, b+.5*sum(x^2) )
> and the posterior expectation of s2 = (b+sum(x^2))/(.5*n+a-1)
>  
> I want to calculate the value of this expectation using integrals. 
> 
> ie
>  
> let L(X|s2) = the likelihood function
> so E(s2|X) = (integral(s2*L(X|s2)*p(s2)))/(integral(L(X|s2)*p(s2))) . the bounds on both of the integrals is (0,Inf)
> =========================================================================
>  
>  
> three methods are used to calculate E(s2|X) for different sample sizes:
>  
> Method 1
> =========
> (integral(s2*L(X|s2)*p(s2)))/(integral(L(X|s2)*p(s2))) where the integrals are calculated using the integrate function
>  
> Method 2
> =========
> transform s2 to tau (from (0,Inf) to (-1,1))
> f(s2) = tau = 4*arctan(s2)/pi-1
>  
> so 
>  
> E(s2|X) = (integral(jac*finv(s2)*L(X|finv(s2))*p(finv(s2))))/(integral(L(X|finv(s2))*p(finv(s2)))) .  the bounds on both integrals is (-1,1)
>  
> jac = the jacobian and finv is the inverse function of f(s2)
>  
> once again the integrals are calculated using the intergrate function
>  
> Method 3
> =========
> legendre gaussian quadriture is used to calculate each of the integrals using Method 2. 
>  
>  
> Some results and comments are below.
> method 3 seems to give the correct solution for n<=343
> differences between the different methods can be detected from as early as n=3 for method 1
>  
> IS THERE A WAY TO MODIFY THE PROBLEM SO THAT THE EXPECTATION CAN BE CALCULATED FOR LARGE n without using MC or MCMC methods?

For large n, the density is highly peaked near its mode.   The 
integrate() function doesn't work well on functions like that, but you 
can improve it by breaking the range up into three parts:  the centre, 
to the left, and to the right.  If you define the centre so that almost 
all of the mass will be there, then it doesn't matter if the other two 
integrals are inaccurate.  I'd suggest using the MLE plus or minus 5 
standard errors.  This should cover the mode unless the prior is very 
informative.

You'll still have a problem of overflow when evaluating it.  To fix 
that, just rescale the density, so that its max is 1.  (Since the 
density occurs in both numerator and denominator, this won't affect the 
final answer).  In general it might be hard to find the max of the 
density, but it's probably good enough to find the max of the 
likelihood, or some other rough approximation, e.g. the density at the 
MLE.  The value 1 isn't important, all you need is an upper bound that 
doesn't cause overflow.

Duncan Murdoch

>  
> P(1)
> $correct
> [1] 2.201234
>  
> $App.2
> [1] 2.201234
>  
> $App.trans
> [1] 2.201234
> $App.gq
> [1] 2.201234
> #=========================================
>  
> P(100)
> $correct
> [1] 18.85535
>  
> $App.2
> [1] 15.65814
>  
> $App.trans
> [1] 18.85028
> $App.gq
> [1] 18.85535
> #=========================================
>  
> 
> P(300)
> $correct
> [1] 22.59924
>  
> $App.2
> [1] NaN
>  
> $App.trans
> [1] 18.85405
> $App.gq
> [1] 22.59924
> #=========================================
>  
> P(500)
> Error in integrate(num., 0, Inf) : non-finite function value
>  
>  
>  
> The code is included below:
>  
> 
> P=function(n=1,howmany=500)
> {
>   #you do need to have stamod
>   #howmany is used to set the number of points used for the gaussian quadriture
> 
>   set.seed(1)
>   sigma=5
>   x=rnorm(n)*sigma
>   a=5
>   b=5
>   
>   #the correct expectation
>   #===========================================================
>   correct=(b+sum(x^2)*.5)/(.5*n+a-1)
>   
>  
>   #Method 1
>   #===========================================================
>   num.=function(s2)
>   {
>    L=((2*pi*s2)^(-n*.5))*exp(-.5*sum(x^2)/s2)
>    P=(s2^(-(a+1)))*exp(-b/s2)*(b^a)/gamma(a)
>    return(s2*L*P)
>   }
>   den.=function(s2)
>   {
>    L=((2*pi*s2)^(-n*.5))*exp(-.5*sum(x^2)/s2)
>    P=(s2^(-(a+1)))*exp(-b/s2)*(b^a)/gamma(a)
>    return(L*P)
>   }
>   App.2=integrate(num.,0,Inf)$value/integrate(den.,0,Inf)$value
>   
>  
>   #Method 2
>   #===========================================================
>   num.t=function(s2.)
>   {
>     s2=tan(.25*pi*(1+s2.))
>     jac=.25*pi*(((tan(.25*pi*(1+s2.)))^2)+1)
>     num.tv=jac*exp(-(a+.5*n)*log(s2)-(b+sum(x^2)*.5)/s2)
>     return(num.tv)
>   }
>   den.t=function(s2.)
>   {
>     s2=tan(.25*pi*(1+s2.))
>     jac=.25*pi*(((tan(.25*pi*(1+s2.)))^2)+1)
>     den.tv=jac*exp(-(a+.5*n+1)*log(s2)-(b+sum(x^2)*.5)/s2)
>     return(den.tv)
>   }
>   App.trans=integrate(num.t,-1,1)$value/ integrate(den.t,-1,1)$value
>  
> 
>   #Method 3
>   #===========================================================
>   library(statmod)
>   ad.=gauss.quad(n=howmany,kind="legendre",alpha=-1,beta=-1)
>   n.=ad.$nodes
>   w.=ad.$weights                                        
>   App.gq=sum(w.*num.t(n.))/sum(w.*den.t(n.))
>   
>   list(correct=correct,App.2=App.2,App.trans=App.trans, App.gq=App.gq)
> }
>  
>  
> Allan Clark
> ========
> Lecturer in Statistical Sciences Department
> University of Cape Town
> 7701 Rondebosch
> South Africa
> TEL (Office): +27-21-650-3228
> FAX: +27-21-650-4773
> http://web.uct.ac.za/depts/stats/aclark.htm 
>  
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list