[R] glm() function not finding the maximum
Richard Nixon
richard.nixon at mrc-bsu.cam.ac.uk
Mon Apr 22 17:51:21 CEST 2002
Hello,
I have found a problem with using the glm function with a gamma
family.
I have a vector of data, assumed to be generated by a gamma distribution.
The parameters of this gamma distribution are estimated in two ways (i)
using the glm() function, (ii) "by hand", using the optim() function.
I find that the -2*likelihood at the maximum found by (i) is substantially
larger than that found by (ii), i.e. the glm() function is not finding the
maximum.
This is some what of a pathological example, as the data set is highly
skewed and contains a couple of outliers.
I've tested this in S+ and the same problem is there too.
Is this cause for concern, or is my data set just a "nasty" one to deal
with?
I am really impressed with the optim() function. Indeed, it is the reason
why I switched to R from Splus. The Splus analogue was very slow, and
didn't find the maximum.
The data set and code for the two methods of estimation are included
below. I don't think I am making a mistake here. Sorry if I have.
Thanks
Richard
> gamma1(data) #uses the glm() function
$loglik
[1] 875.4274
$par
[1] 9.572403e-02 4.345771e+03
> gamma2(data) #"by hand" using optim()
$loglik
[1] 793.3913
$par
[1] 0.518145 802.854297
#Data set
data_c(51.47, 210.19, 49.55, 61.93, 60.61, 744.57, 338.59, 133.93,
191.57, 111.43, 432.83, 185.23, 155.61, 84.72, 120.2, 15.33,
77.05, 115.77, 25.23, 657.94, 108.39, 61.08, 142.42, 87.86, 272.87,
213.78, 65.23, 102.45, 58.16, 176.58, 76.58, 434.12, 362.35,
102.53, 103.6, 25.23, 97.19, 88.52, 118.55, 151.9, 2.7, 156.41,
21.79, 272.27, 23.16, 32.07, 6325.23, 92.37, 8340.04, 51.08,
55.59, 94.08, 69.98, 554.13, 104.88, 170.15, 945.1, 143.52)
#Fits data to a gamma distribution using glm()
gamma1_function(data){
n_length(data)
m_summary(glm(data~1, family=Gamma(link=identity)))
shape_1/as.numeric(m$disp)
scale_as.numeric(m$coeff[1]*m$disp)
dev.res_-2*log(dgamma(data,shape=shape,scale=scale))
loglik_sum(dev.res) #actually -2 * log like
list(loglik=loglik,par=c(shape,scale))
}
#Fits data to a gamma distribution "by hand" using optim()
gamma2_function(data){
n_length(data)
m_summary(glm(data~1, family=Gamma))
shape_1/as.numeric(m$disp)
#L = -Log likelihood
L_function(x){-(-n*log(gamma(x[1]))+n*x[1]*log(x[1]/x[2])+(x[1]-1)*sum(log(data))-x[1]/x[2]
*sum(data))}
start_c(shape, mean(data))
parscale_start
fit_optim(start,L,method="L-BFGS-B",lower=c(shape/100,0),
upper=c(NA,NA),control=list(parscale=parscale))
shape_fit$par[1]
mu_fit$par[2]
scale_mu/shape
dev.res_-2*log(dgamma(data,shape=shape,scale=scale))
loglik_sum(dev.res) #actually -2 * log like
list(loglik=loglik,par=c(shape,scale))
}
--
Dr. Richard Nixon
MRC Biostatistics Unit, Institute of Public Health,
Robinson Way, Cambridge, CB2 2SR
http://www.mrc-bsu.cam.ac.uk/personal/richard
Tel: +44 (0)1223 330382, Fax: +44 (0)1223 330388
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list