[R] dreaded p-val for d^2 of a glm / gam

Spencer Graves spencer.graves at pdf.com
Fri Mar 28 00:38:12 CET 2008


      I assume you mean 'deviance', not 'squared deviance';  if the 
latter, then I have no idea. 

      If the former, then a short and fairly quick answer to your 
question is that 2*log(likelihood ratio) for nested hypotheses is 
approximately chi-square with numbers of degrees of freedom = the number 
of parameters in the larger model fixed to get the smaller model, under 
standard regularity conditions, the most important of which is that the 
maximum likelihood is not at a boundary. 

      For specificity, consider the following modification of the first 
example in the 'glm' help page: 

     counts <- c(18,17,15,20,10,20,25,13,12)
     outcome <- gl(3,1,9)
     treatment <- gl(3,3)
     glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())
     glm.D93t <- glm(counts ~ treatment, family=poisson())
     anova(glm.D93t, glm.D93, test="Chisq")

      The p-value is not printed by default, because some people would 
rather NOT give an answer than give an answer that might not be very 
accurate in the cases where this chi-square approximation is not very 
good.  To check that, you could do a Monte Carlo, refit the model with, 
say, 1000 random permutations of your response variable, collect 
anova(glm.D93t, glm.D93)[2, "Deviance"] in a vector, and then find out 
how extreme the deviance you actually got is relative to this 
permutation distribution. 

      Hope this helps. 
      Spencer Graves
p.s.  Regarding your 'dread', please see fortune("children")

Monica Pisica wrote:
> OK,
>
> I really dread to ask that .... much more that I know some discussion about p-values and if they are relevant for regressions were already on the list. I know to get p-val of regression coefficients - this is not a problem. But unfortunately one editor of a journal where i would like to publish some results insists in giving p-values for the squared deviance i get out from different glm and gam models. I came up with this solution, but sincerely i would like to get yours'all opinion on the matter.
>
> p1.glm <- glm(count ~be+ch+crr+home,   family = 'poisson')
>
> # count - is count of species (vegetation)
> # be, ch, crr, home - different lidar metrics
>
> # calculating d^2
> d2.p1 <- round((p1.glm[[12]]-p1.glm[[10]])/p1.glm[[12]],4)
> d2.p1
> 0.6705
>
> # calculating f statistics with N = 148 and n=4; f = (N-n-1)/(N-1)(1-d^2)
> f <-  (148-4-1)/(147*(1-0.6705))
> f
> [1] 2.952319
>
> #calculating p-value
> pval.glm <- 1-pf(f, 147,143)
> pval.glm
> [1] 1.135693e-10
>
> So, what do you think? Is this acceptable if i really have to give a p-value for the deviance squared? If it is i think i will transform everything in a fuction ....
>
> Thanks,
>
> Monica
> _________________________________________________________________
> Windows Live Hotmail is giving away Zunes.
>
> M_Mobile_Zune_V3
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list