[R] Parameterization puzzle

Murray Jorgensen maj at waikato.ac.nz
Sat Jul 22 00:57:56 CEST 2006


Thanks to Brian and Berwin with there help. I faced a double problem in 
that I not only wanted to fit the model but I also wanted to do so in 
such a way that it would seem natural for a classroom example.

The moral seems to be that I should do the orthogonal polynomial stuff 
outside the model formula. Here then is my solution:

pyears <- scan()
18793 52407 10673 43248 5710 28612 2585 12663 1462 5317

deaths <- scan()
2 32 12 104 28 206 28 186 31 102

l <- log(pyears)
Smoke <- gl(2,1,10,labels=c("No","Yes"))
Age <- gl(5,2,10,labels=c("35-44","45-54","55-64","65-74","75-84"),
            ordered=TRUE)
mod1.glm <- glm(deaths ~ Age * Smoke + offset(l),family=poisson)
summary(mod1.glm)
age <- as.numeric(Age)
age1 <- poly(age,2)[,1]
age2 <- poly(age,2)[,2]
mod2.glm <- aso1.glm <- glm(deaths ~ age1 + age2 + Smoke +
                       age1:Smoke + offset(l),family=poisson)
summary(mod2.glm)

The final summary then comes out looking like this:

               Estimate Std. Error z value Pr(>|z|)
(Intercept)    -5.8368     0.1213 -48.103  < 2e-16 ***
age1            5.3238     0.4129  12.893  < 2e-16 ***
age2           -1.0460     0.1448  -7.223 5.08e-13 ***
SmokeYes        0.5183     0.1262   4.106 4.02e-05 ***
age1:SmokeYes  -1.3755     0.4340  -3.169  0.00153 **


which is just what I wanted.

Cheers,  Murray Jorgensen

Prof Brian D Ripley wrote:
> On Fri, 21 Jul 2006, Berwin A Turlach wrote:
> 
>> G'day all,
>>
>>>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>>
>>    BDR> R does not know that poly(age,2) and poly(age,1) are linearly
>>    BDR> dependent.
>> Indeed, I also thought that this is the reason of the problem.
>>
>>    BDR> (And indeed they only are for some functions 'poly'.)
>> I am surprised about this.  Should probably read the help page of
>> 'poly' once more and more carefully.
> 
> My point was perhaps simpler: if you or I or Murray had a function 
> poly() in our workspace, that one would be found, I think.  (I have not 
> checked the ramifications of namespaces here, but I believe that would 
> be the intention, that formulae are evaluated in their defining 
> environment.)  So omly when the model matrix is set up could the linear 
> dependence be known (and there is nothing in the system poly() to tell 
> model.matrix).
> 
> What is sometimes called extrinsic aliasing is left to the fitting 
> function, which seems to be to do a sensible job provided the main 
> effect is in the model.  Indeed, including interactions without main 
> effects (as Murray did) often makes the results hard to interpret.
> 
>>    BDR> I cannot reproduce your example ('l' is missing), [...]
>> My guess is that 'l' is 'pyears'.  At least, I worked under that
>> assumption.
> 
> Apparently l = log(pyears), which was my uncertain guess.
> 
>> Interestingly, on my machine (using R 2.3.1, 2.2.1 and 2.1.1) I cannot
>> fit any of the Poisson GLM that Murray tried.  I always get the error
>> message:
>>
>> Error: no valid set of coefficients has been found: please supply 
>> starting values
> 
> Related to the offset, I believe.
> 
>>
>> But I have to investigate this further.  I can fit binomial models
>> that give me similar answers.
>>
>>    BDR> [...] but perhaps
>>    BDR> glm(deaths ~ poly(age,2) + poly(age,1)*Smoke + offset(l),
>>    BDR> poisson)
>>    BDR> was your intention?
>> In this parameterisation a 'poly(age,1)' term will appear among the
>> coefficients with an estimated value of NA since it is aliased with
>> 'poly(age, 2)1'.  So I don't believe that this was Murray's intention.
>>
>> The only suggestion I can come up with is:
>>
>>> summary(glm(cbind(deaths, l-deaths) ~ age*Smoke+I(age^2), 
>>> family=binomial))
>>
>> [...]
>>
>> Coefficients:
>>              Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  -10.79895    0.45149 -23.918  < 2e-16 ***
>> age            2.37892    0.20877  11.395  < 2e-16 ***
>> SmokeYes       1.44573    0.37347   3.871 0.000108 ***
>> I(age^2)      -0.19706    0.02749  -7.168  7.6e-13 ***
>> age:SmokeYes  -0.30850    0.09756  -3.162 0.001566 **
>>
>> [...]
>>
>> Which doesn't use orthogonal polynomials anymore.  But I don't see how
>> you can fit the model that Murray want to fit using orthogonal
>> polynomials given the way R's model language operates.
>>
>> So I guess the Poisson GLM that Murray wants to fit is:
>>
>>        glm(deaths~ age*Smoke+I(age^2)+offset(l), family=poisson)
>>
>> Cheers,
>>
>>        Berwin
>>
>> ========================== Full address ============================
>> Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
>> School of Mathematics and Statistics        +61 (8) 6488 3383 (self)
>> The University of Western Australia   FAX : +61 (8) 6488 1028
>> 35 Stirling Highway
>> Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
>> Australia                        http://www.maths.uwa.edu.au/~berwin
>>
>>
> 

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441    Mobile 021 1395 862



More information about the R-help mailing list