[R] Understanding output of summary(glm(...))

Wed Aug 20 09:03:48 CEST 2008

The 'Std. Error' values listed in the coefficients table of the summary
have nothing to do with the sub-class standard deviations.  They are the
standard errors associated with the estimates of the class means (the
way you have fitted the model) and as the design has equal replication
and the estimated standard errors are based on the pooled estimate of
variance from all samples, they are equal.  That's why.
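
As a rough check of the point above, here is a minimal sketch that rebuilds the common standard error by hand. It assumes the data from the original post (set.seed(5), 20 normal draws, 5 groups of 4); the names s2_pooled, se_pooled, group_var and se_table are made up for illustration.

```r
set.seed(5)
tmp <- rnorm(20)
gp  <- factor(rep(1:5, each = 4))

fit <- glm(tmp ~ -1 + gp)

# Pooled variance: residual sum of squares over residual df (20 - 5 = 15)
s2_pooled <- sum((tmp - fitted(fit))^2) / df.residual(fit)

# With 4 replicates per group, every class mean gets the same standard error
se_pooled <- sqrt(s2_pooled / 4)

# Per-group variances, for comparison; with equal group sizes the
# pooled variance is simply their average
group_var <- tapply(tmp, gp, var)

se_table <- coef(summary(fit))[, "Std. Error"]

print(all.equal(s2_pooled, mean(group_var)))
print(all.equal(unname(se_table), rep(se_pooled, 5)))
```

The individual group variances enter only through their average, which is why a group with a small standard deviation does not get a smaller standard error.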

Your second 'example' was incomplete and I couldn't follow it, but the
answer is almost certainly "hell no!".
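
One likely reason for that answer, sketched under the assumption that the question is whether smaller Pr(>|t|) values indicate better data: in a fit without an intercept, each Pr(>|t|) tests whether that class mean is zero, so it mostly reflects how far the mean lies from zero. The shift by 10 below is an arbitrary constant chosen only for illustration, and the names shifted, tab0, tab1, se_same and p_shifted are hypothetical.

```r
set.seed(5)
tmp <- rnorm(20)
gp  <- factor(rep(1:5, each = 4))

# Same data, moved away from zero by an arbitrary constant
shifted <- tmp + 10

tab0 <- coef(summary(glm(tmp ~ -1 + gp)))
tab1 <- coef(summary(glm(shifted ~ -1 + gp)))

# The standard errors do not change with the shift
se_same <- all.equal(tab0[, "Std. Error"], tab1[, "Std. Error"])

# The p-values, however, now only say that the means are far from zero
p_shifted <- tab1[, "Pr(>|t|)"]

print(se_same)
print(p_shifted)
```

The noise in the two data sets is identical, yet the second set of p-values is far smaller, so their size says nothing about data "quality".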

Finally, a question for you.  Why do you use glm(...) when all you are
doing is fitting linear models?  Either lm(...) or aov(...) would have
been much more sensible.  
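
For comparison, a minimal sketch assuming the data from the original post: with the default gaussian family, glm(...) and lm(...) should give the same coefficients table. The names glm_tab, lm_tab and same_table are made up for illustration.

```r
set.seed(5)
tmp <- rnorm(20)
gp  <- factor(rep(1:5, each = 4))

glm_tab <- coef(summary(glm(tmp ~ -1 + gp)))
lm_tab  <- coef(summary(lm(tmp ~ -1 + gp)))

# Estimates, standard errors, t values and p-values agree
same_table <- all.equal(glm_tab, lm_tab)
print(same_table)

# aov() fits the same class means (here with the default intercept)
# and prints the usual analysis of variance table
print(summary(aov(tmp ~ gp)))
```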

Bill Venables
http://www.cmis.csiro.au/bill.venables/ 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Daren Tan
Sent: Wednesday, 20 August 2008 4:37 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Understanding output of summary(glm(...))

Simple example of 5 groups of 4 replicates.

>set.seed(5)

>tmp <- rnorm(20)

>gp <- as.factor(rep(1:5,each=4))

>summary(glm(tmp ~ -1 + gp, data=data.frame(tmp, gp)))$coefficients
         Estimate Std. Error       t value  Pr(>|t|)
gp1 -0.1604613084  0.4899868 -0.3274809061 0.7478301
gp2  0.0002487984  0.4899868  0.0005077655 0.9996016
gp3  0.0695463698  0.4899868  0.1419352018 0.8890200
gp4 -0.6121682841  0.4899868 -1.2493567852 0.2306791
gp5 -0.6999545014  0.4899868 -1.4285171713 0.1736348

>m <- data.frame(tmp, gp)
>sapply(gp, function(x) sd(m[m[,"gp"]==x,1]))
 [1] 1.169284 1.169284 1.169284 1.169284 1.142974 1.142974 1.142974 1.142974
 [9] 0.862423 0.862423 0.862423 0.862423 0.535740 0.535740 0.535740 0.535740
[17] 1.047538 1.047538 1.047538 1.047538

Why doesn't the standard deviation of each group correlate with Pr(>|t|)?
For example, gp = 4 has the smallest sd (0.535740), but its Pr(>|t|) is
not the lowest (0.23 vs. 0.1736 for gp = 5).

Another example with new tmp1

>tmp1
 [1]  9.577969  9.310792  9.666767  9.610164 10.181692 10.155899 10.025943
 [8]  9.971243 10.177766  9.265793  9.415818 10.099874 10.238829  9.575591
[15]  9.560879  9.617891  9.617891 10.158160 10.592377 10.068443

>summary(glm(tmp1 ~ -1 + age, data=data.frame(as.vector(as.matrix(tmp1)), age)))$coefficients
      Estimate Std. Error  t value     Pr(>|t|)
age1  9.541423  0.1611603 59.20456 3.380085e-19
age2 10.083694  0.1611603 62.56935 1.479781e-19
age3  9.739813  0.1611603 60.43557 2.485380e-19
age4  9.748297  0.1611603 60.48821 2.453251e-19
age5 10.109218  0.1611603 62.72773 1.424913e-19

>m1 <- data.frame(tmp1, gp)

>sapply(age, function(x) sd(m1[m1[,"age"]==x,1]))
 [1] 0.1580745 0.1580745 0.1580745 0.1580745 0.1013207 0.1013207 0.1013207
 [8] 0.1013207 0.4658736 0.4658736 0.4658736 0.4658736 0.3279128 0.3279128
[15] 0.3279128 0.3279128 0.3995426 0.3995426 0.3995426 0.3995426

Can I conclude from the Pr(>|t|) values of the summary that tmp1 is of
better "quality" than tmp, given that its Pr(>|t|) values are much smaller?
