[R] question about linear models.

Mon Apr 19 21:28:31 CEST 2004

This would make a good exam question!

First, look at the distribution of levels:

      B=0   B=1   B=2   B=3
A=0    6    --    --    --
A=1   --     4     3     2
A=2   --     2     3     4
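
A minimal sketch of how this table of counts could be reproduced in R. It assumes A and B are held as factors (the residual degrees of freedom in the quoted output suggest they are); the level vectors are copied from the posted data, and the name `counts` is purely illustrative.

```r
# Factor levels copied from the posted data (rows 1..24)
A <- factor(c(0,0,0,0,0,0, 2,2,2,1,2,1, 1,2,1,2,1,1, 1,2,1,2,2,1))
B <- factor(c(0,0,0,0,0,0, 3,3,2,2,1,1, 3,3,2,2,1,1, 3,2,2,3,1,1))

# Cross-tabulate the two factors; empty cells appear as 0 rather than "--"
counts <- table(A, B)
print(counts)
```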

And then look at the mean values within combinations of levels:

      B=0   B=1   B=2   B=3
A=0  1.15   --    --    --  | 1.15
A=1   --   1.81  1.85  1.52 | 1.76
A=2   --   2.31  2.52  2.13 | 2.30
----------------------------+------
     1.15  1.98  2.18  1.92 | 1.81
(Residual SE after fitting A+B = 0.38)
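
For anyone wanting to check these figures, here is a rough sketch using `tapply` and `lm`. The data frame is retyped from the posted data; A and B are assumed to be factors, and the names `cell_means`, `A_means`, `B_means`, `fit_ab` and `res_se` are illustrative only.

```r
# Data retyped from the original posting; A and B assumed to be factors
S1 <- data.frame(
  samples = c(1.3398553, 0.8455924, 1.0290893, 1.2720512, 1.2071754, 1.1859539,
              2.7399659, 1.2476911, 2.6389479, 1.6914068, 2.2260561, 1.2955187,
              1.6526140, 2.3159151, 2.3905009, 2.9520105, 1.9478868, 1.9936118,
              1.3775338, 1.9638190, 1.4697860, 2.2028858, 2.4024771, 1.9935864),
  A = factor(c(0,0,0,0,0,0, 2,2,2,1,2,1, 1,2,1,2,1,1, 1,2,1,2,2,1)),
  B = factor(c(0,0,0,0,0,0, 3,3,2,2,1,1, 3,3,2,2,1,1, 3,2,2,3,1,1))
)

# Mean response in each (A, B) cell; empty cells come out as NA
cell_means <- with(S1, tapply(samples, list(A = A, B = B), mean))
print(round(cell_means, 2))

# Marginal means for each factor
A_means <- with(S1, tapply(samples, A, mean))
B_means <- with(S1, tapply(samples, B, mean))
print(round(A_means, 2))
print(round(B_means, 2))

# Residual standard error after fitting the additive model A + B
fit_ab <- lm(samples ~ A + B, data = S1)
res_se <- summary(fit_ab)$sigma
print(round(res_se, 2))
```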

First, it is clear that (A=0) vs (A>0) is exactly associated
with (B=0) vs (B>0). Therefore any difference between means
for (A=0) vs (A>0) is fully confounded with (B=0) vs (B>0).
The table of means shows that there *is* a difference here
(significant, as it turns out), so fitting A alone will give
a significant result, as will fitting B alone.
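
The confounding can be checked directly. This is only a sketch, assuming A and B are factors with the levels as posted; `a_zero` and `b_zero` are made-up names.

```r
# Factor levels copied from the posted data (rows 1..24)
A <- factor(c(0,0,0,0,0,0, 2,2,2,1,2,1, 1,2,1,2,1,1, 1,2,1,2,2,1))
B <- factor(c(0,0,0,0,0,0, 3,3,2,2,1,1, 3,3,2,2,1,1, 3,2,2,3,1,1))

# Indicator of the zero level of each factor
a_zero <- A == "0"
b_zero <- B == "0"

# If the two contrasts are fully confounded, the off-diagonal cells are empty
print(table(a_zero, b_zero))

# TRUE when (A=0) and (B=0) pick out exactly the same observations
print(identical(a_zero, b_zero))
```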

Further, the table of means shows the response increasing almost
linearly with A (about 0.6 per level), while it changes little
across (B=1)/(B=2)/(B=3). So almost all of the variation with
respect to B is accounted for by the difference between (B=0)
and (B>0), which is totally confounded with A. Therefore, once
you have fitted A, fitting B as an additional term will not
improve the fit significantly.
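
A sketch of the "A first, then B" fit. In R, `anova()` on a single fit gives sequential sums of squares in formula order, so the B line here is "B after A" and should agree with the quoted `anova(fit1,fit2)` comparison. The data frame is retyped from the posting, A and B are assumed to be factors, and `seq_ab` is an illustrative name.

```r
# Data retyped from the original posting; A and B assumed to be factors
S1 <- data.frame(
  samples = c(1.3398553, 0.8455924, 1.0290893, 1.2720512, 1.2071754, 1.1859539,
              2.7399659, 1.2476911, 2.6389479, 1.6914068, 2.2260561, 1.2955187,
              1.6526140, 2.3159151, 2.3905009, 2.9520105, 1.9478868, 1.9936118,
              1.3775338, 1.9638190, 1.4697860, 2.2028858, 2.4024771, 1.9935864),
  A = factor(c(0,0,0,0,0,0, 2,2,2,1,2,1, 1,2,1,2,1,1, 1,2,1,2,2,1)),
  B = factor(c(0,0,0,0,0,0, 3,3,2,2,1,1, 3,3,2,2,1,1, 3,2,2,3,1,1))
)

# Sequential ANOVA: A is fitted first, then B is added.
# One B coefficient is aliased with A, so B contributes only 2 df here.
seq_ab <- anova(lm(samples ~ A + B, data = S1))
print(seq_ab)
```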

However, if you fit B first and then add A, the B fit first
takes out the difference between (B=0) and (B>0), which is
equivalent to (A=0) vs (A>0). But the table of means shows
that, while there is little difference between
(B=1)/(B=2)/(B=3), there is nevertheless a systematic difference
between (A=1) and (A=2) at each level of B: 0.5, 0.67
and 0.61 respectively. This shows up as an effect of A after
fitting B.
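
The reverse ordering can be sketched the same way, together with the (A=2) minus (A=1) differences at each level of B. Again the data frame is retyped from the posting, A and B are assumed to be factors, and `seq_ba`, `cell_means` and `a_diff` are illustrative names. Differences computed from unrounded means may differ in the last digit from those read off the rounded table.

```r
# Data retyped from the original posting; A and B assumed to be factors
S1 <- data.frame(
  samples = c(1.3398553, 0.8455924, 1.0290893, 1.2720512, 1.2071754, 1.1859539,
              2.7399659, 1.2476911, 2.6389479, 1.6914068, 2.2260561, 1.2955187,
              1.6526140, 2.3159151, 2.3905009, 2.9520105, 1.9478868, 1.9936118,
              1.3775338, 1.9638190, 1.4697860, 2.2028858, 2.4024771, 1.9935864),
  A = factor(c(0,0,0,0,0,0, 2,2,2,1,2,1, 1,2,1,2,1,1, 1,2,1,2,2,1)),
  B = factor(c(0,0,0,0,0,0, 3,3,2,2,1,1, 3,3,2,2,1,1, 3,2,2,3,1,1))
)

# Sequential ANOVA: B is fitted first, then A is added.
# (A=0) vs (A>0) is already absorbed by B, so A has only 1 df left:
# the (A=1) vs (A=2) contrast.
seq_ba <- anova(lm(samples ~ B + A, data = S1))
print(seq_ba)

# Difference between (A=2) and (A=1) cell means at B = 1, 2, 3
cell_means <- with(S1, tapply(samples, list(A = A, B = B), mean))
a_diff <- cell_means["2", c("1", "2", "3")] - cell_means["1", c("1", "2", "3")]
print(round(a_diff, 2))
```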

So, in summary: there is a significant effect of A alone (due
to the roughly constant increase per increment in level), and a
significant effect of B alone (due to the contrast between
(B=0) and (B>0), equivalent to the contrast between (A=0)
and (A>0)). Once the effect of A has been allowed for, however,
all that remains of B is the contrast between levels
(B=1)/(B=2)/(B=3), which do not differ enough to be significant.
On the other hand, fitting B first still leaves a roughly
constant difference between (A=1) and (A=2) at each of the
levels of B, which shows up as significant for A after fitting
B. You do not have enough data to detect differences of the
size seen between (B=1)/(B=2)/(B=3) as significant.

Best wishes,
Ted.

==================================================================

On 19-Apr-04 ivan.borozan at utoronto.ca wrote:
> i have the following table with two factors A, B each respectively
> with 3 and 4 levels (unbalanced design)   
> 
>>S1
>      samples A B
> 1  1.3398553 0 0
> 2  0.8455924 0 0
> 3  1.0290893 0 0
> 4  1.2720512 0 0
> 5  1.2071754 0 0
> 6  1.1859539 0 0
> 7  2.7399659 2 3
> 8  1.2476911 2 3
> 9  2.6389479 2 2
> 10 1.6914068 1 2
> 11 2.2260561 2 1
> 12 1.2955187 1 1
> 13 1.6526140 1 3
> 14 2.3159151 2 3
> 15 2.3905009 1 2
> 16 2.9520105 2 2
> 17 1.9478868 1 1
> 18 1.9936118 1 1
> 19 1.3775338 1 3
> 20 1.9638190 2 2
> 21 1.4697860 1 2
> 22 2.2028858 2 3
> 23 2.4024771 2 1
> 24 1.9935864 1 1
> 
> 
> i fit two different models
> 
> fit1<-aov(samples~A + B,data=S1,
>           contrasts = list(A = contr.treatment, B = contr.treatment))
> fit2<-aov(samples~A,data=S1,contrasts = list(A = contr.treatment))
> fit3<-aov(samples~B,data=S1,contrasts = list(B = contr.treatment))
> 
> 
> and using 
> 
>>anova(fit1,fit2)
> Analysis of Variance Table
> 
> Model 1: samples ~ A + B
> Model 2: samples ~ A
>   Res.Df      RSS Df Sum of Sq      F Pr(>F)
> 1     19  2.74820                           
> 2     21  3.14667 -2  -0.39847 1.3774 0.2763
> 
> i get B as not significant and
> 
> 
>>anova(fit1,fit3)
> 
> Analysis of Variance Table
> 
> Model 1: samples ~ A + B
> Model 2: samples ~ B
>   Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
> 1     19  2.7482                                
> 2     20  4.2391 -1   -1.4909 10.308 0.004604 **
> 
> A as significant.
> 
> 
> 
> however if i do
> 
>>anova(fit3)
> 
> Analysis of Variance Table
> 
> Response: samples
>           Df Sum Sq Mean Sq F value   Pr(>F)   
> B          3 3.7241  1.2414  5.8567 0.004854 **
> Residuals 20 4.2391  0.2120                    
> 
> 
> i get B as significant and
> 
>>anova(fit2)
> 
> Analysis of Variance Table
> 
> Response: samples
>           Df Sum Sq Mean Sq F value    Pr(>F)    
> A          2 4.8165  2.4083  16.072 5.835e-05 ***
> Residuals 21 3.1467  0.1498 
> 
> A as significant.
> 
> 
> 
> 
> Should i conclude that A is significant and B is not or rather that
> both factors
> are significant ?
> 
> 
> all the best
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 19-Apr-04                                       Time: 20:28:31
------------------------------ XFMail ------------------------------