[R] summary.manova rank deficiency error + data

Wed Aug 13 18:04:41 CEST 2008

Pedro Mardones wrote:
> Thanks for the reply. The SAS output is attached but seems to me that
> doesn't correspond to the wihtin-row contrasts as you suggested. By
> the way, yes the data are highly correlated, in fact each row
> correspond to the first part of a signal vector. Thanks anyway....
> PM
>   
Agreed. I tried disabling the check that causes R to protest, and then 
it gives similar DF but not quite the same statistics, quite possibly 
due to numerical instabilities in one or both systems. (You can easily 
try yourself, just do anova.mlm <- stats::anova.mlm and edit the qr() 
call inside.)

 > anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
Analysis of Variance Table

            Df    Wilks approx F num Df den Df Pr(>F)   
(Intercept)  1 0.002537  1887.24      5     24 <2e-16 ***
GROUP        2     0.62     1.29     10     48 0.2616   
Residuals   28                                          
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


>
>                                       The GLM Procedure
>                               Multivariate Analysis of Variance
>                                     E = Error SSCP Matrix
>                    y1                y2                y3
>   y4                y5
>  y1      0.0353518799       0.035256904      0.0351327804
> 0.0349749601      0.0347868018
>  y2       0.035256904      0.0351627227      0.0350395053
> 0.0348827098      0.0346956744
>  y3      0.0351327804      0.0350395053      0.0349173343
> 0.0347617352      0.0345760232
>  y4      0.0349749601      0.0348827098      0.0347617352
> 0.0346075203      0.0344233531
>  y5      0.0347868018      0.0346956744      0.0345760232
> 0.0344233531      0.0342409225
>
> Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
>       DF = 28             y1             y2             y3
> y4             y5
>       y1            1.000000       0.999992       0.999967
> 0.999921       0.999852
>                                      <.0001         <.0001
> <.0001         <.0001
>       y2            0.999992       1.000000       0.999991
> 0.999963       0.999911
>                       <.0001                        <.0001
> <.0001         <.0001
>       y3            0.999967       0.999991       1.000000
> 0.999990       0.999958
>                       <.0001         <.0001
> <.0001         <.0001
>       y4            0.999921       0.999963       0.999990
> 1.000000       0.999989
>                       <.0001         <.0001         <.0001
>            <.0001
>       y5            0.999852       0.999911       0.999958
> 0.999989       1.000000
>                       <.0001         <.0001         <.0001         <.0001
>
> The SAS System     10:33 Wednesday, August 13, 2008   8
>                                       The GLM Procedure
>                               Multivariate Analysis of Variance
>                               H = Type III SSCP Matrix for group
>                    y1                y2                y3
>   y4                y5
>  y1      0.0023822408       0.002365848      0.0023471328
> 0.0023261249      0.0023030993
>  y2       0.002365848      0.0023495679      0.0023309816
> 0.0023101183      0.0022872511
>  y3      0.0023471328      0.0023309816      0.0023125426
> 0.0022918453      0.0022691608
>  y4      0.0023261249      0.0023101183      0.0022918453
> 0.0022713359      0.0022488593
>  y5      0.0023030993      0.0022872511      0.0022691608
> 0.0022488593      0.0022266141
>
>                  Characteristic Roots and Vectors of: E Inverse * H, where
>                               H = Type III SSCP Matrix for group
>                                     E = Error SSCP Matrix
> Characteristic           Characteristic Vector  V'EV=1
>           Root  Percent            y1            y2            y3
>       y4            y5
>     0.41840103    71.72     -7542.628     17131.814      5347.394
> -31627.317     16700.100
>     0.16496011    28.28     -4180.854     -4413.446     32096.035
> -35545.204     12040.697
>     0.00000001     0.00    -41004.875    107291.004    -95905.664
> 32641.189     -3028.470
>     0.00000000     0.00      -416.226      -111.206       410.721
>  295.193      -171.953
>     0.00000000     0.00    -14678.651      5787.997     54718.250
> -69055.249     23218.580
>
>    MANOVA Test Criteria and F Approximations for the Hypothesis of No
> Overall group Effect
>                              H = Type III SSCP Matrix for group
>                                     E = Error SSCP Matrix
>                                      S=2    M=1    N=11
>        Statistic                        Value    F Value    Num DF
> Den DF    Pr > F
>        Wilks' Lambda               0.60518744       1.37        10
>    48    0.2227
>        Pillai's Trace              0.43658228       1.40        10
>    50    0.2095
>        Hotelling-Lawley Trace      0.58336114       1.37        10
> 33.362    0.2385
>        Roy's Greatest Root         0.41840103       2.09         5
>    25    0.1000
>
>
>
>
>
>
>
>
>
>
> On Wed, Aug 13, 2008 at 4:34 AM, Peter Dalgaard
> <p.dalgaard at biostat.ku.dk> wrote:
>   
>> Pedro Mardones wrote:
>>     
>>> Dear R-users;
>>>
>>> Previously I posted a question about the problem of rank deficiency in
>>> summary.manova. As somebody suggested, I'm attaching a small part of
>>> the data set.
>>>
>>> #***************************************************
>>>
>>> "test" <-
>>>
>>> structure(.Data = list(structure(.Data = c(rep(1,3),rep(2,18),rep(3,10)),
>>> levels = c("1", "2", "3"),
>>> class = "factor")
>>>
>>>
>>> ,c(0.181829,0.090159,0.115824,0.112804,0.134650,0.249136,0.163144,0.122012,0.157554,0.126283,
>>>
>>> 0.105344,0.125125,0.126232,0.084317,0.092836,0.108546,0.159165,0.121620,0.142326,0.122770,
>>>
>>> 0.117480,0.153762,0.156551,0.185058,0.161651,0.182331,0.139531,0.188101,0.103196,0.116877,0.113733)
>>>
>>>
>>> ,c(0.181445,0.090254,0.115840,0.112863,0.134610,0.249003,0.163116,0.122135,0.157206,0.126129,
>>>
>>> 0.105302,0.124917,0.126243,0.084455,0.092818,0.108458,0.158769,0.121244,0.141981,0.122595,
>>>
>>> 0.117556,0.153507,0.156308,0.184644,0.161421,0.181999,0.139376,0.187708,0.103126,0.116615,0.113746)
>>>
>>>
>>> ,c(0.181058,0.090426,0.115926,0.113022,0.134632,0.248845,0.163140,0.122331,0.156871,0.126023,
>>>
>>> 0.105335,0.124757,0.126325,0.084690,0.092885,0.108455,0.158386,0.120913,0.141676,0.122492,
>>>
>>> 0.117707,0.153293,0.156095,0.184242,0.161214,0.181670,0.139271,0.187318,0.103129,0.116421,0.113826)
>>>
>>>
>>> ,c(0.180692,0.090704,0.116110,0.113319,0.134745,0.248678,0.163256,0.122637,0.156581,0.125998,
>>>
>>> 0.105479,0.124686,0.126514,0.085066,0.093088,0.108587,0.158040,0.120674,0.141446,0.122488,
>>>
>>> 0.117972,0.153150,0.155954,0.183885,0.161063,0.181383,0.139251,0.186956,0.103232,0.116351,0.114001)
>>>
>>>
>>> ,c(0.180353,0.091088,0.116392,0.113753,0.134965,0.248520,0.163475,0.123046,0.156354,0.126067,
>>>
>>> 0.105726,0.124713,0.126821,0.085584,0.093432,0.108858,0.157742,0.120533,0.141309,0.122595,
>>>
>>> 0.118340,0.153088,0.155897,0.183582,0.160975,0.181143,0.139314,0.186636,0.103449,0.116415,0.114275)
>>> )
>>> ,names = c("GROUP", "Y1", "Y2", "Y3", "Y4","Y5")
>>> ,row.names = seq(1:31)
>>> ,class = "data.frame"
>>> )
>>>
>>> summary(manova(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
>>>
>>> #Error in summary.manova(manova(cbind(Y1, Y2, Y3, Y4, Y5) ~ GROUP, test),
>>>  :
>>>  residuals have rank 3 < 5
>>>
>>> #***************************************************
>>>
>>> What I don't understand is why SAS returns no errors using PROC GLM
>>> for the same data set. Is because PROC GLM doesn't take into account
>>> problems of rank deficiency? So, should I trust manova instead of PROC
>>> GLM output? I know it can be a touchy question but I would like to
>>> receive some insights.
>>> Thanks
>>> PM
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>       
>> What you have here is extremely correlated data:
>>
>>     
>>> (V <- estVar(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test)))
>>>       
>> Y1 Y2 Y3 Y4 Y5
>> Y1 0.001262567 0.001259177 0.001254746 0.001249106 0.001242385
>> Y2 0.001259177 0.001255814 0.001251416 0.001245812 0.001239132
>> Y3 0.001254746 0.001251416 0.001247055 0.001241494 0.001234861
>> Y4 0.001249106 0.001245812 0.001241494 0.001235983 0.001229405
>> Y5 0.001242385 0.001239132 0.001234861 0.001229405 0.001222889
>>     
>>> eigen(V)
>>>       
>> $values
>> [1] 6.224077e-03 2.313066e-07 3.499837e-10 4.259125e-12 1.334146e-12
>>
>> $vectors
>> [,1] [,2] [,3] [,4] [,5]
>> [1,] 0.4503756 0.61213579 0.5204920 -0.3485941 0.1732681
>> [2,] 0.4491807 0.32333236 -0.1873653 0.5929444 -0.5540795
>> [3,] 0.4476157 0.01442094 -0.5498688 0.1272921 0.6934503
>> [4,] 0.4456201 -0.31202109 -0.3198606 -0.6557557 -0.4144143
>> [5,] 0.4432397 -0.65052351 0.5378809 0.2840428 0.1017918
>>
>> Notice the more than 9 orders of magnitude between the eigenvalues.
>>
>> I think that what is happening is that what SAS calls MANOVA is actually
>> looking at within-row contrasts, which effectively removes the largest
>> eigenvalue. In R, the equivalent would be
>>
>>     
>>> anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), X=~1, test = "Wilks")
>>>       
>> Analysis of Variance Table
>>
>>
>> Contrasts orthogonal to
>> ~1
>>
>> Df Wilks approx F num Df den Df Pr(>F)
>> (Intercept) 1 0.037 164.873 4 25 <2e-16 ***
>> GROUP 2 0.701 1.215 8 50 0.3098
>> Residuals 28
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> or (this could be computationally more precice, but in fact it gives the
>> same result)
>>
>>     
>>> anova(lm(cbind(Y2,Y3,Y4,Y5)-Y1~GROUP, test), test = "Wilks")
>>>       
>> --
>>  O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>>
>>
>>     


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907