[R] Surprising results from summary(lm()) on data with NO variation

Fri Dec 13 21:50:03 CET 2002

It's just rounding errors.  We can't tell if you are genuinely interested
in studying rounding error or not.  In other words, whether something is
really constant or has a very small range of variation is not something
you can get from the numbers alone.

lda tries hard to warn here, but doing that well is expensive.

On Fri, 13 Dec 2002, Warnes, Gregory R wrote:

>
> I have some data (from Affymetrix experiments) where I fit an aov() model to
> a large number of outcome variables.  A reasonable fraction of the outcome
> variables have 0 variability because values below a cutoff have been
> replaced with the cutoff value (often 20) .
>
> In this case, the overall p-value from summary(lm(..))  is misleadingly
> small:
>
> For example (equivalent results in R 1.6.1 on both Solaris and for Windows
> 2000):
>
>
> 	> x <- rep(20, 4+5+5)  # no variation
> 	> Treatment <- factor(rep( c("Control","Low","High"), c(4,5,5) ))  #
> 3 treatment levels
> 	> summary( lm( x ~ Treatment)  )
>
> 	Call:
> 	lm(formula = x ~ Treatment)
>
> 	Residuals:
> 	       Min         1Q     Median         3Q        Max
> 	-1.994e-14  7.889e-31  7.889e-31  7.889e-31  6.647e-15
>
> 	Coefficients:
> 	               Estimate Std. Error   t value Pr(>|t|)
> 	(Intercept)   2.000e+01  3.471e-15 5.762e+15   <2e-16 ***
> 	TreatmentHigh 6.647e-15  4.657e-15 1.427e+00    0.181
> 	TreatmentLow  6.647e-15  4.657e-15 1.427e+00    0.181
> 	---
> 	Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> 	Residual standard error: 6.942e-15 on 11 degrees of freedom
> 	Multiple R-Squared: 0.5333,     Adjusted R-squared: 0.4485
> 	F-statistic: 6.286 on 2 and 11 DF,  p-value: 0.01512
>                                           ^^^^^^^^^^^^^^^^
>
> Note F statistic and the overall p-value.
>
> For some values (eg 0 and 7),  fitting this model will generate NA's for the
> estimates.  For most values, we get similar results.  I presume that this is
> due to numerical problems in the fitting algorithm.  Still the result is
> quite surprising.
>
> It appears that summary.aov doesn't have the same problem:
>
> 	> x <- rep(20, 4+5+5)  # no variation
> 	> Treatment <- factor(rep( c("Control","Low","High"), c(4,5,5) ))  #
> 3 treatment levels
> 	> summary( aov( x ~ Treatment)  )
> 	            Df     Sum Sq    Mean Sq F value Pr(>F)
> 	Treatment    2 1.2622e-28 6.3110e-29  1.3095 0.3089
> 	Residuals   11 5.3011e-28 4.8190e-29
>
> Perhaps the mechanism for computing the F statistic that summary.aov uses is
> less susceptible to the 'no variability' problem than the computation in
> summary.lm.
>
> I'm posting this here  to raise a warning flag.  For my particular
> application, I'll just check for the no variability case and handle it
> separately.  I'll leave it to the R core folks to decide if this is a big
> enough problem to do something about in the code.
>
> -Greg
>
>
> LEGAL NOTICE\ Unless expressly stated otherwise, this message is ... [[dropped]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595