# [R] Interesting behavior of lm() with small, problematic data sets

mark.hogue at srs.gov
Tue Sep 5 18:31:14 CEST 2017

```Tim,

I think what you're seeing is
https://en.wikipedia.org/wiki/Loss_of_significance.
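
# A minimal sketch of that effect in R, assuming standard IEEE-754 doubles
# (the names a and b are illustrative and not taken from your example):
a <- 0.1 + 0.2            # stored with a tiny rounding error
b <- 0.3
a == b                    # FALSE: the two values differ in the last bits
a - b                     # about 5.6e-17, pure rounding noise
.Machine$double.eps       # about 2.2e-16, the relative precision of a double
isTRUE(all.equal(a, b))   # TRUE: equal within numerical tolerance
# The residuals and slope in your output are of this same magnitude, so the
# t and F statistics are most likely ratios of rounding noise.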

Cheers,

Mark

From:   "Glover, Tim" <Tim.Glover at amecfw.com>
To:     "r-help at r-project.org" <r-help at r-project.org>
Date:   09/05/2017 11:37 AM
Subject:        [R] Interesting behavior of lm() with small, problematic
data sets
Sent by:        "R-help" <r-help-bounces at r-project.org>

I've recently come across the following results from the lm() function
when applied to a particular type of admittedly difficult data. When
working with small data sets (for instance, 3 points) that have the same
response for different values of the predictor variable, the resulting
slope estimate is a reasonable approximation of the expected 0.0, but the
p-value of that slope estimate is surprising. A reproducible example is
included below, along with the summary output.

######### example code
x <- c(1,2,3)
y <- c(1,1,1)

# the above gives the data set {(1,1), (2,1), (3,1)} to regress

new.rez <- lm(y ~ x)  # regress constant y on changing x
summary(new.rez) # display results of regression

######## end of example code

Results:

Call:
lm(formula = y ~ x)

Residuals:
         1          2          3
 5.906e-17 -1.181e-16  5.906e-17

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept)  1.000e+00  2.210e-16  4.525e+15   <2e-16 ***
x           -1.772e-16  1.023e-16 -1.732e+00    0.333
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.447e-16 on 1 degrees of freedom
Multiple R-squared:  0.7794,    Adjusted R-squared:  0.5589
F-statistic: 3.534 on 1 and 1 DF,  p-value: 0.3112

Warning message:
In summary.lm(new.rez) : essentially perfect fit: summary may be
unreliable

##############

There is a warning that the summary may be unreliable due to the
essentially perfect fit, but a p-value of 0.3112 doesn't seem reasonable.
As a side note, the various r^2 values seem odd too.
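
# A rough sketch of one way to check whether these statistics are built from
# rounding noise. The tolerance sqrt(.Machine$double.eps) and the names fit,
# rss and tss are assumptions chosen for illustration:
x <- c(1, 2, 3)
y <- c(1, 1, 1)
fit <- lm(y ~ x)
rss <- sum(residuals(fit)^2)   # residual sum of squares, at rounding-error scale
tss <- sum((y - mean(y))^2)    # total sum of squares of a constant response
tss == 0                       # TRUE: the textbook R^2, 1 - rss/tss, has a zero denominator
# The slope should be treated as zero if it is below the chosen tolerance:
abs(coef(fit)[["x"]]) < sqrt(.Machine$double.eps)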

Tim Glover
Senior Scientist II (Geochemistry, Statistics), Americas - Environment &
Infrastructure, Amec Foster Wheeler
271 Mill Road, Chelmsford, Massachusetts, USA 01824-4105
T +01 978 692 9090      D +01 978 392 5383      M +01 850 445 5039
tim.glover at amecfw.com      amecfw.com

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help