[R] Question About lm()

Bromaghin, Jeffrey F jbrom@gh|n @end|ng |rom u@g@@gov
Wed Feb 9 23:00:40 CET 2022


Hello,

I was constructing a simple linear model for a colleague with one categorical predictor (3 levels) and one quantitative predictor. I estimated the model parameters with and without an intercept, which are sometimes called reference cell coding and cell means coding, respectively.

Model 1: yResp ~ -1 + xCat + xCont
Model 2: yResp ~ xCat + xCont
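To make the equivalence concrete, here is a small reproducible sketch. It is written in Python/NumPy rather than R, and the simulated data and names (x_cat, x_cont, y, X1, X2) are hypothetical stand-ins for xCat, xCont, and yResp. Model 1 uses one indicator column per level and no intercept. Model 2 uses an intercept plus indicators for levels 2 and 3. Because the three indicators sum to the intercept column, both designs span the same space and should give identical fitted values.

```python
import numpy as np

# Hypothetical simulated data: a 3-level factor and a continuous covariate
rng = np.random.default_rng(1)
n = 30
x_cat = np.repeat([0, 1, 2], n // 3)        # factor levels coded 0, 1, 2
x_cont = rng.uniform(0, 10, size=n)
y = np.array([5.0, 7.0, 9.0])[x_cat] + 0.5 * x_cont + rng.normal(0, 1, size=n)

d = np.eye(3)[x_cat]                         # n x 3 indicator (one-hot) matrix

# Model 1 (cell means coding): one column per level, no intercept
X1 = np.column_stack([d, x_cont])
# Model 2 (reference cell coding): intercept + indicators for levels 2 and 3
X2 = np.column_stack([np.ones(n), d[:, 1:], x_cont])

b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)
fit1 = X1 @ b1
fit2 = X2 @ b2

# Same column space, so the fitted values should match
print(np.allclose(fit1, fit2))
# Level-1 coefficient equals the intercept; other levels become differences from level 1
print(np.isclose(b1[0], b2[0]), np.allclose(b1[1:3] - b1[0], b2[1:3]))
```

In Model 1 the three level coefficients are level-specific intercepts (the cell means at xCont = 0). In Model 2 the intercept is the level-1 value and the other two coefficients are differences from it. The slope on xCont is the same in both.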

The two models are equivalent, and the estimated coefficients are consistent between them, but the R-squared and F statistics returned by summary() differ markedly. I spent some time reading the code for both lm() and summary.lm() but did not find the source of the difference. The results from aov() and anova() also differ, so I suspect the issue involves how the sums of squares are computed. I also searched online for information on this, without success. I haven't used lm() for quite a while, but my recollection is that these differences did not occur in the distant past when I was teaching.
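One way to see how the sums of squares could diverge even though the fits are identical is the following sketch, again in Python/NumPy with hypothetical simulated data. It assumes that, when a model has no intercept, R-squared and the overall F are computed against a baseline of zero (model SS = sum(f^2), total SS = sum(y^2)) rather than against the mean (sum((f - mean(f))^2) and sum((y - mean(y))^2)). That assumption reflects a reading of the branch in summary.lm() that checks for an intercept; the function r2_and_f below is a hypothetical illustration, not R's code.

```python
import numpy as np

# Same hypothetical simulated data as a 3-level factor plus a covariate
rng = np.random.default_rng(1)
n = 30
x_cat = np.repeat([0, 1, 2], n // 3)
x_cont = rng.uniform(0, 10, size=n)
y = np.array([5.0, 7.0, 9.0])[x_cat] + 0.5 * x_cont + rng.normal(0, 1, size=n)

d = np.eye(3)[x_cat]
X1 = np.column_stack([d, x_cont])                     # Model 1: no intercept
X2 = np.column_stack([np.ones(n), d[:, 1:], x_cont])  # Model 2: with intercept

def r2_and_f(X, y, intercept):
    """Hypothetical helper: R-squared and overall F under the assumed formulas."""
    n_obs, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    f = X @ b
    rss = np.sum((y - f) ** 2)
    df_int = 1 if intercept else 0
    # Centered model SS with an intercept; uncentered (relative to zero) without one
    mss = np.sum((f - f.mean()) ** 2) if intercept else np.sum(f ** 2)
    rdf = n_obs - p
    r2 = mss / (mss + rss)
    fstat = (mss / (p - df_int)) / (rss / rdf)
    return r2, fstat, p - df_int, rdf, rss

r2_1, F_1, num1, den1, rss1 = r2_and_f(X1, y, intercept=False)
r2_2, F_2, num2, den2, rss2 = r2_and_f(X2, y, intercept=True)

print(np.isclose(rss1, rss2))          # residuals are identical
print(r2_1, r2_2)                      # R-squared differs
print(F_1, F_2)                        # F differs
print((num1, den1), (num2, den2))      # (4, 26) (3, 26)
```

Under these assumptions the residual sum of squares is the same, but the no-intercept R-squared uses sum(y^2) as its total and so comes out larger, and its F statistic tests whether all four coefficients (including the three level intercepts) are zero, on 4 rather than 3 numerator degrees of freedom.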

Thanks in advance for any insights you might have,
Jeff

Jeffrey F. Bromaghin
Research Statistician
USGS Alaska Science Center
907-786-7086
Jeffrey Bromaghin, Ph.D. | U.S. Geological Survey (usgs.gov)<https://www.usgs.gov/staff-profiles/jeffrey-bromaghin>
Ecosystems Analytics | U.S. Geological Survey (usgs.gov)<https://www.usgs.gov/centers/alaska-science-center/science/ecosystems-analytics>




