[R] Query on R-squared correlation coefficient for linear regression through origin

peter dalgaard pdalgd using gmail.com
Thu Sep 27 16:22:49 CEST 2018


This is an old discussion. R compares the fitted model to the model without any regressors, which in the no-intercept case is the constant zero. Otherwise, you would be comparing non-nested models, and the R^2 would no longer be guaranteed to lie between 0 and 1.
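
To make the two baselines concrete, here is a minimal sketch that reuses the cars example from the question below. The variable names (fit0, rss, r2_uncentered, r2_centered) are chosen here for illustration only; the sketch assumes ordinary unweighted least squares.

```r
# Regression through the origin on the built-in cars data.
fit0 <- lm(dist ~ 0 + speed, data = cars)
y    <- cars$dist
rss  <- sum(residuals(fit0)^2)            # residual sum of squares

# Baseline = constant-zero model: total sum of squares is uncentered.
r2_uncentered <- 1 - rss / sum(y^2)

# Baseline = intercept-only model (the "general" formula in the question).
r2_centered <- 1 - rss / sum((y - mean(y))^2)

# The uncentered version should agree with what summary() reports.
print(all.equal(r2_uncentered, summary(fit0)$r.squared))
print(c(uncentered = r2_uncentered, centered = r2_centered))
```

The centered value is the one the question computes by hand, so it remains available as a one-liner; it just is not what summary() reports for a no-intercept fit.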

A similar issue affects anova tables, where the regression sum of squares is sum(yhat^2) rather than sum((yhat - ybar)^2).
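
The anova point can be checked the same way. This is a small sketch on the cars data from the question; the names ss_reg_anova and ss_centered are illustrative, and the row label "speed" is assumed to match the single term in the model formula.

```r
# Regression through the origin on the built-in cars data.
fit0 <- lm(dist ~ 0 + speed, data = cars)
yhat <- fitted(fit0)

# Regression sum of squares as reported in the anova table.
ss_reg_anova <- anova(fit0)["speed", "Sum Sq"]

# It should match the uncentered sum of squared fitted values ...
print(all.equal(ss_reg_anova, sum(yhat^2)))

# ... and not the centered version, shown here for comparison.
ss_centered <- sum((yhat - mean(cars$dist))^2)
print(c(anova = ss_reg_anova, centered = ss_centered))
```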

-pd

> On 27 Sep 2018, at 12:56 , Patrick Barrie <pjb10 using cam.ac.uk> wrote:
> 
> I have a query on the R-squared correlation coefficient for linear 
> regression through the origin.
> 
> The general expression for R-squared in regression (whether linear or 
> non-linear) is
> R-squared = 1 - sum((y-ypredicted)^2) / sum((y-ybar)^2)
> 
> However, the lm function within R does not seem to use this expression 
> when the intercept is constrained to be zero. It gives results different 
> to Excel and other data analysis packages.
> 
> As an example (using built-in cars dataframe):
> > cars.lm=lm(dist ~ 0+speed, data=cars)     # linear regression through origin
> > summary(cars.lm)$r.squared     # report R-squared
> [1] 0.8962893
> > 1-deviance(cars.lm)/sum((cars$dist-mean(cars$dist))^2)     # calculates R-squared directly
> [1] 0.6018997
> > # The latter corresponds to the value reported by Excel (and other data analysis packages)
> >
> > # Note that we expect R-squared to be smaller for linear regression through the origin
> > # than for linear regression without a constraint (which is 0.6511 in this example)
> 
> Does anyone know what R is doing in this case? Is there an option to get 
> R to return what I termed the "general" expression for R-squared? The 
> adjusted R-squared value is also affected. [Other parameters all seem 
> correct.]
> 
> Thanks for any help on this issue,
> 
> Patrick
> 
> P.S. I believe old versions of Excel (before 2003) also had this issue.
> 
> -- 
> Dr Patrick J. Barrie
> Department of Chemical Engineering and Biotechnology
> University of Cambridge
> Philippa Fawcett Drive, Cambridge CB3 0AS
> 01223 331864
> pjb10 using cam.ac.uk

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com



