[R] lm without intercept

Sun Jul 29 06:52:24 CEST 2012

Hi,

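R uses a different formula for R^2 depending on whether the
intercept is in the model or not. With an intercept, the total sum of
squares is taken about the mean of the response; without one, it is
taken about zero (see ?summary.lm), which usually makes R^2 much larger.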

You may also find this discussion helpful:
http://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-lm/

If you conceptualize R^2 as the squared correlation between the
observed and fitted values, it is easy to get:

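# fit through the origin (m0) and with an intercept (m1)
summary(m0 <- lm(mpg ~ 0 + disp, data = mtcars))
summary(m1 <- lm(mpg ~ disp, data = mtcars))
# squared correlation between observed and fitted values
cor(mtcars$mpg, fitted(m0))^2
cor(mtcars$mpg, fitted(m1))^2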

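but that is not how R calculates R^2 for the model without an
intercept: the two squared correlations are identical, whereas the R^2
values reported by summary() differ.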
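
As a rough sketch of what summary() appears to do in each case, here is
the calculation by hand on the same mtcars models. The names rss0, rss1,
r2_m0, r2_m1 and r2_m0_centered are just mine for illustration, and the
"centered" version at the end is only my guess at the kind of figure
other packages report for a zero-intercept fit:

```r
# Same two models as above, on the built-in mtcars data
m0 <- lm(mpg ~ 0 + disp, data = mtcars)  # no intercept
m1 <- lm(mpg ~ disp, data = mtcars)      # with intercept

y <- mtcars$mpg
rss0 <- sum(residuals(m0)^2)  # residual sum of squares, no intercept
rss1 <- sum(residuals(m1)^2)  # residual sum of squares, with intercept

# With an intercept: total sum of squares about the mean of y
r2_m1 <- 1 - rss1 / sum((y - mean(y))^2)

# Without an intercept: total sum of squares about zero
r2_m0 <- 1 - rss0 / sum(y^2)

# Assumption: a "centered" R^2 for the no-intercept fit, i.e. the
# no-intercept residuals compared against the mean-only model.
# This can be far lower than r2_m0, and even negative.
r2_m0_centered <- 1 - rss0 / sum((y - mean(y))^2)

# Compare the hand calculations with what summary() reports
all.equal(r2_m1, summary(m1)$r.squared)
all.equal(r2_m0, summary(m0)$r.squared)
r2_m0_centered
```

The two all.equal() calls should return TRUE if the description above is
right. Since sum(y^2) is never smaller than sum((y - mean(y))^2), the
zero-based version is always at least as large as the centered one for
the same residuals.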

Cheers,

Josh

On Sat, Jul 28, 2012 at 10:40 AM, citynorman <citynorman at hotmail.com> wrote:
> I've just picked up R (been using Matlab, Eviews etc) and I'm having the same
> issue. Running reg=lm(ticker1~ticker2)  gives R^2=50% while running
> reg=lm(ticker1~0+ticker2) gives R^2=99%!! The charts suggest the fit is
> worse not better and indeed Eviews/Excel/Matlab all say R^2=15% with
> intercept=0. How come R calculates a totally different value?!
>
> Call:
> lm(formula = ticker1 ~ ticker2)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -0.22441 -0.03380  0.01099  0.04891  0.16688
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.57062    0.08187   19.18   <2e-16 ***
> ticker2      0.61722    0.02699   22.87   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.07754 on 530 degrees of freedom
> Multiple R-squared: 0.4967,     Adjusted R-squared: 0.4958
> F-statistic: 523.1 on 1 and 530 DF,  p-value: < 2.2e-16
>
> Call:
> lm(formula = ticker1 ~ 0 + ticker2)
>
> Residuals:
>       Min        1Q    Median        3Q       Max
> -0.270785 -0.069280 -0.007945  0.087340  0.268786
>
> Coefficients:
>         Estimate Std. Error t value Pr(>|t|)
> ticker2 1.134508   0.001441   787.2   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.1008 on 531 degrees of freedom
> Multiple R-squared: 0.9991,     Adjusted R-squared: 0.9991
> F-statistic: 6.197e+05 on 1 and 531 DF,  p-value: < 2.2e-16
>
>
> Jan private wrote
>>
>> Hi,
>>
>> thanks for your help. I'm beginning to understand things better.
>>
>>> If you plotted your data, you would realize that whether you fit the
>>> 'best' least squares model or one with a zero intercept, the fit is
>>> not going to be very good
>>> Do the data cluster tightly around the dashed line?
>> No, and that is why I asked the question. The plotted fit doesn't look
>> any better with or without intercept, so I was surprised that the
>> R-value etc. indicated an excellent regression (which I now understood
>> is the wrong interpretation).
>>
>> One of the references you googled suggests that intercepts should never
>> be omitted. Is this true even if I know that the physical reality behind
>> the numbers suggests an intercept of zero?
>>
>> Thanks,
>>       Jan
>>
>> ______________________________________________
>> R-help@ mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/lm-without-intercept-tp3312429p4638204.html
> Sent from the R help mailing list archive at Nabble.com.
>

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/