[R] R-squared value for linear regression passing through origin using lm()

Fri Oct 19 09:51:37 CEST 2007

Berwin A Turlach, Thursday, 18 October 2007:
> G'day all,
> 
> I must admit that I have not read the previous e-mails in this thread,
> but why should that stop me from commenting? ;-)

Your comments are very welcome.

> On Thu, 18 Oct 2007 16:17:38 +0200
> Ralf Goertz <R_Goertz at web.de> wrote:
> 
> > But in that case the numerator is very large, too, isn't it? 
> 
> Not necessarily.
> 
> > I don't want to argue, though.
> 
> Good, you might lose the argument. :)

Yes, I admit I lost. :-(

> > But so far, I have not managed to create a dataset where R^2 is
> > larger for the model with forced zero intercept (although I have not
> > tried very hard). It would be very convincing to see one (Etienne?)
> 
> Indeed, you haven't tried hard.  It is not difficult.  Here are my
> canonical commands to convince people why regression through the
> origin is evil; the pictures should illustrate what is going on:

> [example snipped] 

Thanks to Thomas Lumley there is another convincing example. But still
I've got a problem with it:

> x<-c(2,3,4);y<-c(2,3,3)

> 1-2*var(residuals(lm(y~x+1)))/sum((y-mean(y))^2)

[1] 0.75

That's okay, but neither

> 1-3*var(residuals(lm(y~x+0)))/sum((y-0)^2)
[1] 0.97076

nor

> 1-2*var(residuals(lm(y~x+0)))/sum((y-0)^2)
[1] 0.9805066

give the result of summary(lm(y~x+0)), which is 0.9796. 
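
A possible explanation for the mismatch, offered as a sketch rather than
as a description of how summary.lm is implemented: var() subtracts the
mean of the residuals before squaring, and without an intercept the
residuals need not average to zero. The help page ?summary.lm describes
R^2 as being measured against zero when there is no intercept, so the
raw residual sum of squares over sum(y^2) should be the matching
quantity. The names fit0, res0, rss0 and r2_origin are chosen here for
illustration only.

```r
# Data from the example above
x <- c(2, 3, 4)
y <- c(2, 3, 3)

fit0 <- lm(y ~ x + 0)          # regression through the origin
res0 <- residuals(fit0)

mean(res0)                     # not zero when the intercept is dropped

# Uncentered residual sum of squares over uncentered total sum of squares
rss0 <- sum(res0^2)
r2_origin <- 1 - rss0 / sum(y^2)

r2_origin
summary(fit0)$r.squared        # compare with the value computed by hand
```

The two var()-based attempts differ from this only through the centering
of the residuals, which is why neither multiplier (2 or 3) recovers the
reported value.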

> > IIRC, I have not been told so. Perhaps my teachers were not as good
> > as they should have been. So what is R^2 good for, if not to indicate
> > the goodness of fit?
> 
> I am wondering about that too sometimes. :)   I always wondered why
> R^2 was described to me by my lecturers as the square of the
> correlation between the x and the y variate.  But on the other hand,
> they pretended that x was fixed and selected by the experimenter (or
> should be regarded as such). If x is fixed and y is random, then it
> does not make sense to me to speak about a correlation between x and y
> (at least not on the population level). 

I see the point. But I was raised with that description, too, and it's
hard to drop that idea. 
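
For what it is worth, the "squared correlation" description can be
checked numerically on the small dataset from earlier in this message.
A minimal sketch (fit1 and fit0 are names chosen here, not from the
thread): with an intercept the two quantities coincide, while the value
reported for the zero-intercept fit is a different quantity.

```r
# Same data as in the example above
x <- c(2, 3, 4)
y <- c(2, 3, 3)

fit1 <- lm(y ~ x + 1)          # with intercept
fit0 <- lm(y ~ x + 0)          # forced through the origin

cor(x, y)^2                    # squared sample correlation
summary(fit1)$r.squared        # agrees with the line above
summary(fit0)$r.squared        # does not: measured against zero, not mean(y)
```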

> My best guess at the moment is that R^2 was adopted by users of
> statistics before it was properly understood; and by the time it was
> properly understood, it was too entrenched to abandon.  Try not
> to teach it these days and see what your "client faculties" will tell
> you.

To preserve the role of R^2 as a goodness-of-fit indicator in
zero-intercept models, one could use the same formula as in models with
a constant. I mean, if R^2 is the proportion of variance explained by
the model, we should use the a priori variance of y[i]:

> 1-var(residuals(lm(y~x+0)))/var(y)
[1] 0.3567182

But I assume that this has probably been discussed at length somewhere
more appropriate than r-help.
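
One way to make the idea concrete, as a sketch only: keep the
denominator of the intercept model, sum((y - mean(y))^2), but use the
raw residual sum of squares instead of var(), since var() re-centers the
residuals. r2_centered is a hypothetical helper written for this
message, not an existing R function. A measure defined this way can
become negative when the zero-intercept fit is worse than simply
predicting mean(y).

```r
# Hypothetical helper: 1 - RSS / (centered total sum of squares),
# applied the same way whether or not the model has an intercept
r2_centered <- function(fit, y) {
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

x <- c(2, 3, 4)
y <- c(2, 3, 3)

fit0 <- lm(y ~ x + 0)
fit1 <- lm(y ~ x + 1)

r2_centered(fit0, y)           # zero-intercept model, judged against mean(y)
r2_centered(fit1, y)           # equals the usual R^2 for the intercept model
```

With a common baseline, the intercept model can no longer look worse
than the model forced through the origin, because least squares with an
intercept minimises the numerator over a larger set of lines.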

Thanks,

Ralf