# [R] R2 always increases as variables are added?

Tony Plate tplate at acm.org
Thu May 24 21:37:11 CEST 2007

```The answer to your question three is that the calculation of r-squared
in summary.lm does depend on whether or not an intercept is included in
the model.  (Another part of the reason for you puzzlement is, I think,
that you are computing R-squared as SSR/SST, which is only valid when
when the model has an intercept).

The code is in summary.lm, here are the relevant excerpts (assuming your
model does not have weights):

r <- z\$residuals
f <- z\$fitted
w <- z\$weights
if (is.null(w)) {
mss <- if (attr(z\$terms, "intercept"))
sum((f - mean(f))^2)
else sum(f^2)
}
...

If you want to compare models with and without an intercept based on
R^2, then I suspect it's most appropriate to use the version of R^2 that
does not use a mean.

It's also worthwhile thinking about what you are actually doing.  I find
the most intuitive definition of R^2
(http://en.wikipedia.org/wiki/R_squared) is

R2 = 1 - SSE / SST

where SSE = sum_i (yhat_i - y_i)^2, (sum of errors in predictions for
you model)
and SST = sum_i (y_i - mean(y))^2 (sum of errors in predictions for an
intercept-only model)

This means that the standard definition of R2 effectively compares the
model with an intercept-only model.  As the error in predictions goes
down, R2 goes up, and the model that uses the mean(y) as a prediction
(i.e., the intercept-only model) provides a scale for these errors.

If you think or know that the true mean of y is zero then it may be
appropriate to compare against a zero model rather than an
intercept-only model (in SST).  And if the sample mean of y is quite
different from zero, and you compare a no-intercept model against an
intercept-only model, then you're going to get results that are not
easily interpreted.

Note that a common way of expressing and computing R^2 is as SSR/SST
(which you used).  (Where SSR = sum_i (yhat_i - mean(y))^2 ). However,
this is only valid when the model has an intercept (i.e., SSR/SST = 1 -
SSE/SST ONLY when the model has an intercept.)

Here's some examples, based on your example:

> set.seed(1)
> data <- data.frame(x1=rnorm(10), x2=rnorm(10), y=rnorm(10), I=1)
>
> lm1 <- lm(y~1, data=data)
> summary(lm1)\$r.squared
[1] 0
> y.hat <- fitted(lm1)
> sum((y.hat-mean(data\$y))^2)/sum((data\$y-mean(data\$y))^2)
[1] 5.717795e-33
>
> # model with no intercept
> lm2 <- lm(y~x1+x2-1, data=data)
> summary(lm2)\$r.squared
[1] 0.6332317
> y.hat <- fitted(lm2)
> # no-intercept version of R^2 (2 ways to compute)
> 1-sum((y.hat-data\$y)^2)/sum((data\$y)^2)
[1] 0.6332317
> sum((y.hat)^2)/sum((data\$y)^2)
[1] 0.6332317
> # standard (assuming model has intercept) computations for R^2:
> SSE <- sum((y.hat - data\$y)^2)
> SST <- sum((data\$y - mean(data\$y))^2)
> SSR <- sum((y.hat - mean(data\$y))^2)
> 1 - SSE/SST
[1] 0.6252577
> # Note that SSR/SST != 1 - SSE/SST (because the model doesn't have an
intercept)
> SSR/SST
[1] 0.6616612
>
> # model with intercept included in data
> lm3 <- lm(y~x1+x2+I-1, data=data)
> summary(lm3)\$r.squared
[1] 0.6503186
> y.hat <- fitted(lm3)
> # no-intercept version of R^2 (2 ways to compute)
> 1-sum((y.hat-data\$y)^2)/sum((data\$y)^2)
[1] 0.6503186
> sum((y.hat)^2)/sum((data\$y)^2)
[1] 0.6503186
> # standard (assuming model has intercept) computations for R^2:
> SSE <- sum((y.hat - data\$y)^2)
> SST <- sum((data\$y - mean(data\$y))^2)
> SSR <- sum((y.hat - mean(data\$y))^2)
> 1 - SSE/SST
[1] 0.6427161
> SSR/SST
[1] 0.6427161
>
>

hope this helps,

Tony Plate

Disclaimer: I too do not have any degrees in statistics, but I'm 95%
sure the above is mostly correct :-)  If there are any major mistakes,
I'm sure someone will point them out.

??? wrote:
> Hi, everybody,
>
> ---------(1)----------- Does R2 always increase as variables are added?
> ---------(2)----------- Does R2 always greater than 1?
> ---------(3)----------- How is R2 in summary(lm(y~x-1))\$r.squared
> calculated? It is different from (r.square=sum((y.hat-mean
> (y))^2)/sum((y-mean(y))^2))
>
> I will illustrate these problems by the following codes:
> ---------(1)-----------  R2  doesn't always increase as variables are added
>
>> x=matrix(rnorm(20),ncol=2)
>> y=rnorm(10)
>>
>> lm=lm(y~1)
>> y.hat=rep(1*lm\$coefficients,length(y))
>> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> [1] 2.646815e-33
>> lm=lm(y~x-1)
>> y.hat=x%*%lm\$coefficients
>> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> [1] 0.4443356
>> ################ This is the biggest model, but its R2 is not the biggest,
> why?
>> lm=lm(y~x)
>> y.hat=cbind(rep(1,length(y)),x)%*%lm\$coefficients
>> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> [1] 0.2704789
>
>
> ---------(2)-----------  R2  can greater than 1
>
>> x=rnorm(10)
>> y=runif(10)
>> lm=lm(y~x-1)
>> y.hat=x*lm\$coefficients
>> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> [1] 3.513865
>
>
>  ---------(3)----------- How is R2 in summary(lm(y~x-1))\$r.squared
> calculated? It is different from (r.square=sum((y.hat-mean
> (y))^2)/sum((y-mean(y))^2))
>> x=matrix(rnorm(20),ncol=2)
>> xx=cbind(rep(1,10),x)
>> y=x%*%c(1,2)+rnorm(10)
>> ### r2 calculated by lm(y~x)
>> lm=lm(y~x)
>> summary(lm)\$r.squared
> [1] 0.9231062
>> ### r2 calculated by lm(y~xx-1)
>> lm=lm(y~xx-1)
>> summary(lm)\$r.squared
> [1] 0.9365253
>> ### r2 calculated by me
>> y.hat=xx%*%lm\$coefficients
>> (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> [1] 0.9231062
>
>
> Thanks a lot for any cue:)
>
>
>
>

```