[R] R2 always increases as variables are added?

Paul Lynch plynchnlm at gmail.com
Mon May 21 18:39:14 CEST 2007


Junjie,
    First, a disclaimer:  I am not a statistician, and have only taken
one statistics class, but I just took it this Spring, so the concepts
of linear regression are relatively fresh in my head and hopefully I
will not be too inaccurate.
    According to my statistics textbook, when selecting variables for
a model, the intercept term is always present.  The "variables" under
consideration do not include the constant "1" that multiplies the
intercept term.  I don't think it makes sense to compare models with
and without an intercept term.  (Also, I don't know what the point of
using a model without an intercept term would be, but that is probably
just my ignorance.)
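    As a rough sketch of the point about keeping the intercept in
every candidate model: if the models are nested and each one includes
the intercept, the R^2 reported by summary() should never go down as
predictors are added.  The seed, the simulated data, and the helper
name r2 below are my own assumptions for illustration; they are not
taken from your example.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
x <- matrix(rnorm(20), ncol = 2)  # two simulated predictors
y <- rnorm(10)                    # response unrelated to x

# Hypothetical helper: R^2 as reported by summary.lm()
r2 <- function(fit) summary(fit)$r.squared

fit0 <- lm(y ~ 1)                 # intercept only
fit1 <- lm(y ~ x[, 1])            # intercept + first predictor
fit2 <- lm(y ~ x)                 # intercept + both predictors

r2.values <- c(r2(fit0), r2(fit1), r2(fit2))
print(r2.values)                  # non-decreasing from left to right
```

Each model here contains the previous one, which is what makes the
comparison meaningful; y ~ 1 and y ~ x - 1 are not nested in that way.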
    Similarly, the formula you were using for R^2 (SSR/SST, with both
sums taken about mean(y)) seems to be useful only in the context of a
standard linear regression (i.e., one that includes an intercept
term).  With an intercept the residuals sum to zero, which gives
SST = SSR + SSE and keeps the ratio between 0 and 1.  As your example
shows, it is easy to construct a "fit" (e.g. y = 10,000,000*x) so
that SSR > SST if one is not deriving the fit from the regular linear
regression process.
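    To make that concrete, here is a small sketch, on simulated data
with an arbitrary seed (both my assumptions), of how the mean-centered
decomposition breaks down for a no-intercept fit.  It also checks that
summary.lm() reports the uncentered ratio sum(y.hat^2)/sum(y^2) for
such a model, which would explain the differing values in your
question (3).

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
x <- rnorm(10)
y <- runif(10)                    # mean of y is well away from zero

fit   <- lm(y ~ x - 1)            # regression through the origin
y.hat <- fitted(fit)
sse   <- sum(residuals(fit)^2)

# Mean-centered sums of squares, as in the formula from the question
sst <- sum((y - mean(y))^2)
ssr <- sum((y.hat - mean(y))^2)
print(ssr / sst)                  # not guaranteed to lie in [0, 1] here

# Does SST = SSR + SSE hold? Generally not without an intercept.
centered.holds <- isTRUE(all.equal(sst, ssr + sse))

# The uncentered decomposition sum(y^2) = sum(y.hat^2) + SSE does hold.
uncentered.holds <- isTRUE(all.equal(sum(y^2), sum(y.hat^2) + sse))

# summary.lm() uses the uncentered version when there is no intercept.
r2.uncentered   <- sum(y.hat^2) / sum(y^2)
matches.summary <- isTRUE(all.equal(r2.uncentered, summary(fit)$r.squared))

print(c(centered = centered.holds,
        uncentered = uncentered.holds,
        matches.summary = matches.summary))
```

If I understand summary.lm() correctly, a model written as y ~ xx - 1
counts as having no intercept even when xx contains a column of ones,
so it gets the uncentered R^2 although the fitted values are the same
as for y ~ x.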
          --Paul

On 5/19/07, 李俊杰 <klijunjie at gmail.com> wrote:
> I know that "-1" removes the intercept term. But my question is why the
> intercept term CANNOT be treated as a variable, since we can place a
> column consisting of 1s in the predictor matrix.
>
> If I insist on comparing a model with an intercept and one without an
> intercept on adjusted R2, I now think the strategy is always to use
> another definition of r-square or adjusted r-square, in which
> r-square=sum((y.hat)^2)/sum((y)^2).
>
> Am I on the right track?
>
> Thanks
>
> Li Junjie
>
>
> 2007/5/19, Paul Lynch <plynchnlm at gmail.com>:
> > In case you weren't aware, the meaning of the "-1" in y ~ x - 1 is to
> > remove the intercept term that would otherwise be implied.
> >     --Paul
> >
> > On 5/17/07, 李俊杰 <klijunjie at gmail.com> wrote:
> > > Hi, everybody,
> > >
> > > 3 questions about R-square:
> > > ---------(1)----------- Does R2 always increase as variables are added?
> > > ---------(2)----------- Can R2 be greater than 1?
> > > ---------(3)----------- How is R2 in summary(lm(y~x-1))$r.squared
> > > calculated? It is different from
> > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > >
> > > I will illustrate these problems with the following code:
> > > ---------(1)-----------  R2 doesn't always increase as variables are added
> > >
> > > > x=matrix(rnorm(20),ncol=2)
> > > > y=rnorm(10)
> > > >
> > > > lm=lm(y~1)
> > > > y.hat=rep(1*lm$coefficients,length(y))
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 2.646815e-33
> > > >
> > > > lm=lm(y~x-1)
> > > > y.hat=x%*%lm$coefficients
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 0.4443356
> > > >
> > > > ################ This is the biggest model, but its R2 is not the biggest, why?
> > > > lm=lm(y~x)
> > > > y.hat=cbind(rep(1,length(y)),x)%*%lm$coefficients
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 0.2704789
> > >
> > >
> > > ---------(2)-----------  R2 can be greater than 1
> > >
> > > > x=rnorm(10)
> > > > y=runif(10)
> > > > lm=lm(y~x-1)
> > > > y.hat=x*lm$coefficients
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 3.513865
> > >
> > >
> > > ---------(3)----------- How is R2 in summary(lm(y~x-1))$r.squared
> > > calculated? It is different from
> > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > > x=matrix(rnorm(20),ncol=2)
> > > > xx=cbind(rep(1,10),x)
> > > > y=x%*%c(1,2)+rnorm(10)
> > > > ### r2 calculated by lm(y~x)
> > > > lm=lm(y~x)
> > > > summary(lm)$r.squared
> > > [1] 0.9231062
> > > > ### r2 calculated by lm(y~xx-1)
> > > > lm=lm(y~xx-1)
> > > > summary(lm)$r.squared
> > > [1] 0.9365253
> > > > ### r2 calculated by me
> > > > y.hat=xx%*%lm$coefficients
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > [1] 0.9231062
> > >
> > >
> > > Thanks a lot for any clue :)
> > >
> > >
> > >
> > >
> > > --
> > > Junjie Li,                  klijunjie at gmail.com
> > > Undergraduate in DEP of Tsinghua University,
> > >
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> > --
> > Paul Lynch
> > Aquilent, Inc.
> > National Library of Medicine (Contractor)
> >
>
>
>
> --
>
> Junjie Li,                  klijunjie at gmail.com
> Undergraduate in DEP of Tsinghua University,


-- 
Paul Lynch
Aquilent, Inc.
National Library of Medicine (Contractor)


