[R] Calculation of r squared from a linear regression

Tue Jun 15 09:24:44 CEST 2010

Dear Sandra,

R^2 is just the proportion of the error in one model (model C) that is
eliminated by a second model (model A).

PRE (proportional reduction in error) = R^2
    = (SSE model C - SSE model A) / SSE model C.
This is sometimes written as (SSEc - SSEa)/SSEc = SSR/SSEc, where
SSR = SSEc - SSEa is the sum of squares reduced.

Given your example with some extensions:
x<- c(1,2,3,4)
y<- c(1.6,4.4,5.5,8.3)
x.demean<-x-mean(x)
y.mean<-mean(y)
y.demean<-y-y.mean

# The model is fit as before with all parameters.
fit1<-lm(y~x) # includes intercept term
summary(fit1) # PRE = 0.9749
fit1.SSE<-sum(resid(fit1)^2) # SSE=0.578

fit2<-lm(y~x-1) # excludes intercept, as in the original example
                # (forces the intercept to zero)
summary(fit2) # PRE = 0.9946
fit2.SSE<-sum(resid(fit2)^2) # SSE=0.6596667

# In order to understand the comparison taking place in fit1
SSEc <-sum(y.demean^2) #SSE of a model predicting only the mean
SSEa <-fit1.SSE
fit1.PRE <-(SSEc-SSEa)/SSEc   # = 0.9749, as reported by summary(fit1)

# For fit2 the comparison model has no intercept either, so it predicts zero
SSEc.noint <-sum(y^2) # = 121.06
fit2.PRE<-(SSEc.noint-fit2.SSE)/SSEc.noint # = 0.994551, or 0.9946 as before
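
The same two comparisons can also be sketched by fitting the baseline models
explicitly instead of computing their SSE by hand. This is only a minimal
sketch: pre() is a hypothetical helper written for this example, not a
function from base R, and it assumes both arguments are lm fits to the same
response.

```r
# Hypothetical helper (not part of base R): proportional reduction in error
# when moving from a baseline fit (model C) to a fuller fit (model A).
pre <- function(model_c, model_a) {
  sse_c <- sum(resid(model_c)^2)
  sse_a <- sum(resid(model_a)^2)
  (sse_c - sse_a) / sse_c
}

x <- c(1, 2, 3, 4)
y <- c(1.6, 4.4, 5.5, 8.3)

# With an intercept, the baseline is the mean-only model y ~ 1.
pre_int <- pre(lm(y ~ 1), lm(y ~ x))

# Without an intercept, the baseline is the empty model y ~ 0,
# whose residuals are simply y itself.
pre_noint <- pre(lm(y ~ 0), lm(y ~ x - 1))

# Both values should agree with what summary() reports as R^2.
all.equal(pre_int,   summary(lm(y ~ x))$r.squared)
all.equal(pre_noint, summary(lm(y ~ x - 1))$r.squared)
```

Writing it this way makes the point that the R^2 reported for a model
without an intercept uses a different baseline, so it is not directly
comparable to the R^2 of a model with one.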

Hope this helps.
Sincerely,
KeithC.

On 2010-06-11 2:16, Sandra Hawthorne wrote:
> Hi,
>
> I'm trying to verify the calculation of coefficient of determination (r
> squared) for linear regression. I've done the calculation manually with a
> simple test case and using the definition of r squared outlined in
> summary(lm) help. There seems to be a discrepancy between what R
> produced and the manual calculation. Does anyone know why this is so? What
> does the multiple r squared reported in summary(lm) represent?
>
> # The test case:
> x<- c(1,2,3,4)
> y<- c(1.6,4.4,5.5,8.3)
> dummy<- data.frame(x, y)
> fm1<- lm(y ~ x-1, data = dummy)
> summary(fm1)
> betax<- fm1$coeff["x"] * sd(x) / sd(y)
> # cd is coefficient of determination
> cd<- betax * cor(y, x)