[R] partial residuals & the output of residuals.lm(..., type="partial")

Thu Feb 19 03:57:18 CET 2009

Dear list,

After thinking about it a little more I solved my question of why I was 
calculating different residuals when using 
residuals.lm(...,type="partial") and when dropping a single term and 
recalculating the residuals.  This is because the two variables are in a 
sense competing with one another in the full model if they are not 
completely orthogonal to one another.  For example, with my hypothetical 
example from before if you make x1 and x2 and more correlated then the 
discrepancy between the two sets of residuals increases, but the problem 
can be solved if you make sure to use the same coefficients from the 
full model when computing the raw residuals without the other variable. 

Dan

Here is the example that shows that more correlated x-variables make the 
problem even worse and a solution to my original question.

set.seed(12)
x1<-runif(100)
x2<-x1+runif(100) ##this will make x1 and x2 more strongly correlated 
than in the first example (see original message)
y<-.13+.25*x1+.70*x2+runif(100)
mod<-lm(y~x1+x2)
plot(residuals(mod,type="partial")[,2],residuals(update(mod,.~.-x2),type='response'))
abline(0,1)
##note how the degree of scatter increases around the 1:1 line
##here is the solution to the problem
##calculate the residuals by hand and make sure to use the estimated 
coefficients from the full model
calc.resids<-y-cbind(rep(1,100),x1)%*%coef(mod)[-3] ##as before I will 
drop the influence of x2 from the model prediction
##center the calculated residuals
calc.resids<- calc.resids-mean(calc.resids)

plot(residuals(mod,type="partial")[,2],calc.resids)
abline(0,1)##now all the points fall right on the line

-------- Original Message --------
Subject: partial residuals & the output of residuals.lm(...,type="partial")
From: Daniel McGlinn <daniel.mcglinn at okstate.edu>
To: r-help at r-project.org <r-help at r-project.org>
Date: 2/18/2009 7:52 PM
> Dear list,
>
> I would like to know how the function residuals.lm calculates the 
> partial residuals from an lm object with more than one predictor 
> variable.  In other words what is residuals.lm(...,type="partial") doing 
> behind the scenes?  According to the help file for residuals.lm 
> (?residuals.lm), "The partial residuals are a matrix with each column 
> formed by omitting a term from the model".  Unfortunately, I cannot seem 
> to recreate the results of the function "residuals.lm" by simply 
> dropping a variable from a model and then calculating the raw residuals 
> of the updated model.  Can anyone see what I am overlooking?  It may be 
> helpful to others if I mention that the usage of 
> residuals.lm(...,type='partial') by the function termplot is what 
> motivated me to look at this function more closely.  Below is a simple 
> example to illustrate my question:
>
> set.seed(12)
>
> x1 <- runif(100)
> x2 <- runif(100)
>
> y <- .13+.25*x1+.70*x2+runif(100)
>
> mod <- lm(y~x1+x2)
>
> ##let's only consider the partial residuals when x2 is dropped from the 
> model
> plot(residuals(mod,type="partial")[,2],residuals(update(mod,.~.-x2),type='response'))
> abline(0,1) ##1:1 line
> ##why do the points not all fall on the 1:1 line?
>
> Thanks,
> Dan
>
>   

-- 
Daniel J. McGlinn
Department of Botany, Oklahoma State University
117 LSE Stillwater OK 74078 USA 405-612-1780
http://ecology.okstate.edu/Libra/