[R] Generic functions with models? 'predict', 'residuals', 'fitted', ...?

John McHenry john_d_mchenry at yahoo.com
Wed Feb 4 03:10:17 CET 2009


Hi WizaRds!

I don't know if my understanding of models is screwed up or
if there's a fundamental limitation to the way they are 
used in R.

Here's what I'd like to do: create a model and then test data
against that model using generics like 'predict' and 'residuals'.
The problem is that generics don't work for new data sets, except
of course 'predict': 

	set.seed(1)
	df<- data.frame(matrix(rnorm(10*4), nc=4))
	cn<-  c("y", "x1", "x2", "x3")
	colnames(df)<- cn
	model<- lm(y ~ x1 + x2 + x3 + 1, data=df)
	u<- residuals(model, data=df)

	test<- data.frame(matrix(rnorm(10*4), nc=4))
	colnames(test)<- cn
	y.hat<- predict(model, newdata=test)
	u.new<- residuals(model, newdata=test)

# I would expect to get the residuals to the 'test' 
# data fit to 'model' with 'residuals(model, newdata=test)',
# alas no:
	cbind(u, u.new)
		      u        u.new
	1   0.032996295  0.032996295
	2   0.575261650  0.575261650
	3  -0.702501425 -0.702501425
	4   0.482075538  0.482075538
	5   0.300600839  0.300600839
	6  -0.980832577 -0.980832577
	7   0.265239680  0.265239680
	8  -0.341625071 -0.341625071
	9   0.363224527  0.363224527
	10  0.005560545  0.005560545

# I would expect to get the 'test' fit to 'model'
# with 'fitted(model, newdata=test)' but again no:
	cbind(fitted(model), fitted(model, newdata=test))
		  [,1]        [,2]
	1  -0.65945011 -0.65945011
	2  -0.39161833 -0.39161833
	3  -0.13312719 -0.13312719
	4   1.11320526  1.11320526
	5   0.02890693  0.02890693
	6   0.16036419  0.16036419
	7   0.22218937  0.22218937
	8   1.07994978  1.07994978
	9   0.21255683  0.21255683
	10 -0.31094893 -0.31094893

I have to do:

	y.hat<- predict(model, newdata=test)
	# get test residuals:
	u.new<- test$y - y.hat
	u.new
         1          2          3          4          5          6          7 
 1.3661751 -0.4077999  1.1740108  0.4492772 -1.5865168 -0.7633108 -0.8800892 
         8          9         10 
 1.7474658 -0.1016104  2.1081361 
	 

to get the residuals. Maybe this is what one is supposed to do?

To be fair, the help pages for e.g. 'residuals.lm'
don't say that you can do things like 'newdata=test' in:

'residuals(model, newdata=test)' 

or 'data=test' in:

'residuals(model, data=test)' 

but that is exactly what I would like to do!

Am I missing something here, or is this just the way it is?

Thanks guys!

Jack.




More information about the R-help mailing list