[R] predict.lm() question

Duncan Murdoch murdoch at stats.uwo.ca
Tue Apr 8 00:08:37 CEST 2008


On 07/04/2008 5:57 PM, Chip Barnaby wrote:
> Dear R-people ...
> 
> I'm a new user.  I can't get predict.lm() to produce predictions for 
> new independent data.  There are some messages in archived help about 
> this problem, but I still don't see my error after reviewing 
> those.  I understand that the new independent data must have the same 
> name(s) as used when the model was made.
> 
> In the example below, predict.lm produces the predictions for the 
> original (model input) data plus a warning message.  What I want is 
> predictions for alternative data (in data frame DX in the example).
> 
> Thanks,
> Chip Barnaby
> 
>  > D<-data.frame( X=seq(1:10))
>  > D$Y<-D$X+rnorm( 10)
>  > D
>      X          Y
> 1   1  0.3811634
> 2   2  1.8770049
> 3   3  3.5253376
> 4   4  3.1851957
> 5   5  3.8088813
> 6   6  5.7333074
> 7   7  7.4896623
> 8   8  7.9394056
> 9   9  8.6683570
> 10 10 10.7480675
>  > lm<-lm( D$Y~D$X)
>  > summary( lm)
> 
> Call:
> lm(formula = D$Y ~ D$X)
> 
> Residuals:
>       Min       1Q   Median       3Q      Max
> -0.98812 -0.36354 -0.09808  0.48154  0.88288
> 
> Coefficients:
>              Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.58935    0.41680  -1.414    0.195
> D$X          1.07727    0.06717  16.037 2.29e-07 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> Residual standard error: 0.6101 on 8 degrees of freedom
> Multiple R-Squared: 0.9698,     Adjusted R-squared: 0.9661
> F-statistic: 257.2 on 1 and 8 DF,  p-value: 2.293e-07
> 
>  > DX<-data.frame( X=seq( 5.5, 11.5))
>  > DX
>       X
> 1  5.5
> 2  6.5
> 3  7.5
> 4  8.5
> 5  9.5
> 6 10.5
> 7 11.5
>  > predict.lm( lm, DX)
>           1          2          3          4          5          6          7
>   0.4879174  1.5651887  2.6424600  3.7197313  4.7970026  5.8742739  6.9515453
>           8          9         10
>   8.0288166  9.1060879 10.1833592
> Warning message:
> 'newdata' had 7 rows but variable(s) found have 10 rows

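Your formula refers to D explicitly, so predict.lm will never look at 
DX.  The model's predictor is the expression D$X rather than a variable 
named X, so predict.lm re-evaluates D$X, finds the 10 original values, 
and warns that newdata had 7 rows.  You need to do the fit as

fit <- lm( Y~X, data=D)

and then predict.lm( fit, DX) (or equivalently predict( fit, newdata=DX)) 
will use the X column of DX.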
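A minimal sketch of the whole corrected workflow, in R. It assumes a set.seed() call that is not in the original post (so the Y values will differ from those quoted above), and the object names pred and bad are illustrative only. It fits with bare column names plus data = D, predicts for the 7 rows of DX, and then shows that the formula from the question has a term literally named "D$X", which DX cannot supply.

```r
# Assumption: set.seed() added so this sketch is repeatable; the random
# Y values will not match those printed in the original question.
set.seed(1)
D <- data.frame(X = 1:10)
D$Y <- D$X + rnorm(10)

# Fit with bare column names and a data argument, so the model's
# predictor is the variable "X" rather than the expression "D$X".
fit <- lm(Y ~ X, data = D)

# New predictor values; the column name must match the formula term "X".
DX <- data.frame(X = seq(5.5, 11.5))

# predict() dispatches to predict.lm() for an "lm" object and looks up
# X in newdata, giving one prediction per row of DX (7 values).
pred <- predict(fit, newdata = DX)
print(pred)

# For comparison, the formula from the question (hypothetical name "bad").
# Its only term label is "D$X", which is not a column of DX, so predict()
# evaluates D$X again and returns all 10 fitted values with a warning.
bad <- lm(D$Y ~ D$X)
print(attr(terms(bad), "term.labels"))
```

The fix is entirely in how the model is specified: once the formula names columns and the data frame is passed separately, any data frame with a matching X column can serve as newdata.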

Duncan Murdoch
