[Rd] predict (PR#2685)

Mark.Bravington at csiro.au Mark.Bravington at csiro.au
Wed Mar 26 00:29:39 MET 2003


There is a bug in `predict' whereby the order of variables sometimes gets
re-arranged compared to the original fit, and then disaster results.
Specifically, the 'variables' and 'predvars' attributes of a 'terms' object
get out of synch. This only happens when the terms in the original formula
get re-ordered during fitting:

test> scrunge.data_ data.frame( contin=1:10, discrete=factor( rep( c( 'cat',
'dog'), 5)), resp=runif( 10))
test> lm.ok_ lm( resp ~ discrete + contin %in% discrete, data=scrunge.data)
test> predict( lm.ok, scrunge.data) # no problemo
         1          2          3          4          5          6          7
8          9         10 
0.29663793 0.04572655 0.42661779 0.31668732 0.55659764 0.58764809 0.68657750
0.85860886 0.81655736 1.12956963 

test> lm.bug_ lm( resp ~ contin %in% discrete + discrete, data=scrunge.data)
# terms will be re-ordered
test> predict( lm.bug, scrunge.data)
Error in "contrasts<-"(*tmp*, value = "contr.treatment") : 
        contrasts apply only to factors
In addition: Warning message: 
variable discrete is not a factor in: model.frame.default(object, data, xlev
= xlev) 

This actually turns out to be a bug in `model.frame.default', to do with an
inconsistency between `predvars' and `vars' when `model.frame.default' is
called inside `predict'. AFAICS it can be fixed by including the commented
line below in `model.frame.default':

    <<...>>
    vars <- attr(formula, "variables")
    predvars <- attr(formula, "predvars")
    if (is.null(predvars)) 
        predvars <- vars
    varnames <- as.character(predvars[-1]) # MVB: was vars[-1] not
predvars[-1]
    variables <- eval(predvars, data, env)
    <<...>>
    
This has the side-effect that there are some ugly column names in the
model.frame if e.g. a `poly' term is used, but doesn't actually seem to hurt
the prediction.

However, that doesn't fix it all. There is still a bug in `predict', even
after replacing `model.frame.default' with the above:

test> predict( lm.bug, scrunge.data)
Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) : 
        subscript out of bounds
test> # wot???        

This time, the bug is in `delete.response', which call `terms' to set most
of the attributes including `variables', but adjusts the original `predvars'
by hand. Because `terms' returns the variables in a different order when
it's called by `predict.lm' to when it was originally called by `lm', things
get out of synch.

This is a slightly tricky bug to fix, because `predvars' and `variables' can
look a bit different e.g. if there are `poly' terms, but I think the
following change near the end of `delete.response' does the trick:
  
  <<...>>
  if (length(formula(termobj)) == 3) {
#   Old code, reliant on maintaining the order of terms: attr(tt,
"predvars") <- attr(termobj, "predvars")[-2]
    reorder <- match( sapply( attr( tt, 'variables'), deparse), sapply(
attr( termobj, 'variables'), deparse)) # MVB
    attr( tt, 'predvars') <- attr( termobj, 'predvars')[ reorder] # MVB
  }
  <<...>>
  
cheers
Mark

*******************************

Mark Bravington
CSIRO (CMIS)
PO Box 1538
Castray Esplanade
Hobart
TAS 7001

phone (61) 3 6232 5118
fax (61) 3 6232 5012
Mark.Bravington at csiro.au



More information about the R-devel mailing list