[R] Using xlevels

William Dunlap wdunlap at tibco.com
Wed Mar 30 20:16:23 CEST 2011


Terry,

  Part of the reason that you cannot use predict() on an lm
object created using a data.frame with character vectors in it
is a mismatch in timing.  model.frame attaches xlevels to the
terms based on factors in the input data.frame (and attaches
dataClasses based on the input data.frame), but it is the
subsequent call to model.matrix that turns character vectors
in the data.frame into factors (and then into contrasts).

 > d <- data.frame(y=1:10, x=rep(LETTERS[1:3],c(3,3,4)),
 +                 stringsAsFactors=FALSE)
 > fit <- lm(data=d, y~x)
 Warning message:
 In model.matrix.default(mt, mf, contrasts) :
   variable 'x' converted to a factor
 > predict(fit, newdata=data.frame(x=c("A","C"))) # expect c(2.0, 8.5)
 Error: variable 'x' was fitted with type "other" but type "factor" was supplied
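
One way to see what the fit recorded for x is to inspect the terms
and the stored levels directly.  This is only a sketch; what it
prints depends on the version of R, so no output is shown.  The
name data_classes is mine.

```r
# Same data as in the example above: x is a character vector, not a factor.
d <- data.frame(y = 1:10, x = rep(LETTERS[1:3], c(3, 3, 4)),
                stringsAsFactors = FALSE)
fit <- lm(y ~ x, data = d)

# Class recorded for each variable when the model frame was built;
# predict() checks newdata against this.
data_classes <- attr(terms(fit), "dataClasses")
print(data_classes)

# Levels stored with the fit, which predict.lm passes on as xlev.
print(fit$xlevels)
```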

This is one way that changing the default stringsAsFactors=TRUE
can cause problems.
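
A workaround, sketched under the assumption that converting x
yourself is acceptable, is to make it a factor before fitting, so
that the terms and xlevels agree with what model.matrix will use.
The names newd and pred are mine.

```r
d <- data.frame(y = 1:10, x = rep(LETTERS[1:3], c(3, 3, 4)),
                stringsAsFactors = FALSE)
d$x <- factor(d$x)               # convert explicitly, before lm() sees it
fit <- lm(y ~ x, data = d)

# Build newdata with the same level set as the fitted factor.
newd <- data.frame(x = factor(c("A", "C"), levels = levels(d$x)))
pred <- predict(fit, newdata = newd)
print(pred)                      # group means for A and C: 2.0 and 8.5
```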

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Terry Therneau
> Sent: Wednesday, March 30, 2011 10:29 AM
> To: Prof Brian Ripley
> Cc: r-help at r-project.org
> Subject: Re: [R] Using xlevels
> 
> I see the logic now.  I think that more sentences in the document
> would be very helpful, however.  What is written is very subtle.
> I suggest the following small expansion for model.matrix.Rd:
> 
>   \item{data}{a data frame.  If the object has a \code{terms} attribute
> then it is assumed to be the result of a call to \code{model.frame},
> otherwise \code{model.frame} will be called first.}
> 
>  I often forget that model.frames are not a class, but an "implied"
> class based on the presence of a terms component.  Many users, I
> suspect, do not even have this starting knowledge.
> 
>   Off to make changes to model.frame.coxph and model.matrix.coxph...
> 
> Thanks for the feedback.
> 
> 	Terry
> 
> 
> On Wed, 2011-03-30 at 16:36 +0100, Prof Brian Ripley wrote:
> > On Wed, 30 Mar 2011, Terry Therneau wrote:
> > 
> > > I'm working on predict.survreg and am confused about xlevels.
> > > The model.frame method has the argument, but none of the standard
> > > methods (model.frame.lm, model.frame.glm) appear to make use of it.
> > 
> > But I see this in predict.lm:
> > 
> >          m <- model.frame(Terms, newdata, na.action = na.action,
> >                           xlev = object$xlevels)
> > 
> > It is used to remap levels in newdata to those used in the fit.
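
For illustration, a small sketch of that remapping, calling
model.frame directly with an xlev list.  The data and the names
newd and mf are made up for the example.

```r
# newdata only contains two of the three levels seen in the fit.
newd <- data.frame(x = factor(c("A", "C")))
levels(newd$x)                   # "A" "C"

# xlev supplies the levels used in the fit, keyed by variable name.
mf <- model.frame(~ x, data = newd, xlev = list(x = c("A", "B", "C")))
levels(mf$x)                     # "A" "B" "C"
```

The values are unchanged; only the level set is widened to match the
fit, so the contrasts built from mf line up with the fitted
coefficients.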
> > 
> > >
> > > The documentation for model.matrix states:
> > >  xlev: to be used as argument of model.frame if data has no "terms"
> > > attribute.
> > 
> > Well, the code says
> > 
> >      if (is.null(attr(data, "terms")))
> >          data <- model.frame(object, data, xlev=xlev)
> > 
> > > But the terms attribute has no xlevels information in it, so I find
> > > this statement completely confusing.  Any insight is appreciated.
> > 
> > It means exactly what it says: a 'data' argument with a terms 
> > attribute is considered to be a model frame.
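
A short sketch of that distinction, with data invented for the
example: a model frame is an ordinary data.frame that carries a
"terms" attribute, and model.matrix can be handed either the raw
data or the model frame.

```r
d <- data.frame(y = 1:10, x = factor(rep(LETTERS[1:3], c(3, 3, 4))))
mf <- model.frame(y ~ x, data = d)

class(mf)                        # "data.frame": there is no special class
is.null(attr(d, "terms"))        # TRUE: plain data, so model.frame is called
is.null(attr(mf, "terms"))       # FALSE: treated as a model frame

# Compare the design matrix built from each input.
mm_raw <- model.matrix(y ~ x, data = d)
mm_mf  <- model.matrix(y ~ x, data = mf)
all.equal(mm_raw, mm_mf, check.attributes = FALSE)
```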
> > 
> > >
> > > Terry Therneau
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> 