[R] Can predict ignore rows with insufficient info?

Peter Whiting pete at sprint.net
Tue Sep 16 18:44:02 CEST 2003


I need predict to ignore rows that contain levels not in the
model.

Consider a data frame, "const", that has columns for the number of
days required to construct a site and the city and state the site
was constructed in.

g<-lm(days~city,data=const)

Some of the sites in const have not yet been completed, and therefore
they have days==NA. I want to predict how many days these sites
will take to complete (I've simplified the above discussion to
remove many of the other factors involved.)

nconst<-subset(const,is.na(const$days))
x<-predict(g,nconst)
Error in model.frame.default(object, data, xlev = xlev) :
        factor city has new level(s) ALBANY

This is because we haven't yet completed a site in Albany.
If I had just one city to worry about I could easily fix it (choose
a nearby market with similar characteristics), but I am dealing
with several hundred cities. Instead, for the cities not
modeled by g I'd simply like to use the state, even though I
don't expect it to be as good:

g<-lm(days~state,data=const)
x<-predict(g,nconst)

I'm not sure how to identify the cities in nconst that are not
modeled by g (my actual model has many more predictors in the
formula). Is there a way to instruct predict to predict only the
rows for which it has enough information and not complain about
the others?
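A minimal sketch of one way to find those rows, assuming every predictor that can carry an unseen level is a factor: lm keeps the levels it saw during fitting in its `xlevels` component, so each factor column of `nconst` can be compared against it with `%in%`. The toy data and the helper `has.known.levels` are hypothetical illustrations, not part of the original setup.

```r
## Hypothetical toy data standing in for "const"
const <- data.frame(
  days  = c(10, 12, 20, 22, 30, NA, NA),
  city  = c("Denver", "Denver", "Austin", "Austin", "Buffalo", "Albany", "Austin"),
  state = c("CO", "CO", "TX", "TX", "NY", "NY", "TX"),
  stringsAsFactors = TRUE
)
g <- lm(days ~ city, data = const)   # rows with days == NA are dropped by na.omit
nconst <- subset(const, is.na(days))

## Hypothetical helper: TRUE for each row of newdata whose value for every
## factor recorded in fit$xlevels was seen when 'fit' was estimated
has.known.levels <- function(fit, newdata) {
  ok <- rep(TRUE, nrow(newdata))
  for (v in names(fit$xlevels))
    ok <- ok & (as.character(newdata[[v]]) %in% fit$xlevels[[v]])
  ok
}

ok <- has.known.levels(g, nconst)
ok                                        # FALSE for the Albany row
unique(as.character(nconst$city[!ok]))    # the cities g cannot handle
predict(g, nconst[ok, , drop = FALSE])    # no "new level" error here
```

Because the loop runs over all names in `xlevels`, the same check covers models with several factor predictors, not just `city`.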

## what I'd like to be able to write:
g<-lm(days~city,data=const)
x<-predict(g,nconst) ## would return NA for the rows with city==ALBANY
g<-lm(days~state,data=const)
y<-predict(g,nconst)
x[is.na(x)]<-y[is.na(x)]
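predict.lm does not return NA for an unseen level; as the error above shows, it stops instead. A minimal sketch of the same city-then-state fallback, assuming `city` is the only factor in the first model and `state` the only one in the second: predict the rows with known cities from the city model and the rest from the state model. The toy data and the name `known` are hypothetical.

```r
## Hypothetical toy data standing in for "const"
const <- data.frame(
  days  = c(10, 12, 20, 22, 30, NA, NA),
  city  = c("Denver", "Denver", "Austin", "Austin", "Buffalo", "Albany", "Austin"),
  state = c("CO", "CO", "TX", "TX", "NY", "NY", "TX"),
  stringsAsFactors = TRUE
)
nconst <- subset(const, is.na(days))

g.city  <- lm(days ~ city,  data = const)
g.state <- lm(days ~ state, data = const)

## TRUE where the city was seen when g.city was fitted
known <- as.character(nconst$city) %in% g.city$xlevels$city

x <- rep(NA, nrow(nconst))
## City model where it applies, state model as the fallback
if (any(known))
  x[known] <- predict(g.city, nconst[known, , drop = FALSE])
if (any(!known))
  x[!known] <- predict(g.state, nconst[!known, , drop = FALSE])
x
```

Rows whose state is also unseen would still trigger the error in the second predict call; running the same `%in%` check against `g.state$xlevels` first would leave those rows NA instead.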

thanks,
pete