[R] predict: remove columns with new levels automatically

Andreas Wittmann andreas_wittmann at gmx.de
Wed Nov 25 07:48:09 CET 2009


Sorry for my poor description. I am not asking for a finished algorithm without doing any work of my own; I only hoped for some advice on how to approach this. I do not want to predict arbitrary data either: I am only referring to newdata whose variables are the same as those in the model data. But if the data contain factors, the newdata may have a level that does not exist in the original data.
So I have to compare each factor in the data with the same factor in the newdata and, if the newdata has a level that is not in the original data, drop that variable and then fit the model and compute the prediction again.
I thought this problem was quite common and that I could use an algorithm somebody had already implemented.
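
To make the idea concrete, here is a minimal sketch of the comparison step described above. The helper name vars_with_new_levels is hypothetical (it is not part of base R or of any package I know of), and the sketch assumes that training and newdata are plain data frames with matching column names and that the response is called x, as in the example from my first mail.

```r
## Hypothetical helper (not from base R): returns the names of factor or
## character columns whose values in `newdata` were never seen in `training`.
vars_with_new_levels <- function(training, newdata) {
  common <- intersect(names(training), names(newdata))
  is_bad <- vapply(common, function(v) {
    old <- training[[v]]
    new <- newdata[[v]]
    ## only categorical variables can introduce new levels
    if (!(is.factor(old) || is.character(old))) return(FALSE)
    ## missing values in newdata are not counted as a new level
    length(setdiff(na.omit(as.character(new)), as.character(old))) > 0
  }, logical(1))
  common[is_bad]
}

## same data as in the original example
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y,
                       z = factor(rep(c("A", "B", "C"), each = 3)))
x_new <- rnorm(1)
test <- data.frame(x = x_new, y = x_new + rnorm(1), z = factor("D"))

## drop the problematic variables, then refit and predict
bad  <- vars_with_new_levels(training, test)
keep <- setdiff(names(training), bad)
fit  <- lm(x ~ ., data = training[, keep, drop = FALSE])
pred <- predict(fit, newdata = test)
```

With the example data, bad contains only "z", so the refitted model is equivalent to lm(x ~ y, data=training). The sketch does not handle the case where every predictor has to be dropped; that would need a separate fallback.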

best regards

Andreas




-------- Original Message --------
> Date: Wed, 25 Nov 2009 00:48:59 -0500
> From: David Winsemius <dwinsemius at comcast.net>
> To: Andreas Wittmann <andreas_wittmann at gmx.de>
> CC: r-help at r-project.org
> Subject: Re: [R] predict: remove columns with new levels automatically

> 
> On Nov 24, 2009, at 2:24 PM, Andreas Wittmann wrote:
> 
> > Dear R-users,
> >
> > in the following thread
> >
> > http://tolstoy.newcastle.edu.au/R/help/03b/3322.html
> >
> > the problem of how to remove rows for predict that contain levels
> > which are not in the model is discussed.
> >
> > Now I am trying to do this the other way round: I want to remove the
> > columns (variables) from the model that will later be problematic
> > because of new levels at prediction time.
> >
> > ## example:
> > set.seed(0)
> > x <- rnorm(9)
> > y <- x + rnorm(9)
> >
> > training <- data.frame(x=x, y=y, z=c(rep("A", 3), rep("B", 3),  
> > rep("C", 3)))
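> > ## draw the new x first so that y can be built from it (avoids masking t())
> > x_new <- rnorm(1)
> > test <- data.frame(x=x_new, y=x_new+rnorm(1), z="D")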
> >
> > lm1 <- lm(x ~ ., data=training)
> > ## prediction does not work because the variable z has the new level "D"
> > predict(lm1, test)
> >
> > ## solution: the variable z is removed from the model
> > ## the prediction happens without using the information of variable z
> > lm2 <- lm(x ~ y, data=training)
> > predict(lm2, test)
> >
> > How can I automatically recognize this and fit the model accordingly?
> 
> Let me get this straight. You want us to predict in advance (or more  
> accurately design an algorithm that can see into the future and work  
> around) any sort of newdata you might later construct????
> 
> --
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
