[R] predict: remove columns with new levels automatically

Andreas Wittmann andreas_wittmann at gmx.de
Wed Nov 25 20:20:30 CET 2009


Thank you all for the good advice.

I have now written a quick hack which does what I was looking for; maybe 
someone else will find it useful:


set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)

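## stringsAsFactors=TRUE is needed from R 4.0.0 on, where data.frame()
## no longer converts strings to factors by default
training <- data.frame(x=x, y=y,
                       z1=c(rep("A", 3), rep("B", 3), rep("C", 3)),
                       z2=c(rep("F", 4), rep("G", 5)),
                       stringsAsFactors=TRUE)
test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z1="D", z2="F",
                   stringsAsFactors=TRUE)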


`predict.drop` <- function(f, dat, newdat)
{
  datlev <- vector("list", ncol(dat))
  newdatlev <- vector("list", ncol(newdat))

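  `filllevs` <- function(dat, veclev)
  {
    for (j in seq_len(ncol(dat)))
    {
      ## veclev starts out as a list of NULLs; assigning NULL with [[<-
      ## would delete the element and shift the indices, so only the
      ## factor columns are filled in
      if (is.factor(dat[,j]))
        veclev[[j]] <- levels(dat[,j])
    }

    return(veclev)
  }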

  datlev <- filllevs(dat, datlev)
  newdatlev <- filllevs(newdat, newdatlev)

  if (ncol(dat) == ncol(newdat))
  {
    drop <- logical(ncol(dat))
    names(drop) <- colnames(dat)

    for (j in 1:ncol(dat))
    {
      if (!is.null(datlev[[j]]))
      {
        ## all() gives a single TRUE/FALSE even if newdat has several levels
        if (!all(newdatlev[[j]] %in% datlev[[j]]))
          drop[j] <- TRUE
      }
    }
  }
  else
    stop("dat and newdat must have the same number of columns!")

  ## drop=FALSE keeps a data frame even if only one column is left
  m <- lm(formula(f), data=dat[, !drop, drop=FALSE])
  p <- predict(m, newdat)

  return(list(drop=drop, p=p))
}


predict.drop(x ~ ., training, test)
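
A more compact sketch of the same idea, under a few assumptions that are 
not part of the hack above: columns are matched by name rather than by 
position, the values actually present are compared instead of the declared 
levels, and character columns are treated like factors. The helper name 
unseen_cols is made up for this illustration, and the test values are fixed 
numbers rather than random draws.

```r
## Hypothetical helper (not from the thread): flag every column of newdat
## that holds a value never seen in the same-named column of dat.
unseen_cols <- function(dat, newdat)
{
  common <- intersect(names(dat), names(newdat))
  vapply(common, function(nm)
  {
    old <- dat[[nm]]
    ## only factor or character columns can carry new levels
    if (!is.factor(old) && !is.character(old))
      return(FALSE)
    ## values in newdat that never occur in dat (NA is not counted as new)
    unseen <- setdiff(as.character(newdat[[nm]]), c(as.character(old), NA))
    length(unseen) > 0
  }, logical(1))
}

set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x=x, y=y,
                       z1=rep(c("A", "B", "C"), each=3),
                       z2=rep(c("F", "G"), times=c(4, 5)))
test <- data.frame(x=0.5, y=1, z1="D", z2="F")

bad  <- unseen_cols(training, test)              # TRUE only for z1
keep <- setdiff(names(training), names(bad)[bad])
m    <- lm(x ~ ., data=training[, keep, drop=FALSE])
p    <- predict(m, test)                         # fitted without z1
```

Because the comparison goes through as.character(), this variant does not 
depend on whether data.frame() converted the strings to factors.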


best regards

Andreas




David Winsemius wrote:
>
> On Nov 25, 2009, at 1:48 AM, Andreas Wittmann wrote:
>
>> Sorry for my poor description; I am not asking for a ready-made 
>> algorithm without doing any work of my own, I only hoped to get some 
>> advice on how to do this. I do not want to predict arbitrary data; I 
>> am referring only to newdata whose variables are the same as in the 
>> model data. But if the data contain factors, the newdata may have a 
>> level which does not exist in the original data.
>> So I have to compare each factor in the data and in the newdata, and 
>> if the newdata has a level which is not in the original data, drop 
>> this variable and compute the model and the prediction again.
>> I thought this problem was quite common and that I could use an 
>> algorithm somebody has already implemented.
>>
>> best regards
>>
>> Andreas
>>
> If you use str to look at the lm1 object you will find near the bottom 
> a list called "xlevels" (lm1$x reaches it by partial matching):
>
> lm1$x will show you the factor levels that were present in the 
> variables at the time of the model creation
> > lm1$x
> $z
> [1] "A" "B" "C"
>
> New testing scenario good level and bad level:
>
> test <- data.frame(x=t<-rnorm(2), y=t+rnorm(2), z=c("B", "D") )
>  lm1 <- lm(x ~ ., data=training)
>  # get prediction for good level only
>  predict(lm1, subset(test, z %in% lm1$x$z) )
>         1
> 0.4225204
>
>>
>>
>>
>> -------- Original Message --------
>>> Date: Wed, 25 Nov 2009 00:48:59 -0500
>>> From: David Winsemius <dwinsemius at comcast.net>
>>> To: Andreas Wittmann <andreas_wittmann at gmx.de>
>>> CC: r-help at r-project.org
>>> Subject: Re: [R] predict: remove columns with new levels automatically
>>
>>>
>>> On Nov 24, 2009, at 2:24 PM, Andreas Wittmann wrote:
>>>
>>>> Dear R-users,
>>>>
>>>> the following thread
>>>>
>>>> http://tolstoy.newcastle.edu.au/R/help/03b/3322.html
>>>>
>>>> discusses how to remove rows passed to predict that contain levels
>>>> which are not in the model.
>>>>
>>>> Now I am trying to do this the other way round: I want to remove
>>>> columns (variables) from the model which would later cause problems
>>>> with new levels at prediction time.
>>>>
>>>> ## example:
>>>> set.seed(0)
>>>> x <- rnorm(9)
>>>> y <- x + rnorm(9)
>>>>
>>>> training <- data.frame(x=x, y=y, z=c(rep("A", 3), rep("B", 3),
>>>> rep("C", 3)))
>>>> test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z="D")
>>>>
>>>> lm1 <- lm(x ~ ., data=training)
>>>> ## prediction does not work because the variable z has the new
>>>> ## level "D"
>>>> predict(lm1, test)
>>>>
>>>> ## solution: the variable z is removed from the model
>>>> ## the prediction happens without using the information of variable z
>>>> lm2 <- lm(x ~ y, data=training)
>>>> predict(lm2, test)
>>>>
>>>> How can I automatically detect this and fit the model accordingly?
>>>
>>> Let me get this straight. You want us to predict in advance (or more
>>> accurately design an algorithm that can see into the future and work
>>> around) any sort of newdata you might later construct????
>>>
>>> -- 
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>



