[R] randomForests predict problem

Liaw, Andy andy_liaw at merck.com
Wed Apr 2 15:43:09 CEST 2003


Yves,

I will add checks for NAs in predict.randomForest().

In the next version of randomForest (currently called 3.9-x), there will be
facilities for handling NAs in the training set.  However, there's no way to
handle NAs in the test set yet.  I believe Leo is still working on that.
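
Until such a check is in the package, a guard along the following lines can
be run on the test set before calling predict().  This is only a minimal
base-R sketch: the function name check_newdata_na and the toy data frame are
made up for illustration and are not part of randomForest.

```r
# Hypothetical helper (not part of randomForest): stop early if any row of
# newdata has a missing value in one of the predictor columns.
check_newdata_na <- function(newdata, predictors = names(newdata)) {
  incomplete <- !complete.cases(newdata[, predictors, drop = FALSE])
  if (any(incomplete)) {
    stop(sum(incomplete), " row(s) of newdata contain NAs; ",
         "remove or impute them before calling predict()")
  }
  invisible(newdata)
}

# Toy data frame standing in for a test set such as Soybean[test, ]
toy <- data.frame(x1 = c(1, NA, 3), x2 = factor(c("a", "b", NA)))

# With NAs present the helper signals an error; capture its message here
res <- tryCatch(check_newdata_na(toy), error = function(e) conditionMessage(e))

# After na.omit() the same data passes the check
clean <- check_newdata_na(na.omit(toy))
```

Failing with an explicit message seems preferable to silently dropping rows,
since the caller then decides whether to omit or impute.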

In Leo's v.4 of the Fortran code, he uses the proximities from the random
forest to impute NAs iteratively, starting from the column median (numeric
variables) or mode (factors).  I've implemented this scheme at the R level,
so that it works for both regression and classification.
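
The starting point of that scheme can be sketched in base R as follows.
This covers only the initial median/mode fill, not the proximity-based
iterations that refine it, and it is not the package code: the function name
rough_fill and the toy data are assumptions made up for this example.

```r
# Hypothetical helper: fill NAs with the column median (numeric columns)
# or the most frequent level (factor columns).  Initial fill only.
rough_fill <- function(df) {
  for (j in seq_along(df)) {
    col <- df[[j]]
    miss <- is.na(col)
    if (!any(miss)) next
    if (is.numeric(col)) {
      col[miss] <- median(col, na.rm = TRUE)
    } else if (is.factor(col)) {
      counts <- table(col)  # table() leaves NAs out by default
      col[miss] <- names(counts)[which.max(counts)]
    }
    df[[j]] <- col
  }
  df
}

# Toy data with one missing value per column
toy <- data.frame(
  num = c(1, 2, NA, 10),
  fac = factor(c("a", "b", "b", NA))
)
filled <- rough_fill(toy)
```

Columns of other types are left untouched in this sketch, and ties between
factor levels are broken in favour of the first level.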

There are a couple of things in Leo's new code that I have not added to the
package, and that's why the version is 3.9 rather than 4.0.  If you would
like to test the new code, please let me know.

Cheers,
Andy

> -----Original Message-----
> From: Yves Brostaux [mailto:brostaux.y at fsagx.ac.be]
> Sent: Wednesday, April 02, 2003 8:34 AM
> To: r-help at stat.math.ethz.ch
> Cc: Liaw, Andy; Torsten Hothorn
> Subject: RE: [R] randomForests predict problem
> 
> 
> I use randomForest version 3.4-4, but yes, now that I have correctly
> omitted the NAs, it works. I must have made a mistake when removing them
> the first time.
> 
> I was surprised that this method has no other way to deal with NAs than
> omitting them. As Torsten Hothorn suggested, the associated predict
> function should then check for NAs in newdata, shouldn't it?
> 
> Thank you both for your answers!
> 
> At 15:12 02/04/03, Liaw, Andy wrote:
> >Yves,
> >
> >Which version of the package are you using?  I get:
> >
> > > soy <- na.omit(Soybean)
> > > ts <- sample(nrow(soy), 150, replace=FALSE)
> > > sb.rf <- randomForest(Class ~ ., data=soy[-ts,])
> > > table(predict(sb.rf, soy[ts,], type="class"))
> >
> >                2-4-d-injury         alternarialeaf-spot
> >                           0                          37
> >                 anthracnose            bacterial-blight
> >                          10                           3
> >           bacterial-pustule                  brown-spot
> >                           2                          29
> >              brown-stem-rot                charcoal-rot
> >                          11                           7
> >               cyst-nematode diaporthe-pod-&-stem-blight
> >                           0                           0
> >       diaporthe-stem-canker                downy-mildew
> >                           4                           8
> >          frog-eye-leaf-spot            herbicide-injury
> >                          17                           0
> >      phyllosticta-leaf-spot            phytophthora-rot
> >                           3                           5
> >              powdery-mildew           purple-seed-stain
> >                           4                           5
> >        rhizoctonia-root-rot
> >                           5
> >
> >Cheers,
> >Andy
> >
> > > -----Original Message-----
> > > From: Yves Brostaux [mailto:brostaux.y at fsagx.ac.be]
> > > Sent: Wednesday, April 02, 2003 4:46 AM
> > > To: r-help at stat.math.ethz.ch
> > > Subject: [R] randomForests predict problem
> > >
> > >
> > > Hello everybody,
> > >
> > > > I'm testing the randomForest package in order to run some
> > > > simulations, and I am having trouble predicting new values. The
> > > > random forest computation is fine, but each time I try to predict
> > > > values with the newly created object, I get an error message. I
> > > > thought it was because of NA values in the data frame, but I removed
> > > > them and still got the same error. What am I doing wrong?
> > >
> > >  > library(mlbench)
> > >  > library(randomForest)
> > >  > data(Soybean)
> > >  > test <- sample(1:683, 150, replace=F)
> > >  > sb.rf <- randomForest(Class~., data=Soybean[-test,])
> > >  > sb.rf.pred <- predict(sb.rf, Soybean[test,])
> > > Error in matrix(t1$countts, nr = nclass, nc = ntest) :
> > >          No data to replace in matrix(...)
> > >
> > > > I did it the same way with rpart and everything worked fine:
> > >  > library(rpart)
> > >  > sb.rp <- rpart(Class~., data=Soybean[-test,])
> > >  > sb.rp.pred <- predict(sb.rp, Soybean[test,], type="class")
> > >
> > > Thank you all for any advice you can give to me.
> > >
> > > --
> > > Ir. Yves Brostaux - Statistics and Computer Science Dpt.
> > > Gembloux Agricultural University
> > > 8, avenue de la Faculté B-5030 Gembloux (Belgium)
> > > Tél : +32 (0)81 62 24 69
> > > E-mail : brostaux.y at fsagx.ac.be
> > > Web : http://www.fsagx.ac.be/si/
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> > >

