[R] help with the usage of "randomForest"

Liaw, Andy andy_liaw at merck.com
Wed Mar 31 22:11:04 CEST 2004


As you've learned, with the formula interface the NAs are handled by
na.action.  (BTW, the R default is na.omit, not na.fail, so rows containing
NAs are silently dropped.  If it were na.fail, you would have gotten an
error message.)
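
To see the difference, here is a small base-R sketch.  The data frame is
rebuilt by hand from the sample in the quoted message, and calling
model.frame() directly is only my way of illustrating the step where a
formula interface applies na.action; it is not meant as a description of
randomForest's internals.

```r
# The five-row sample from the quoted message, rebuilt inline.
hp <- data.frame(
  udomain.edu = c(1, NA, NA, NA, NA),
  udomain.hcs = c(1, 2, 0.8, 0.2, 0.9),
  hpclass     = factor(c("not", "not", "not", "hp", "hp"))
)

# The session-wide setting; "na.omit" unless it has been changed.
getOption("na.action")

# na.omit drops every row that has an NA in any variable of the formula.
mf <- model.frame(hpclass ~ ., data = hp, na.action = na.omit)
nrow(mf)            # 1: only the first row is complete
table(mf$hpclass)   # class counts among the surviving rows

# na.fail refuses to proceed instead of dropping rows.
res <- try(model.frame(hpclass ~ ., data = hp, na.action = na.fail),
           silent = TRUE)
inherits(res, "try-error")   # TRUE: the call stopped with an error
```

With only one complete row left, only one class is represented, which
matches the "Need at least two classes" message.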

There are several options for handling NAs, na.omit being one of them.  If
you have too many NAs, omitting them leaves you with too little data, as you
experienced (in your sample only row 1 is complete, so only one class
remains).  One possibility is to use na.roughfix (in the randomForest
package) as na.action, which replaces the NAs with the median of the
variable (or the mode for factor variables).  If you want to, you can use
rfImpute to have randomForest itself impute the NAs (assuming your training
data isn't terribly big).
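
A minimal sketch of both options, assuming the randomForest package is
installed.  The data here are hypothetical (iris with some predictor values
knocked out), not your dataset, and the object names are only for
illustration.

```r
library(randomForest)

set.seed(1)
# Hypothetical toy data: iris with NAs introduced into two predictors.
dat <- iris
dat[sample(nrow(dat), 20), "Sepal.Length"] <- NA
dat[sample(nrow(dat), 20), "Petal.Width"]  <- NA

# Option 1: median/mode fill, applied through na.action.
# (na.action must be passed by name.)
rf.rough <- randomForest(Species ~ ., data = dat, na.action = na.roughfix)

# Option 2: impute the predictors with rfImpute, then fit on the
# completed data.  The response itself must not contain NAs.
dat.imp <- rfImpute(Species ~ ., data = dat)
rf.imp  <- randomForest(Species ~ ., data = dat.imp)
```

rfImpute grows several forests in turn, which is why it gets slow on large
training sets; na.roughfix is the cheap alternative.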

HTH,
Andy

> From: Hui Han
> 
> Thanks to Matt and Torsten for the very helpful suggestions!
> As Matt pointed out, the problem is that na.action has the
> default value of na.fail, which deleted the samples of one class.
> I changed all NAs to real values, and the error msg. disappeared.
>
> However, my real dataset contains many NAs. I wonder if anybody
> can point me to any documentation on how to set na.action to
> something other than na.fail?
> 
> Best regards,
> Hui
> 
> On Wed, Mar 31, 2004 at 06:26:36PM +0200, Torsten Hothorn wrote:
> > On Wed, 31 Mar 2004, Hui Han wrote:
> > 
> > > Dear all,
> > >
> > > > Can anybody give me some hint on the following error msg I got
> > > > when using randomForest?
> > >
> > > > I have a two-class classification problem. The data file "sample" is:
> > > ----------------------------------------------------------
> > >  udomain.edu udomain.hcs hpclass
> > > 1 1.0000 1 not
> > > 2 NA 2 not
> > > 3 NA 0.8 not
> > > 4 NA 0.2 hp
> > > 5 NA 0.9 hp
> > > ------------------------------------------------------------
> > > The steps I called the function are:
> > > (1) Read data
> > > hp <- read.table("sample")
> > 
> > most probably a problem here. say
> > 
> > R> summary(hp)
> > 
> > and check if the factor `hpclass' has two levels.
> > 
> > Torsten
> > 
> > > (2) Call randomForest
> > > hp.rf <- randomForest(hpclass ~., yy, data=hp, importance=TRUE,
> > > proximity=TRUE)
> > >
> > > But the error msg I got is:
> > > Error in randomForest.default(m, y, ...) :
> > >         Need at least two classes to do classification.
> > >
> > >
> > > I learned the usage of randomForest from:
> > > 
> http://www.maths.lth.se/help/R/.R/library/randomForest/html/ra
ndomForest.html
> >
> > Thanks a lot for any of your comments in advance!
> >
> >
> > Hui Han
> > Department of Computer Science and Engineering,
> > The Pennsylvania State University
> > University Park, PA,16802
> > email: hhan at cse.psu.edu
> > homepage: http://www.cse.psu.edu/~hhan
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
> >

