[R] help with RandomForest classwt option

Weiwei Shi helprhelp at gmail.com
Tue Jan 30 01:47:33 CET 2007


Hi, Betty:

1. Fortran code (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_examples/prog.f)

	if(jclasswt.eq.0) then
		do j=1,nclass
			classwt(j)=1
		enddo
	endif
	if(jclasswt.eq.1) then
c		fill in classwt(j) for each j:
c		classwt(1)=1.
c		classwt(2)=10.

You need to set jclasswt = 1 (search the code to find where it is set),
then uncomment the last two lines above. That gives you classwt in
Fortran. You can use classwt for extremely imbalanced
classification problems. Down-sampling is another possible choice,
but it is not directly implemented in rf. The following
paper might help:
http://oz.berkeley.edu/users/chenchao/666.pdf

2. As to the wrapper function, the idea is to create a set
of samples by applying sampling probabilities that implement
down-sampling, then build an rf model for each sample.
Suppose you call rf this way for each sample:
my.rf <- randomForest(...)

Then you can access the OOB scores and the test-set prediction scores
through my.rf$votes and my.rf$test$votes, respectively.

You can then average those scores yourself. It is just a
simple meta-learning process, but it does exactly what down-sampling
plus rf does, even though down-sampling itself is not implemented.
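
A minimal sketch of such a wrapper, under stated assumptions: a two-class
factor response, a separate test set passed through xtest, and balanced
down-sampling of the majority class. The function name downsample_rf and
its arguments n_models and ntree are made up for illustration; only
randomForest(), its xtest and ntree arguments, and the $test$votes
component come from the package. It averages the test-set votes only,
since the OOB votes of each model cover only the rows in that sample.

```r
library(randomForest)

# Hypothetical helper (not part of the randomForest package): fit one forest
# per down-sampled training set and average the test-set vote fractions.
downsample_rf <- function(x, y, xtest, n_models = 10, ntree = 200) {
  stopifnot(is.factor(y), nlevels(y) == 2)

  counts   <- table(y)
  minority <- names(counts)[which.min(counts)]
  min_idx  <- which(y == minority)
  maj_idx  <- which(y != minority)

  vote_sum <- 0
  for (i in seq_len(n_models)) {
    # Keep every minority case, plus an equally sized random draw
    # (without replacement) from the majority class.
    keep <- c(min_idx, sample(maj_idx, length(min_idx)))

    fit <- randomForest(x[keep, , drop = FALSE], y[keep],
                        xtest = xtest, ntree = ntree)

    # Vote fractions for the test rows (one column per class).
    vote_sum <- vote_sum + unclass(fit$test$votes)
  }

  # Average score per test row and class across all forests.
  vote_sum / n_models
}
```

Calling downsample_rf(x, y, xtest) returns a matrix with one row per test
case and one column per class; the 1:1 sampling ratio is just one choice and
other sampling probabilities can be substituted in the sample() call.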


3. classwt and cutoff are used in different places. The former is used
in two places: computing the Gini criterion for the splits and computing
the final vote from the leaves. cutoff is used only in the final voting. So
cutoff won't change the splitting, while classwt can. However, since
the current rf in R cannot do classwt, you can try cutoff to see
if it helps in your case.
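
To make the difference concrete, here is a small base-R sketch. Both
function names are made up for illustration. weighted_gini assumes the
weights simply rescale the class counts before the Gini impurity is
computed, which is one plausible reading and not a transcription of the
Fortran code. apply_cutoff follows the rule described in the randomForest
documentation: the winning class is the one with the largest ratio of vote
proportion to cutoff.

```r
# Class-weighted Gini impurity of a node. ASSUMPTION: weights rescale the
# class counts; this is an illustration, not the package's internal code.
weighted_gini <- function(counts, classwt = rep(1, length(counts))) {
  w <- counts * classwt   # weighted class counts in the node
  p <- w / sum(w)         # weighted class proportions
  1 - sum(p^2)
}

# Final vote with cutoff: pick the class with the largest votes/cutoff ratio.
apply_cutoff <- function(votes, cutoff) {
  ratio <- sweep(votes, 2, cutoff, "/")
  colnames(votes)[max.col(ratio, ties.method = "first")]
}

counts <- c(neg = 90, pos = 10)
weighted_gini(counts)                      # about 0.18
weighted_gini(counts, classwt = c(1, 10))  # much closer to 0.5

votes <- matrix(c(0.8, 0.2,
                  0.6, 0.4),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("neg", "pos")))
apply_cutoff(votes, c(0.5, 0.5))  # "neg" "neg"
apply_cutoff(votes, c(0.7, 0.3))  # "neg" "pos"
```

The weights change the impurity that the split search sees, so they can
change the tree itself, while the cutoff only relabels the second case
after the forest is already built.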

4. The fourth option is to use my implementation of rf, but I have
not written a manual for it, and it cannot show the splits yet.

HTH,

weiwei




On 1/29/07, Betty Health <betty.health at gmail.com> wrote:
> Thank you very much, Weiwei and Jim!
>
> Yeah, I did read the post by Andy, the contributor of this package. It seems
> that classwt is not implemented yet. For Weiwei's options, I have a few more
> questions. Thanks!
>
> "1. try to use rf in fortran by following the link below
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm"
>
> I read the Fortran code briefly. But I did not find the options for down
> sampling. So does that mean I need to do down sampling myself?  Could you
> explain a little more about "2. make a wrapper function to do the down
> sampling by yourself"? You mean I can do it in R or in Fortran? Some links
> plz? I haven't done this before.
>
> Yeah, cutoff did change the final classification results. However, from
> what I tried, it did not influence how the nodes are split. So I would like
> to go further with the above 2 options.
>
> Thank you again!
>
>  Betty
>
>
>
>
> On 1/28/07, Weiwei Shi <helprhelp at gmail.com> wrote:
> > Dear Betty:
> >
> > I could suggest 3 options:
> >
> > 1. try to use rf in fortran by following the link below
> >
> > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
> >
> > 2. make a wrapper function to do the down sampling by yourself
> >
> > 3. try to use cutoff in randomForest, which might help in your situation.
> >
> > HTH,
> >
> > weiwei
> >
> > On 1/28/07, Betty Health < betty.health at gmail.com> wrote:
> > > Hello there,
> > >
> > > I am working on an extremely unbalanced two-class classification problem.
> > > I want to use "classwt" together with "down sampling". By checking
> > > rfNews() in R, it looks like classwt is not working yet. Then I looked at
> > > the software from Salford. I did not find a down-sampling option. I am
> > > wondering if you have any experience dealing with this problem. Do you
> > > know any methods or software that can handle this problem?
> > >
> > > Thank you very much!!
> > >
> > > Betty
> > >
> >
> >
>
>


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


