[R] Random Forest with highly imbalanced data

Kel lamkelj at yahoo.com
Wed May 12 20:38:19 CEST 2004


Hi group,

I am trying to fit a random forest (RF) on approximately
250,000 cases.  My objective is to identify the risk
factors for a person being readmitted to hospital
(response=1) versus not (response=0).  Only 10%, or
25,000 cases, were readmitted.  I've heard about the
down-sampling and class-weight approaches and am
wondering whether R can do them.  Even a reference to
some articles would help.

From a statistical point of view, is there any rule
of thumb for the positive/negative response ratio
beyond which an adjustment has to be applied?
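
To make concrete what I mean by down-sampling, here is a
minimal sketch.  It is written in Python rather than R,
purely as an illustration of the idea and not of any R
facility.  The function name downsample_indices, the
ratio argument, and the toy data (1,000 cases with the
same 10% positive rate) are all assumptions made up for
this example.

```python
import random

def downsample_indices(labels, ratio=1.0, seed=0):
    """Keep every positive case and a random subset of the negatives.

    ratio is the number of negatives kept per positive
    (a hypothetical tuning knob, not a recommended value).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than actually exist.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept_neg = rng.sample(neg, n_keep)  # sampling without replacement
    return sorted(pos + kept_neg)

# Toy data: 100 readmitted (1) and 900 not readmitted (0).
labels = [1] * 100 + [0] * 900
idx = downsample_indices(labels)
balanced = [labels[i] for i in idx]
```

With ratio=1.0 the retained subset has as many negatives
as positives; whether such a 1:1 balance is appropriate
is exactly what I am unsure about.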

Thank you so much.  

Regards,
Kelvin



