[R] oversampling code

Weidong Gu anopheles123 at gmail.com
Mon Oct 31 19:54:51 CET 2011


For a data set dat with a binary variable 'case' (1 = case, 0 = control), the following keeps every case and a random 90% of the controls:

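sam.rate <- 0.9   # fraction of controls (case == 0) to keep
n.ctrl <- nrow(dat[dat$case == 0, ])
# sample control rows without replacement; round() gives an integer sample size
sam.ctrl <- dat[sample(row.names(dat[dat$case == 0, ]), round(n.ctrl * sam.rate), replace = FALSE), ]
rbind(dat[dat$case == 1, ], sam.ctrl)   # all cases plus the sampled controls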
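
Note that sam.rate above is the fraction of controls retained, not the share of 0s in the result. If the aim is a final data set that is roughly 90% 0s and 10% 1s, as in the question, the number of controls to keep can be derived from the number of cases instead. A minimal sketch, assuming a made-up data frame dat (50 cases, 2000 controls) and the illustrative names target.ctrl.share, n.ctrl.keep and balanced, none of which come from the original post:

```r
# Hypothetical example data: 50 cases (case == 1) and 2000 controls (case == 0).
set.seed(1)
dat <- data.frame(case = rep(c(1, 0), times = c(50, 2000)),
                  x    = rnorm(2050))

target.ctrl.share <- 0.9              # desired share of 0s in the result (assumption)
case.idx <- which(dat$case == 1)      # row positions of all cases
ctrl.idx <- which(dat$case == 0)      # row positions of all controls

# Controls needed so that controls / (controls + cases) equals target.ctrl.share
n.ctrl.keep <- round(length(case.idx) * target.ctrl.share / (1 - target.ctrl.share))
n.ctrl.keep <- min(n.ctrl.keep, length(ctrl.idx))   # cannot keep more than exist

# Draw the controls without replacement and combine them with every case
keep.ctrl <- ctrl.idx[sample.int(length(ctrl.idx), n.ctrl.keep)]
balanced  <- dat[c(case.idx, keep.ctrl), ]

prop.table(table(balanced$case))      # check the resulting class shares
```

Sampling row positions with sample.int() avoids the surprise that sample(x, ...) treats a single number x as 1:x. If there are fewer controls than the target ratio requires, the min() line simply keeps all of them.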

Weidong Gu

On Mon, Oct 31, 2011 at 1:54 PM, loubna ibn majdoub hassani
<loubn181 at gmail.com> wrote:
> Hi
> I have an unbalanced data set where I want to predict a binary variable Y.
> I want to undersample by keeping all the 1s and taking just some of
> the 0s, so that I'll have 90% 0s and 10% 1s.
> Can you help me do that?
> Thank you