[R] Training and testing on Unbalanced Data Set

Vijay Goel bgoelv at gmail.com
Fri Jul 4 20:55:27 CEST 2014

I used the SMOTE (Synthetic Minority Oversampling Technique) algorithm
in R for class balancing. My data set has 13000 rows, with a 7%
minority class. After SMOTE, the share of the minority class rose to
42% and the number of rows in the sample became 12655. I now need to
fit a logistic regression to this data, so I need to divide the sample
for cross-validation and testing. I tried two approaches:

a.) Train on the sample obtained after SMOTE and test on the original
sample of 13000 rows.

b.) Divide the sample obtained after SMOTE into train and test sets,
and do both the fitting and the testing on this data set only.
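
The two setups can be sketched roughly as follows. This is only an
illustration under assumptions: the data are simulated, plain random
oversampling of the minority class stands in for SMOTE so that the
example needs nothing beyond base R, and the names original, balanced,
rebalance and accuracy are hypothetical.

```r
# Sketch only: simulated data and random oversampling stand in for the
# real data set and for SMOTE. All names here are hypothetical.
set.seed(1)

n  <- 13000
x1 <- rnorm(n)
x2 <- rnorm(n)
# Intercept chosen so that the positive class is a small minority
y  <- rbinom(n, 1, plogis(-3 + 1.0 * x1 + 0.5 * x2))
original <- data.frame(x1 = x1, x2 = x2, y = y)

# Stand-in for SMOTE: resample minority rows with replacement until
# they make up about `target` of the rebalanced sample
rebalance <- function(d, target = 0.42) {
  minority <- d[d$y == 1, ]
  majority <- d[d$y == 0, ]
  n_min <- round(target / (1 - target) * nrow(majority))
  rbind(majority,
        minority[sample(nrow(minority), n_min, replace = TRUE), ])
}
balanced <- rebalance(original)

# Share of correct predictions at a 0.5 probability cutoff
accuracy <- function(fit, d) {
  p <- predict(fit, newdata = d, type = "response")
  mean((p > 0.5) == (d$y == 1))
}

# Approach (a): fit on the rebalanced sample, test on the original rows
fit_a <- glm(y ~ x1 + x2, family = binomial, data = balanced)
acc_a <- accuracy(fit_a, original)

# Approach (b): split the rebalanced sample itself into train and test
idx   <- sample(nrow(balanced), size = round(0.7 * nrow(balanced)))
fit_b <- glm(y ~ x1 + x2, family = binomial, data = balanced[idx, ])
acc_b <- accuracy(fit_b, balanced[-idx, ])

c(a = acc_a, b = acc_b)
```

Note that in this sketch the test rows in (a) are the same rows the
rebalanced training sample was built from, and in (b) copies of one
minority row can end up on both sides of the split, so neither test
set is fully independent of the training data.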

In the first approach my results might be skewed, so which approach
should I take, and why?
Vijay Goel
