[R] Stepwise SVM Variable selection

Noah Silverman noah at smartmediacorp.com
Fri Jan 7 08:10:59 CET 2011


I have a data set with about 30,000 training cases and 103 variables.

I've trained an SVM (using the e1071 package) for a binary classifier 
{0,1}.  The accuracy isn't great.

I used a grid search over the C (cost) and G (gamma) parameters with an 
RBF kernel to find the best settings.
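
A minimal sketch of such a grid search, run on synthetic data rather than the real set. It assumes e1071's tune.svm() with an RBF kernel; the data, the 3 x 3 grid values, and the 5-fold cross-validation are illustrative assumptions, not the settings actually used.

```r
library(e1071)

set.seed(1)
# Synthetic stand-in data (assumption): 200 cases, 5 variables,
# binary {0,1} response driven by the first two variables plus noise.
n <- 200
x <- matrix(rnorm(n * 5), nrow = n)
y <- factor(as.integer(x[, 1] + x[, 2] + rnorm(n) > 0))

# Illustrative 3 x 3 grid; the search described above used 10 x 10.
tuned <- tune.svm(x, y,
                  kernel = "radial",
                  cost   = 10^(-1:1),
                  gamma  = 10^(-2:0),
                  tunecontrol = tune.control(cross = 5))

print(tuned$best.parameters)   # gamma/cost pair with the lowest CV error
print(tuned$best.performance)  # its cross-validated misclassification rate
```

Each grid point costs one cross-validated fit, so the run time grows with the product of the two grid sizes.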

I remember that for least squares, R has a nice stepwise function that 
will try combining subsets of variables to find the optimal result.  
Clearly, this doesn't exist for SVMs as a built-in function.

As an experiment, I simply grabbed the first 50 variables and repeated 
the training/grid search procedure.  The results were significantly 
better.  Since the data is VERY noisy, my guess is that dropping some 
of the variables removed some of the noise, which improved the results.

With a grid of 100 parameter settings (10 for C, 10 for G) and 103 
variables, trying every combination would be prohibitively time-consuming.
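
To put a rough number on that, here is a back-of-the-envelope count in base R. It assumes 103 variables and a 10 x 10 grid, and counts one model fit per (subset, grid point) pair, ignoring cross-validation folds. The greedy forward stepwise count is included only for contrast, as an upper bound on how many subsets such a search would evaluate.

```r
n_vars <- 103        # variable count (assumed; taken from the data set size)
n_grid <- 10 * 10    # 10 values of C times 10 values of G

# Exhaustive search: every non-empty subset of the variables.
n_subsets <- 2^n_vars - 1
n_fits    <- n_subsets * n_grid
cat("exhaustive fits:", format(n_fits, digits = 3), "\n")  # on the order of 1e33

# Greedy forward stepwise: at most n + (n - 1) + ... + 1 candidate subsets.
n_stepwise_subsets <- n_vars * (n_vars + 1) / 2
n_stepwise_fits    <- n_stepwise_subsets * n_grid
cat("forward stepwise fits (upper bound):", n_stepwise_fits, "\n")
```

Even at one fit per microsecond the exhaustive count is far out of reach, whereas the stepwise bound is a few hundred thousand fits.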

Can anyone suggest an approach to seek the ideal subset of variables for 
my SVM classifier?

Thanks!


