[R] How to use a saved SVM model from e1071

Wed Mar 14 16:22:39 CET 2012

Hi Thomas,

On Wed, Mar 14, 2012 at 11:00 AM, Thomas Coffin
<thomas.coffin at artelys.com> wrote:
> Hello,
>
> I have an SVM model previously calibrated using libsvm R implementation from
> the e1071 package.
> I would like to use this SVM to predict values, from a Java program.
> I first tried to use jlibsvm and the "standard" java implementation of
> libsvm, without success.
> Thus, I am now considering writing data in files from my Java code, calling
> an R program to predict values, then gather the predicted values in Java.
>
> The problem is that I do not know how to re-use the model saved using the
> write.svm() function, since there is no read.svm() function.
> I read the following message in the mailing archive, stating that I may use
> the save and load methods built in R :
> http://www.mail-archive.com/r-help@r-project.org/msg64428.html
>
> Still, I am not sure how to pre-process the data and to post-process the
> results.
> Since write.svm() writes .scale and .yscale files as well as an svm file, I
> figure that the scaling data is included in the svm object.
>
> Does that mean that I do not have to worry about scaling my data and
> unscaling the results provided by the predict function on a model reloaded
> using save/load ?
> I am asking this because I previously succeeded in loading the svm model
> from libsvm in Java, but the results using unscaled data were obviously
> wrong.

I think you'll find it helpful (and enlightening) to peruse the source
code of the SVM functions in e1071.

Start with the `svm.default` function: you'll see where the scale
attributes are calculated (and stored) in the returned object.

Then take a peek at the `predict.svm` function. You'll find if and
when any scaling is performed on the `newdata` object you are trying
to predict labels on.
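
A quick way to look at both, as a minimal sketch: the `mtcars` data and
the `mpg ~ wt + hp` formula are just stand-ins I picked for illustration,
not your model. The `scaled`, `x.scale` and `y.scale` components are what
I'd expect to see on the fitted object, but do check against the version
of e1071 you have installed.

```r
library(e1071)

# Toy regression SVM on a built-in dataset (illustrative stand-in only).
model <- svm(mpg ~ wt + hp, data = mtcars)

# Which predictor columns were scaled during training.
print(model$scaled)

# Centers and scales stored for the predictors and for the response.
# predict() reuses these, so they travel with the object.
print(model$x.scale)
print(model$y.scale)

# Print the source of the (non-exported) methods to read through them.
print(getAnywhere("svm.default"))
print(getAnywhere("predict.svm"))
```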

In short, things should "just work" if you save/load the svm object
you've previously trained, as long as your `newdata` object keeps the
same structure as your training data: the same number and type of
features, and the same column names if you're using the formula
interface, I guess.
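
Something along these lines, as a rough sketch rather than a recipe: the
data, the formula and the file name `svm_model.RData` are made up for the
example. The point is only that `newdata` is passed in its original,
unscaled units and the predictions come back in the units of the
response.

```r
library(e1071)

# Hypothetical split of a built-in dataset into training rows and new rows.
train   <- mtcars[1:24, ]
newdata <- mtcars[25:32, c("wt", "hp")]  # raw values, same column names

# Train and save the whole svm object (scaling information included).
model <- svm(mpg ~ wt + hp, data = train)
path  <- file.path(tempdir(), "svm_model.RData")  # assumed file name
save(model, file = path)

# Later (e.g. in the script your Java program calls): load and predict.
env <- new.env()
load(path, envir = env)   # restores the object under its saved name, "model"
restored <- env$model

pred_before <- predict(model, newdata)
pred_after  <- predict(restored, newdata)

# The reloaded model should give the same predictions as the original.
print(all.equal(pred_before, pred_after))
```

Loading into a separate environment is only there so the restored object
doesn't overwrite `model` in this example; in a fresh R session a plain
`load(path)` would do.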

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact