[R] Questions about SVM

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Jan 4 01:56:16 CET 2010


2010/1/2 Nancy Adam <nancyadam84 at hotmail.com>:
> Hi Steve,
>
> Thanks a lot for your reply.
>
> 1)I’m still confused which equation (1- sqrt(mean(mymodel$MSE)) OR 2-
> mean(sqrt(mymodel$MSE)) )is equivalent to sqrt(mean(error**2))?

So, as I mentioned before, mymodel$MSE is a vector that's as long as
the number of folds you are using for cross validation. If you're
setting cross=10, $MSE will have 10 values in it. Each value is the
*mean squared error* for one fold (as described in the ?svm
documentation under the `cross` parameter).

If you do 1: sqrt(mean(mymodel$MSE)), then you're taking the square
root of the averaged mean squared error.

If you do 2: mean(sqrt(mymodel$MSE)), you are taking the average of
the square root of the MSE from each fold.
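
To see how the two numbers relate to sqrt(mean(error^2)), here is a
minimal sketch in base R. The errors are simulated and `fold_mse` is
only a stand-in for mymodel$MSE (both are assumptions for
illustration, nothing here calls svm):

```r
set.seed(1)

# Hypothetical held-out errors for 10 equally sized folds (20 cases each),
# with a different noise level per fold so the fold MSEs differ.
fold <- rep(1:10, each = 20)
err  <- rnorm(200, sd = rep(seq(0.5, 2, length.out = 10), each = 20))

# Per-fold mean squared error; plays the role of mymodel$MSE.
fold_mse <- as.numeric(tapply(err^2, fold, mean))

rmse_pooled <- sqrt(mean(err^2))     # RMSE over all held-out cases
option1     <- sqrt(mean(fold_mse))  # square root of the averaged fold MSEs
option2     <- mean(sqrt(fold_mse))  # average of the per-fold RMSEs

print(c(pooled = rmse_pooled, option1 = option1, option2 = option2))
```

With equally sized folds, option1 agrees with the pooled RMSE (up to
floating point), while option2 comes out no larger than option1.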

> I just want to compute the typical RMSE that is usually used for measuring
> the performance of regression systems.

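If you want the number that matches sqrt(mean(error^2)) computed over
all of the held-out predictions, that's 1 (exactly so when the folds
are all the same size). 2 gives you the average per-fold RMSE, which
can never be larger than 1.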

> 2)I’m talking about another addition related to the svm parameters in the
> call to SVM. i.e.
>
> my_svm_model<- function(myformula, mydata, mytestdata, parameterlist) {
>
> mymodel <- svm(myformula, data=mydata, cross=10, cost=parameterlist[[1]],
> epsilon=parameterlist[[2]],gamma=parameterlist[[3]])
>
> If I don’t set these parameters of svm (like: my_svm_model<-
> function(myformula, mydata, mytestdata), how does svm know them?

Functions can define default values for their arguments. So if you
don't supply a value when you call the function, the argument takes
its default. For example, if you don't explicitly set things like
`cost` (for C-classification) or `epsilon` (for eps-regression), svm
will use the default values.

You can see the default value for these params in the documentation for ?svm
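
As a rough illustration of how defaults work, here is a small sketch.
`fit_stub` is a made-up function, not part of e1071, and its default
values are only there for the example:

```r
# Hypothetical function with default argument values (not svm itself).
fit_stub <- function(x, cost = 1, epsilon = 0.1) {
  # Simply report which values were actually used.
  list(cost = cost, epsilon = epsilon)
}

a <- fit_stub(1:3)             # nothing set: both defaults are used
b <- fit_stub(1:3, cost = 10)  # cost overridden, epsilon keeps its default

# formals() lists a function's arguments together with their defaults.
str(formals(fit_stub))
```

The same idea applies to your wrapper: any svm argument you don't
pass along simply keeps whatever default svm defines for it.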

> 3) in 2) Is it correct to use “mydata” instead of “data=mydata”? Or I can do
> that only if it is the “last” argument in the function call?

It's not because it's the last argument, but because it's the second
argument. `data` is defined as the second argument of the `svm`
function (when used with a formula), and you are passing it as the 2nd
argument when you call the function.
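
A quick sketch of positional vs. named matching. `describe_call` is a
hypothetical stand-in that only borrows the names of the first two
arguments (formula, data); it does not fit anything:

```r
# Hypothetical stand-in whose first two arguments are named like
# those of svm's formula interface.
describe_call <- function(formula, data = NULL, ...) {
  if (is.null(data)) "no data" else paste(nrow(data), "rows")
}

d <- data.frame(y = 1:5, x = 5:1)

by_position <- describe_call(y ~ x, d)                  # matched by position
by_name     <- describe_call(y ~ x, data = d)           # matched by name
swapped     <- describe_call(data = d, formula = y ~ x) # names allow any order
```

All three calls see the same data. Naming the argument is the safer
habit, since it keeps working even if you reorder things.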

> 4)Does mytestdata[,1] means that the model will use only the last column on
> the testing set?

mytestdata[,1] means you are taking the first column (not the last)
of the mytestdata matrix, treating it as a vector, and ignoring the
rest of the matrix
...
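
For instance, with a tiny made-up matrix (the object `m` and its
column names are just for illustration):

```r
# Hypothetical 3 x 2 test matrix; columns are filled top to bottom.
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

first_col <- m[, 1]                # first column, dropped to a plain vector
last_col  <- m[, ncol(m)]          # last column, whatever the column count
kept      <- m[, 1, drop = FALSE]  # first column, kept as a 3 x 1 matrix
```

So if you actually want the last column, index with ncol(m) rather
than 1.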

From some previous correspondence, and questions 2 and 3 from here,
honestly I'd suggest investing some time in brushing up on R basics.
Reading the R intro is as good a place to start as any:

http://cran.r-project.org/doc/manuals/R-intro.html

There are several sections on indexing vectors, matrices, etc. Section
10 of that document also talks a bit about named, positional, and
default arguments ...

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


