[R] SVM classification based on pairwise distance matrix

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Oct 21 17:42:51 CEST 2010


Hi,

On Thu, Oct 21, 2010 at 9:42 AM, Martin Tomko <martin.tomko at geo.uzh.ch> wrote:
> Dear all,
> I am exploring the possibilities for automated classification of my
> data. I have successfully used KNN, but was thinking about looking at
> SVM (which I have not used before).
> I have a pairwise distance matrix of training observations which are
> classified in set classes, and a distance matrix of new observations to
> the training ones.

It seems to me that since you can compute a pairwise distance, your
original data must exist in some "vector form" (one feature vector per
observation).

Why not just use your original data (forget the pairwise distances for
now) and try a few different kernels for the SVM, such as a linear
kernel or an RBF/Gaussian one?
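
Something along these lines should get you started. It's only a rough
sketch: I'm assuming e1071 is installed, using the built-in iris data as
a stand-in for your own feature vectors, and the 100-row training split
and the seed are arbitrary choices for illustration.

```r
library(e1071)  # assumes the e1071 package is installed

set.seed(1)                            # arbitrary seed, for reproducibility only
train_idx <- sample(nrow(iris), 100)   # iris is a stand-in for your data
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit one SVM per kernel and record accuracy on the held-out rows
kernels <- c(linear = "linear", radial = "radial")
acc <- sapply(kernels, function(k) {
  fit <- svm(Species ~ ., data = train, kernel = k)
  mean(predict(fit, test) == test$Species)
})
print(acc)
```

For a real comparison you'd want to tune cost (and gamma for the radial
kernel) by cross-validation rather than rely on the defaults.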

> Is it possible to use distance matrices for SVM, and if yes, which
> package would do so (e1071 ? ).

I guess you can think of a "kernel matrix" as something like a
distance matrix -- actually, it's more like a similarity matrix.

I don't recall if e1071 allows you to use a kernel matrix as input, but
I'm pretty sure the svm functions from kernlab do. I found that
interface a pain to use, though.
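
If you do want to go down that road, the kernlab route looks roughly
like the sketch below. The assumptions are mine: two iris classes stand
in for your data, the distances are Euclidean, and I turn them into
similarities with exp(-sigma * d^2) using an arbitrary sigma. For a
non-Euclidean distance the resulting matrix is not guaranteed to be a
valid (positive semi-definite) kernel, so treat this as illustration
only.

```r
library(kernlab)  # assumes the kernlab package is installed

# Stand-in data: two iris classes, so this is a plain binary problem
x <- as.matrix(iris[51:150, 1:4])
y <- droplevels(iris$Species[51:150])

set.seed(1)                                   # arbitrary seed
train_idx <- sample(nrow(x), 70)
test_idx  <- setdiff(seq_len(nrow(x)), train_idx)

# Mimic the setup in the question: train-vs-train and new-vs-train distances
D_full  <- as.matrix(dist(x))
D_train <- D_full[train_idx, train_idx]
D_new   <- D_full[test_idx, train_idx]

# Turn distances into similarities (sigma is an arbitrary choice here)
sigma   <- 0.5
K_train <- as.kernelMatrix(exp(-sigma * D_train^2))

# Passing a kernelMatrix object makes ksvm skip its own kernel computation
fit <- ksvm(K_train, y[train_idx], type = "C-svc")

# For prediction, keep only the columns belonging to the support vectors
sv    <- SVindex(fit)
K_new <- as.kernelMatrix(exp(-sigma * D_new[, sv, drop = FALSE]^2))
pred  <- predict(fit, K_new)
acc   <- mean(pred == y[test_idx])
print(acc)
```

The fiddly part is the prediction step: the new-vs-training matrix has
to be cut down to the support vector columns before it goes to predict().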

But anyway -- don't use your distance matrix :-)

> I have little experience with SVM, and I had the impression that it is
> a/ usually used with data that have observations in terms of a number of
> variables (hence, not pairwise distances);

With the exception of "plugging in" a kernel matrix (which was
calculated from data in its original feature space) that's pretty much
correct.

> b/ it is not well suited for large multidimensional spaces (I have a
> distance matrix of 200*200 observations, a part of this could be used as
> training data, but still, we are looking at say 50 distances per
> observation).

But your distance matrix isn't really the same multidimensional space
your data lives in, right?

Anyway, like I said before, try the SVM on your original data with
some different kernels. I think the RBF kernel should be closest in
spirit to your distance matrix, and will likely perform better than
your kNN ;-).
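
To see why I say "closest in spirit": the RBF kernel is just a function
of the squared Euclidean distance. The quick check below (a sketch
assuming kernlab is installed, with iris as stand-in data and an
arbitrary sigma) compares kernlab's rbfdot kernel matrix against
exp(-sigma * d^2) computed from dist().

```r
library(kernlab)  # assumes the kernlab package is installed

x     <- as.matrix(iris[, 1:4])   # stand-in feature vectors
sigma <- 0.1                      # arbitrary kernel width

# RBF similarities derived from pairwise Euclidean distances
D           <- unname(as.matrix(dist(x)))
K_from_dist <- exp(-sigma * D^2)

# The same matrix computed by kernlab directly from the feature vectors
K_rbf <- kernelMatrix(rbfdot(sigma = sigma), x)

# Largest elementwise difference; should be at floating-point noise level
max_diff <- max(abs(K_from_dist - K_rbf@.Data))
print(max_diff)
```

This only holds for Euclidean distances on the feature vectors; if your
distance is something else, the RBF kernel on the raw data is a related
but different similarity.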

Hope that helps,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


