[R] SVM classification based on pairwise distance matrix

Martin Tomko martin.tomko at geo.uzh.ch
Thu Oct 21 18:12:35 CEST 2010


Hi Steve,
thanks for the hints and clarifications.
Unfortunately, I will not be able to use the approach you suggest. The
distances I generate are distances between VERY large matrices (say
100000x100000 and more), each of different dimensions (not necessarily
square either), and the columns carry no meaning as variables; they are
basically graphs of a sort.

Is there a way forward with SVM, or should I just forget about it?
Martin

On 10/21/2010 5:42 PM, Steve Lianoglou wrote:
> Hi,
>
> On Thu, Oct 21, 2010 at 9:42 AM, Martin Tomko<martin.tomko at geo.uzh.ch>  wrote:
>    
>> Dear all,
>> I am exploring the possibilities for automated classification of my
>> data. I have successfully used KNN, but was thinking about looking at
>> SVM (which I have not used before).
>> I have a pairwise distance matrix of training observations which are
>> classified in set classes, and a distance matrix of new observations to
>> the  training ones.
>>      
> It seems to me that since you have some pairwise distance metric, your
> original data is in some "vector form".
>
> Why not just try using your original data (forget the pairwise
> distance for now) and try a few different kernels for the svm, such as
> a linear kernel or an rbf/gaussian.
>
>    
>> Is it possible to use distance matrices for SVM, and if yes, which
>> package would do so (e1071 ? ).
>>      
> I guess you can think of a "kernel matrix" as something like a
> distance matrix -- actually, it's more like a similarity matrix.
>
> I don't recall if e1071 allows you to use a kernel matrix as input, but
> I'm pretty sure the svm functions from kernlab do. It was a pain to
> use, though.
>
> But anyway -- don't use your distance matrix :-)
>
>    
>> I have little experience with SVM, and I had the impression that it is
>> a/ usually used with data that have observations in terms of a number of
>> variables (hence, not pairwise distances);
>>      
> With the exception of "plugging in" a kernel matrix (which was
> calculated from data in its original feature space) that's pretty much
> correct.
>
>    
>> b/ it is not well suited for large multidimensional spaces (I have a
>> distance matrix of 200*200 observations, a part of this could be used as
>> training data, but still, we are looking at say 50 distances per
>> observation).
>>      
> But your distance matrix isn't really the same multidimensional space
> your data lives in, right?
>
> Anyway, like I said before, try the SVM on your original data with
> some different kernels. I think the RBF kernel should be closest in
> spirit to your distance matrix, and will likely perform better than
> your kNN ;-).
>
> Hope that helps,
> -steve
>
>
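For the kernel-matrix route with kernlab mentioned above, here is a minimal
sketch, not a tested recipe for the graph data. It assumes that kernlab is
installed and that a Gaussian transform of the distances, exp(-gamma * D^2),
gives a usable similarity matrix. The toy points, the train/test split, and
the gamma heuristic are all made up for illustration; only the distance
matrix D is passed on to the SVM. For non-Euclidean (e.g. graph) distances
the transformed matrix is not guaranteed to be positive semi-definite, so the
sketch checks the smallest eigenvalue before fitting.

```r
library(kernlab)  # assumed installed: ksvm(), as.kernelMatrix(), SVindex()

set.seed(1)
# Toy stand-in for the real observations; only their distances are used below.
pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 4), ncol = 2))
y   <- factor(rep(c("a", "b"), each = 20))
D   <- as.matrix(dist(pts))            # 40 x 40 pairwise distance matrix

train <- c(1:15, 21:35)                # hypothetical split: 30 training rows
test  <- setdiff(seq_len(nrow(D)), train)

# Turn distances into similarities. gamma is a tuning parameter; the median
# heuristic here is an assumption, cross-validation is the safer choice.
gamma   <- 1 / median(D[train, train])^2
K_train <- exp(-gamma * D[train, train]^2)

# A valid kernel matrix must be positive semi-definite (up to rounding).
min_eig <- min(eigen(K_train, symmetric = TRUE, only.values = TRUE)$values)
stopifnot(min_eig > -1e-8)

# Fit a C-SVM directly on the precomputed kernel matrix.
model <- ksvm(as.kernelMatrix(K_train), y[train], type = "C-svc", C = 1)

# For prediction, kernlab expects the similarities between the new
# observations and the support vectors only (columns picked by SVindex).
K_test <- exp(-gamma * D[test, train]^2)
K_test <- as.kernelMatrix(K_test[, SVindex(model), drop = FALSE])
pred   <- predict(model, K_test)

table(predicted = pred, actual = y[test])
```

With real data, D[train, train] would be the training distance matrix and
D[test, train] the matrix of distances from the new observations to the
training ones. If the eigenvalue check fails, the transformed matrix is not a
proper kernel and would need to be repaired (or a different similarity used)
before the SVM results can be trusted.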


