[R] SVM probability output variation

Anders Carlsson Anders.Carlsson at immun.lth.se
Wed Oct 21 21:58:37 CEST 2009


Hi,

> <snip>
> > If I instead output the decision values, the whole procedure is
> > fully reproducible, i.e. the exact same values are returned when I
> > retrain the model.
> 
> By the decision values, you mean the predict labels, right?

The output of decision values can be turned on in predict.svm and is, as I have understood it, the distance from the data point to the separating hyperplane. (I should say that my knowledge here is limited to the concepts; I know nothing about the details of how this works...) I use these values to create ROC curves etc.
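In case it's useful, this is roughly how I pull them out - a minimal sketch on the built-in iris data rather than my own data, with made-up object names:

library(e1071)

## two-class subset so there is a single decision value per observation
d <- subset(iris, Species != "setosa")
d$Species <- factor(d$Species)

fit <- svm(Species ~ ., data = d)

## decision.values = TRUE attaches the signed decision values to the prediction
pred <- predict(fit, d, decision.values = TRUE)
dv   <- attr(pred, "decision.values")
head(dv)   # these are the values I use for the ROC curves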
> 
> > I have no idea how the probabilities are calculated, but it seems to
> > be in this step that the differences arise. In my case, I feel a bit
> > hesitant to use them when they differ that much between runs (15% or
> > so)...
> 
> I'd find that a bit disconcerting, too. Can you give a sample of your
> data + code you're using that can reproduce this example?
> 

I have the data at the office, so I can't do that now (at home).

> Warning: Brainstorming Below
> 
> If I were to calculate probabilities for my class labels, I'd make the
> probability some function of the example's distance from the decision
> boundary.
> 
> Now, if your decision boundary isn't changing from run to run (and I
> guess it really shouldn't be, since the SVM returns the maximum margin
> classifier (which is, by definition, unique, right?)), it's hard to
> imagine why these probabilities would change, either ...
> 
> ... unless you're holding out different subsets of your data during
> training, or perhaps have a different value for your penalty (cost)
> parameter when building the model. I believe you said that you're
> actually training the same exact model each time, though, right?

Yes, I'm using the exact same data to train each time. I thought this would generate identical models, but that doesn't appear to be the case.
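If it helps, this is the kind of check I've been doing, sketched here on iris rather than my real data. My (possibly wrong) understanding is that the probability model is fitted with some internal cross-validation on top of the ordinary training, which might be where the randomness comes in:

library(e1071)

d <- subset(iris, Species != "setosa")
d$Species <- factor(d$Species)

refit <- function() {
  fit  <- svm(Species ~ ., data = d, probability = TRUE)
  pred <- predict(fit, d, decision.values = TRUE, probability = TRUE)
  list(dv = attr(pred, "decision.values"),
       pr = attr(pred, "probabilities"))
}

a <- refit()
b <- refit()

all.equal(a$dv, b$dv)   # decision values: identical between runs on my data
all.equal(a$pr, b$pr)   # probabilities: these are the ones that vary on my data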

> 
> Anyway, I see the help page for ?svm says this, if it helps:
> 
> "The probability model for classification fits a logistic distribution
> using maximum likelihood to the decision values of all binary
> classifiers, and computes the a-posteriori class probabilities for the
> multi-class problem using quadratic optimization"

This is where I realise I'm a bit in over my head on the theory side - this means nothing to me...
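For what it's worth, my rough reading of that paragraph (and I may well be misreading it) is that, for a two-class problem, a sigmoid is fitted to the decision values and then read off as a probability - something vaguely like the illustration below, just to get the idea across. It's certainly not what libsvm actually does internally, and the multi-class part that combines the pairwise estimates is beyond me:

library(e1071)

d <- subset(iris, Species != "setosa")
d$Species <- factor(d$Species)

fit  <- svm(Species ~ ., data = d)
pred <- predict(fit, d, decision.values = TRUE)
dv   <- as.numeric(attr(pred, "decision.values"))

## "fits a logistic distribution ... to the decision values":
## roughly P(class | dv) = 1 / (1 + exp(a * dv + b)), estimated by ML
calib <- glm(d$Species ~ dv, family = binomial)
prob  <- predict(calib, type = "response")

head(cbind(decision.value = dv, rough.probability = prob))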
> 
> -steve

Thanks again,
Anders



