[R] predict function type class vs. prob

David Winsemius dw|n@em|u@ @end|ng |rom comc@@t@net
Sat Sep 23 21:10:57 CEST 2023


On 9/23/23 05:30, Rui Barradas wrote:
> At 11:12 on 22/09/2023, Milbert, Sabine (LGL) wrote:
>> Dear R Help Team,
>>
>> My research group and I use R scripts for our multivariate data 
>> screening routines. During routine use, we encountered some 
>> inconsistencies within the predict() function of the R Stats Package.

In addition to Rui's correction to this misstatement, the caret package 
is really a meta package that attempts to implement an umbrella 
framework for a vast array of tools from a wide variety of sources. It 
is an immense effort but not really a part of the core R project. The 
correct place to file issues is found in the DESCRIPTION file:


URL: https://github.com/topepo/caret/
BugReports: https://github.com/topepo/caret/issues
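
Rui's point that predict() is a generic whose method is chosen by the class of the fit can be checked in base R alone. The following is only a sketch: the "toy_fit" class and its method are hypothetical names made up for illustration and are not part of caret or stats.

```r
# Base R only: which predict method runs is decided by class(object).
fit <- lm(mpg ~ wt, data = mtcars)
class(fit)                          # the class vector drives S3 dispatch

# Look up the method that predict() would use for an "lm" fit
pred_lm <- getS3method("predict", "lm")
method_home <- environmentName(environment(pred_lm))  # package that owns it
method_home

# Hypothetical toy class, for illustration only
toy <- structure(list(value = 42), class = "toy_fit")
predict.toy_fit <- function(object, ...) {
  paste("predict.toy_fit called, value =", object$value)
}
toy_result <- predict(toy)          # dispatches to predict.toy_fit
toy_result
```

With caret loaded, I would expect getS3method("predict", "train") to point at the caret namespace in the same way, which is why the issue belongs with the caret maintainers.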

  If you use `str` on an object constructed with caret, you discover 
that the code `predict` runs is not a function in the main workspace but 
is embedded in the fit object itself. I think this holds fairly 
generally across the caret universe, so I expect that your fit objects 
can be examined in the same way for the code that predict.train will 
use. Your description of your analysis methods was rather incomplete, so 
after my code demonstration I will append a list of the "svm" methods 
that might have been specified. (Note that I do not see a caret 
"weights" hyper-parameter for the "svmLinear" method, which actually 
uses code from pkg:kernlab.)


library(caret)
svmFit <- train(Species ~ ., data = iris, method = "svmLinear",
                trControl = trainControl(method = "cv"))
class(svmFit)
#[1] "train"         "train.formula"
str(predict(svmFit))
# Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
str(svmFit)
#---screen output-------------
List of 24
 $ method      : chr "svmLinear"
 $ modelInfo   :List of 13
  ..$ label     : chr "Support Vector Machines with Linear Kernel"
  ..$ library   : chr "kernlab"
  ..$ type      : chr [1:2] "Regression" "Classification"
  ..$ parameters:'data.frame': 1 obs. of 3 variables:
  .. ..$ parameter: chr "C"
  .. ..$ class    : chr "numeric"
  .. ..$ label    : chr "Cost"
  ..$ grid      :function (x, y, len = NULL, search = "grid")
  ..$ loop      : NULL
  ..$ fit       :function (x, y, wts, param, lev, last, classProbs, ...)
  ..$ predict   :function (modelFit, newdata, submodels = NULL)
  ..$ prob      :function (modelFit, newdata, submodels = NULL)
  ..$ predictors:function (x, ...)
  ..$ tags      : chr [1:5] "Kernel Method" "Support Vector Machines" "Linear Regression" "Linear Classifier" ...
  ..$ levels    :function (x)
  ..$ sort      :function (x)
 $ modelType   : chr "Classification"
# ---- large amount of screen output omitted------
# note that the class of svmFit$modelInfo$predict is 'function'
# and that its code is specific to this particular svm method
# (of which there are about 10!)
svmFit$modelInfo$predict
#---- screen output ------
function (modelFit, newdata, submodels = NULL)
{
    svmPred <- function(obj, x) {
        hasPM <- !is.null(unlist(obj@prob.model))
        if (hasPM) {
            pred <- kernlab::lev(obj)[apply(kernlab::predict(obj, x,
                type = "probabilities"), 1, which.max)]
        }
        else pred <- kernlab::predict(obj, x)
        pred
    }
    out <- try(svmPred(modelFit, newdata), silent = TRUE)
    if (is.character(kernlab::lev(modelFit))) {
        if (class(out)[1] == "try-error") {
            warning("kernlab class prediction calculations failed; returning NAs")
            out <- rep("", nrow(newdata))
            out[seq(along = out)] <- NA
        }
    }
    else {
        if (class(out)[1] == "try-error") {
            warning("kernlab prediction calculations failed; returning NAs")
            out <- rep(NA, nrow(newdata))
        }
    }
    if (is.matrix(out))
        out <- out[, 1]
    out
}
<bytecode: 0x561277d4ec50>

-- 
David
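
To look for the kind of disagreement described in the original question, one could put the class predictions and the probability predictions side by side. This is a minimal sketch, not the poster's actual workflow: the two-class subset of iris, the names iris2, fit2, cls, probs and top, and the seed are all my assumptions. As far as I can tell from ?predict.train, the accepted values of `type` are "raw" and "prob" rather than "class".

```r
library(caret)   # method "svmLinear" also needs kernlab installed

set.seed(1)      # arbitrary seed, only so that resampling is repeatable

# Two-class subset of iris, an assumed stand-in for the poster's data
iris2 <- droplevels(subset(iris, Species != "setosa"))

fit2 <- train(Species ~ ., data = iris2, method = "svmLinear",
              trControl = trainControl(method = "cv", classProbs = TRUE))

cls   <- predict(fit2, newdata = iris2, type = "raw")   # predicted class labels
probs <- predict(fit2, newdata = iris2, type = "prob")  # one column per class

# Class implied by the largest predicted probability in each row
top <- colnames(probs)[apply(probs, 1, which.max)]

# Cross-tabulate the two answers and list any rows where they differ
table(raw = cls, from_prob = top)
disagree <- which(as.character(cls) != top)
cbind(iris2[disagree, ], probs[disagree, , drop = FALSE])
```

Any rows listed at the end are the ones worth attaching to a report on the caret issue tracker, together with the seed and the train() call.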


>> Through internal research, we were unable to find the reason for this 
>> and have decided to contact your help team with the following issue:
>>
>> The predict() function is used once to predict the class membership 
>> of a new sample (type = "class") on a trained linear SVM model for 
>> distinguishing two classes (using the caret package). It is then used 
>> to also examine the probability of class membership (type = "prob"). 
>> Both are then presented in an R shiny output. Within the routine, we 
>> noticed two samples (out of 100+) where the class prediction and 
>> probability prediction did not match. The prediction probabilities of 
>> one class (52%) did not match the class membership within the predict 
>> function. We use the same seed and the discrepancy is reproducible in 
>> this sample. The same problem did not occur in other trained models 
>> (lda, random forest, radial SVM...).

Support Vector Machines with Boundrange String Kernel
(`method = 'svmBoundrangeString'`)

For classification and regression using package kernlab with tuning 
parameters:

  * length (`length`, numeric)
  * Cost (`C`, numeric)

Support Vector Machines with Class Weights (`method = 'svmRadialWeights'`)

For classification using package kernlab with tuning parameters:

  * Sigma (`sigma`, numeric)
  * Cost (`C`, numeric)
  * Weight (`Weight`, numeric)

Support Vector Machines with Exponential String Kernel
(`method = 'svmExpoString'`)

For classification and regression using package kernlab with tuning 
parameters:

  * lambda (`lambda`, numeric)
  * Cost (`C`, numeric)

Support Vector Machines with Linear Kernel (`method = 'svmLinear'`)

For classification and regression using package kernlab with tuning 
parameters:

  * Cost (`C`, numeric)

Support Vector Machines with Linear Kernel (`method = 'svmLinear2'`)

For classification and regression using package e1071 with tuning 
parameters:

  * Cost (`cost`, numeric)

Support Vector Machines with Polynomial Kernel (`method = 'svmPoly'`)

For classification and regression using package kernlab with tuning 
parameters:

  * Polynomial Degree (`degree`, numeric)
  * Scale (`scale`, numeric)
  * Cost (`C`, numeric)

Support Vector Machines with Radial Basis Function Kernel
(`method = 'svmRadial'`)

For classification and regression using package kernlab with tuning 
parameters:

  * Sigma (`sigma`, numeric)
  * Cost (`C`, numeric)

Support Vector Machines with Radial Basis Function Kernel
(`method = 'svmRadialCost'`)

For classification and regression using package kernlab with tuning 
parameters:

  * Cost (`C`, numeric)

Support Vector Machines with Radial Basis Function Kernel
(`method = 'svmRadialSigma'`)

For classification and regression using package kernlab with tuning 
parameters:

  * Sigma (`sigma`, numeric)
  * Cost (`C`, numeric)

Note: This SVM model tunes over the cost parameter and the RBF kernel 
parameter sigma. In the latter case, using `tuneLength` will, at most, 
evaluate six values of the kernel parameter. This enables a broad search 
over the cost parameter and a relatively narrow search over `sigma`.

Support Vector Machines with Spectrum String Kernel
(`method = 'svmSpectrumString'`)

For classification and regression using package kernlab with tuning 
parameters:

  * length (`length`, numeric)
  * Cost (`C`, numeric)
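
The list above can also be pulled from an installed copy of caret rather than from the documentation. This is a sketch that assumes caret is installed; svm_methods is just a name I chose, and the set of methods returned may differ between caret versions.

```r
library(caret)

# Model codes known to the installed caret whose names start with "svm"
svm_methods <- grep("^svm", names(getModelInfo()), value = TRUE)
svm_methods

# Tuning parameters for two of them, and whether class probabilities
# are supported (see the probModel column)
modelLookup("svmLinear")
modelLookup("svmRadialWeights")

# The prediction code caret would use for "svmLinear", without fitting anything
getModelInfo("svmLinear", regex = FALSE)[[1]]$predict
```

The last call should show the same function as svmFit$modelInfo$predict, so the code for any of the other svm methods can be inspected the same way by swapping in its method name.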

>>
>> Is there a weighting of classes within the prediction function, or is 
>> the classification limit not at 50% / a majority vote? If you have 
>> another explanation for this discrepancy, please let us know.
>>
>> PS: If this is an issue based on the model training function of the 
>> caret package and therefore not your responsibility, please let us know.
>>
>> Thank you in advance for your support!
>>
>> Yours sincerely,
>> Sabine Milbert
>>
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> I cannot tell what is going on but I would like to make a correction 
> to your post.
>
> predict() is a generic function with methods for objects of several 
> classes in many packages. In base package stats you will find methods 
> for objects (fits) of class lm, glm and others, see ?predict.
>
> The method you are asking about is predict.train, defined in package 
> caret, not in package stats. To see which predict method is being 
> called, check
>
>
> class(your_fit)
>
>
> Hope this helps,
>
> Rui Barradas
>