[R] NAs error in caret function

javed khan j@vedbtk111 @end|ng |rom gm@||@com
Thu Apr 21 15:15:42 CEST 2022


Dear Carlos Ortega, thank you again for the information.

My data is indeed imbalanced, so I am going to balance it and try again.

Best regards
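For reference, here is one simple way to balance the classes: random oversampling, which duplicates rows of the smaller classes until each class matches the largest. This is a minimal base-R sketch. The helper `upsample_classes` and the toy data are hypothetical, and the column name `TYPE` is borrowed from `train$TYPE` elsewhere in this thread. caret also provides `upSample()`, and `trainControl(sampling = "up")` does the oversampling inside each resampling iteration. That second route is usually preferable, because oversampling before the folds are built lets copies of the same row land in both the training and the held-out part, which makes the resampled metrics optimistic.

```r
# Hypothetical helper: randomly duplicate rows of the smaller classes until
# every class has as many rows as the largest one (simple random oversampling).
upsample_classes <- function(df, label_col) {
  y <- df[[label_col]]
  counts <- table(y)
  counts <- counts[counts > 0]          # ignore empty factor levels
  target <- max(counts)
  idx <- unlist(lapply(names(counts), function(cls) {
    rows <- which(y == cls)
    # keep every original row, then draw the shortfall with replacement
    extra <- rows[sample.int(length(rows), target - length(rows), replace = TRUE)]
    c(rows, extra)
  }))
  df[idx, , drop = FALSE]
}

# Toy data using the three labels from this thread (made up for illustration)
set.seed(1)
docs <- data.frame(
  text = paste("doc", 1:8),
  TYPE = factor(c(rep("Bug", 5), rep("Code_smell", 2), "Vulnerability"))
)
balanced <- upsample_classes(docs, "TYPE")
table(balanced$TYPE)   # every class now has 5 rows
```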

On Thursday, April 21, 2022, Carlos Ortega <coforfe using gmail.com> wrote:

> Hi,
>
> I do not see any issue with the code you provided.
> In this situation, you should take a more "debugging" approach to your
> problem until you find the cause. I would start with a much simpler
> version of your "trainControl": no custom folds, just method = "cv" and
> number = 2, and try that.
>
> Perhaps the problem is that one of your labels has too few examples (or
> none) in some of the folds, which creates an evaluation problem. That
> could happen if your data is not balanced and you create a lot of folds.
>
> If it works with this very simple version, add complexity back into the
> trainControl function step by step.
>
> Thanks,
> Carlos.
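Carlos's imbalance hypothesis can be checked directly by counting how many rows of each class fall into every fold. With k = 10 folds, a class with fewer than 10 rows cannot appear in every held-out fold. Below is a minimal base-R sketch with made-up labels. `stratified_folds` is a hypothetical helper. caret's `createFolds()` and `createMultiFolds()` already sample within the levels of a factor outcome, so in practice their output could be inspected with the same `table()` call.

```r
# Hypothetical helper: assign each row to one of k folds, separately within
# each class, so that the folds share every class as evenly as possible.
stratified_folds <- function(y, k) {
  fold <- integer(length(y))
  for (cls in levels(y)) {
    rows <- which(y == cls)
    if (length(rows) == 0) next
    # shuffle this class's rows, then deal fold numbers 1..k out in turn
    fold[rows[sample.int(length(rows))]] <- rep_len(seq_len(k), length(rows))
  }
  fold
}

# Made-up, imbalanced labels: only 4 "Vulnerability" rows
set.seed(42)
y <- factor(c(rep("Bug", 50), rep("Code_smell", 30), rep("Vulnerability", 4)))
fold <- stratified_folds(y, k = 10)

counts <- table(fold, y)   # rows = folds, columns = classes
counts
# Folds whose held-out part contains no "Vulnerability" rows at all
which(counts[, "Vulnerability"] == 0)
```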
>
>
> On Thu, Apr 21, 2022 at 12:59 AM javed khan <javedbtk111 using gmail.com> wrote:
>
>> Carlos Ortega, thank you for your answer.
>>
>> The class label has three values (Bug, Code smell and Vulnerability). X is
>> a text-based feature containing English sentences, and we performed some
>> preprocessing such as removing symbols and lower-casing.
>>
>> Yes, train_label is a factor class.
>>
>> I can provide the whole code and data if needed. We followed the same
>> method as in this tutorial:
>>
>> https://algotech.netlify.app/blog/text-lime/
>>
>>
>> # 10-fold CV repeated 3 times, stratified on the outcome (30 resamples)
>> cv.folds <- createMultiFolds(train$TYPE, k = 10, times = 3)
>>
>> # because 'index' is supplied, these 30 resamples are used and 'number' is not
>> ctrl <- trainControl(method = "cv", number = 3, index = cv.folds,
>>                      classProbs = TRUE, summaryFunction = multiClassSummary)
>> m <- train(y = train_label, x = train_x,
>>            method = "knn",
>>            metric = "Accuracy",
>>            ## preProc = c("center", "scale", "nzv"),
>>            trControl = ctrl)
>>
>> p <- predict(m, test_x)
>> confusionMatrix(p, as.factor(test_label))
>>
>> With some models, it shows an error like:
>>
>> Error in { :
>>   task 1 failed - "Not all variable names used in object found in newdata"
>>
>> However, when I run base models such as naiveBayes, it works.
>>
>> library(e1071)  # provides naiveBayes()
>> model_bayes <- naiveBayes(train_x, train_label, laplace = 1)
>>
>>
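The message "Not all variable names used in object found in newdata" means that some column names the fitted model expects could not be found in the data used for prediction. The "Error in { : task 1 failed" wrapper is foreach's error format, which suggests the error was raised inside train()'s resampling loop. With bag-of-words features, two plausible causes (neither confirmed in this thread) are a test matrix built with a different vocabulary, and term names that are not valid or not unique R names. Some conversion steps rewrite such names, for example data.frame() with its default check.names = TRUE. The following minimal base-R sketch assumes that train_x and test_x are numeric document-term matrices with terms as column names. `align_to_train` and the toy matrices are hypothetical.

```r
# Hypothetical helper: put the test matrix on exactly the training columns,
# then give both matrices the same valid, unique column names.
align_to_train <- function(train_x, test_x) {
  vocab <- colnames(train_x)
  # start from an all-zero test matrix with exactly the training columns
  out <- matrix(0, nrow = nrow(test_x), ncol = length(vocab),
                dimnames = list(rownames(test_x), vocab))
  # copy counts for terms present in both; test-only terms are dropped
  shared <- intersect(vocab, colnames(test_x))
  out[, shared] <- test_x[, shared, drop = FALSE]
  # make.names() turns e.g. "if" into "if." and "code smell" into "code.smell"
  clean <- make.names(vocab, unique = TRUE)
  colnames(train_x) <- clean
  colnames(out) <- clean
  list(train = train_x, test = out)
}

# Toy matrices (made up): the test set lacks "code smell" and has a new term
train_x <- matrix(c(1, 0, 2, 1, 0, 3), nrow = 2,
                  dimnames = list(NULL, c("bug", "if", "code smell")))
test_x <- matrix(c(4, 1, 5), nrow = 1,
                 dimnames = list(NULL, c("if", "new_term", "bug")))
aligned <- align_to_train(train_x, test_x)
aligned$test
```

Running the same alignment before the final predict() call, and cleaning the names of train_x before train(), removes both suspected sources of mismatch.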
>> On Wed, Apr 20, 2022 at 11:09 PM Carlos Ortega <coforfe using gmail.com> wrote:
>>
>>> Hi,
>>>
>>> There are many things that could be wrong:
>>>
>>> 1. What is inside "ctrl", the trainControl argument?
>>> 2. Your model is a classification one, but if you do not configure "ctrl"
>>> correctly, you will not get the metrics out correctly. That depends on
>>> whether your model is binary or multi-class.
>>> 3. Also, since it is a classification model, you should check that the
>>> "train_label" you pass to "train()" is a factor.
>>>
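Point 3 can be checked with a few lines of base R. train() treats the task as classification only when y is a factor. With classProbs = TRUE, the class levels also become the column names of the predicted probabilities, so a level such as "Code smell" is best converted to a valid R name first. The following minimal sketch uses made-up labels and is not the code from this thread.

```r
# Made-up labels for illustration; "Code smell" contains a space.
raw_labels <- c("Bug", "Code smell", "Vulnerability", "Bug")

# make.names() turns "Code smell" into "Code.smell", a valid R name,
# which matters once class probabilities are requested (classProbs = TRUE).
train_label <- factor(make.names(raw_labels))

is.factor(train_label)   # TRUE: the outcome is a factor
levels(train_label)      # "Bug" "Code.smell" "Vulnerability"
table(train_label)       # also shows how balanced the classes are
```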
>>> On top of that, remember that your problem is not reproducible.
>>> If you attach a portion of your data, we could write working "caret"
>>> code for it.
>>>
>>> Thanks,
>>> Carlos Ortega.
>>>
>>> On Wed, Apr 20, 2022 at 10:26 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>>
>>>> A quick web search on 'R caret package' found a host of useful
>>>> results, the first of which was this:
>>>> https://topepo.github.io/caret/
>>>> Note that the author, Max Kuhn, explicitly says there that you can
>>>> email him with questions. I think you should do so, as you do not seem
>>>> to be making progress here.
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>> On Wed, Apr 20, 2022 at 12:51 PM javed khan <javedbtk111 using gmail.com> wrote:
>>>> >
>>>> > Caret produces the error: Something is wrong; all the Accuracy metric
>>>> > values are missing:
>>>> >     logLoss         AUC          prAUC        Accuracy       Kappa
>>>> >  Min.   : NA   Min.   : NA   Min.   : NA   Min.   : NA   Min.   : NA
>>>> >  1st Qu.: NA   1st Qu.: NA   1st Qu.: NA   1st Qu.: NA   1st Qu.: NA
>>>> >  Median : NA   Median : NA   Median : NA   Median : NA   Median : NA
>>>> >
>>>> > We (a group of three) are working on an assignment and have not been
>>>> > able to fix this error for a few days. The error occurs with most of
>>>> > the models, while with a few models (e.g. nb) the code works. The data
>>>> > is for text-based classification.
>>>> >
>>>> > Some of the warnings are:
>>>> >
>>>> > Warning messages:
>>>> > 1: In train.default(y = train_label, x = train_x, method = "pls", ... :
>>>> >   The metric "ROC" was not in the result set. logLoss will be used instead.
>>>> > 2: model fit failed for Fold01.Rep1: ncomp=3 Error in
>>>> > `[[<-.data.frame`(`*tmp*`, i, value = structure(c(1L, 1L, 1L,  :
>>>> >   replacement has 320292 rows, data has 1148
>>>> >
>>>> > 3: model fit failed for Fold02.Rep1: ncomp=3 Error in
>>>> > `[[<-.data.frame`(`*tmp*`, i, value = structure(c(1L, 1L, 1L,  :
>>>> >   replacement has 320013 rows, data has 1147
>>>> >
>>>> > 4: model fit failed for Fold03.Rep1: ncomp=3 Error in
>>>> > `[[<-.data.frame`(`*tmp*`, i, value = structure(c(1L, 1L, 1L,  :
>>>> >   replacement has 320013 rows, data has 1147
>>>> >
>>>> > 5: model fit failed for Fold04.Rep1: ncomp=3 Error in
>>>> > `[[<-.data.frame`(`*tmp*`, i, value = structure(c(1L, 1L, 1L,  :
>>>> >   replacement has 320292 rows, data has 1148
>>>> >
>>>> > 6: model fit failed for Fold05.Rep1: ncomp=3 Error in
>>>> > `[[<-.data.frame`(`*tmp*`, i, value = structure(c(1L, 1L, 1L,  :
>>>> >   replacement has 320013 rows, data has 1147
>>>> >
>>>> > 7: model fit failed for Fold06.Rep1: ncomp=3 Error in
>>>> > `[[<-.data.frame`(`*tmp*`, i, value = structure(c(1L, 1L, 1L,  :
>>>> >   replacement has 320013 rows, data has 1147
>>>> >
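The repeated "replacement has 320292 rows, data has 1148" warnings show that, inside each fold, a factor-like vector (presumably the outcome) that is far longer than the predictor table is being attached to it. The thread does not establish why the lengths differ. One possibility is a label vector built from a tokenized, one-row-per-word table instead of one row per document. A quick sanity check before calling train() would catch such a mismatch early. This is a minimal base-R sketch, and `check_xy` is a hypothetical helper.

```r
# Hypothetical helper: stop early if predictors and labels do not line up.
check_xy <- function(x, y) {
  n_x <- NROW(x)     # rows of a data frame or matrix
  n_y <- length(y)   # elements of the label vector
  if (n_x != n_y) {
    stop(sprintf("x has %d rows but y has %d elements", n_x, n_y), call. = FALSE)
  }
  if (!is.factor(y)) stop("y should be a factor for classification", call. = FALSE)
  invisible(TRUE)
}

# Toy example (made up)
x <- data.frame(f1 = 1:3, f2 = c(0, 1, 0))
check_xy(x, factor(c("Bug", "Bug", "Vulnerability")))   # passes silently
msg <- tryCatch(check_xy(x, factor(rep("Bug", 5))),
                error = conditionMessage)
msg   # "x has 3 rows but y has 5 elements"
```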
>>>> >
>>>> >
>>>> > The code is:
>>>> >
>>>> > m <- train(y = train_label, x = train_x,
>>>> >            method = "pls",
>>>> >            metric = "Accuracy",
>>>> >            ## preProc = c("center", "scale", "nzv"),
>>>> >            trControl = ctrl)
>>>> >
>>>> > p <- predict(m, test_x)
>>>> > confusionMatrix(p, as.factor(test_label))
>>>> >
>>>> >
>>>> > ______________________________________________
>>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>