[R] Error with text analysis data

Wed Apr 13 05:12:43 CEST 2022

Hi Neha,
The error message is about not having _factors_ with two or more
levels. Apart from using stringsAsFactors=FALSE (meaning that you
probably won't get any factors in "d"), your sample data doesn't look
like CSV format. Perhaps the lines have been truncated. You may get
something with stringsAsFactors=TRUE, but I don't know whether it will
be sensible.
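
As a rough sketch of how one might look for the offending columns, the
base R snippet below builds a small made-up data frame (the values are
hypothetical and only loosely modelled on the column names in your
glimpse output), converts the character columns to factors, and lists
the columns with fewer than two distinct non-missing values. It assumes
that a constant or all-NA predictor is what triggers the message, which
I can't confirm without the real file.

```r
# Hypothetical toy data; the values are invented for illustration only.
d <- data.frame(
  SEVERITY             = c("MAJOR", "MINOR", "MAJOR", "CRITICAL"),
  DESCRIPTION_FORMAT   = c("HTML", "HTML", "HTML", "HTML"),  # constant column
  REMEDIATION_GAP_MULT = c(NA, NA, NA, NA),                  # all-NA logical column
  IS_TEMPLATE          = c(0L, 0L, 1L, 0L),
  TYPE                 = c("CODE_SMELL", "BUG", "CODE_SMELL", "VULNERABILITY"),
  stringsAsFactors     = FALSE
)

# Convert every character column to a factor.
is_chr <- vapply(d, is.character, logical(1))
d[is_chr] <- lapply(d[is_chr], factor)

# Count the distinct non-missing values in each column.
n_distinct <- vapply(d, function(x) length(unique(x[!is.na(x)])), integer(1))
print(n_distinct)

# Columns with fewer than two distinct values cannot be given contrasts.
too_few <- names(n_distinct)[n_distinct < 2]
print(too_few)

# A single-level factor in a model formula raises an error;
# capture its message instead of stopping the script.
err <- tryCatch(
  model.matrix(~ DESCRIPTION_FORMAT, data = d),
  error = function(e) conditionMessage(e)
)
print(err)

# Drop the problematic columns before modelling.
d_clean <- d[, setdiff(names(d), too_few), drop = FALSE]
str(d_clean)
```

On the real data the same check can be run on "d" right after
read.csv(). Free-text columns such as DESCRIPTION have the opposite
problem (nearly one level per row), so they probably need separate
handling before being used as predictors.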

Jim

On Wed, Apr 13, 2022 at 8:12 AM Neha gupta <neha.bologna90 using gmail.com> wrote:
>
> Hello everyone, I have text data whose output variable has three subgroups.
> I am using the following code but getting an error message (see the error
> after the code).
>
> d=read.csv("SONAR_RULES.csv", stringsAsFactors = FALSE)
> d$REMEDIATION_FUNCTION=NULL
> d$DEF_REMEDIATION_GAP_MULT=NULL
> d$REMEDIATION_BASE_EFFORT=NULL
>
> index <- createDataPartition(d$TYPE, p = .70,list = FALSE)
> tr <- d[index, ]
> ts <- d[-index, ]
>
> ctrl <- trainControl(method = "cv",number=3, index = index, classProbs =
> TRUE, summaryFunction = multiClassSummary)
>
> ran <- train(TYPE ~ ., data = tr,
>                     method = "rpart",
>                     ## With rpart, tuneLength = 3 tries up to 3 values of cp
>                     tuneLength = 3,
>                     na.action= na.pass,
>                     metric = "Accuracy",
>                     preProc = c("center", "scale", "nzv"),
>                     trControl = ctrl)
> getTrainPerf(ran)
>
> It gives me this error:
>
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>   contrasts can be applied only to factors with 2 or more levels
>
> My data is as follows:
>
> Rows: 1,819
> Columns: 14
> $ PLUGIN_RULE_KEY             <chr> "InsufficientBranchCoverage", "InsufficientLin~
> $ PLUGIN_CONFIG_KEY           <chr> "", "", "", "", "", "", "", "", "", "", "S1120~
> $ PLUGIN_NAME                 <chr> "common-java", "common-java", "common-java", "~
> $ DESCRIPTION                 <chr> "An issue is created on a file as soon as the ~
> $ SEVERITY                    <chr> "MAJOR", "MAJOR", "MAJOR", "MAJOR", "MAJOR", "~
> $ NAME                        <chr> "Branches should have sufficient coverage by t~
> $ DEF_REMEDIATION_FUNCTION    <chr> "LINEAR", "LINEAR", "LINEAR", "LINEAR_OFFSET",~
> $ REMEDIATION_GAP_MULT        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
> $ DEF_REMEDIATION_BASE_EFFORT <chr> "", "", "", "10min", "", "", "5min", "5min", "~
> $ GAP_DESCRIPTION             <chr> "number of uncovered conditions", "number of l~
> $ SYSTEM_TAGS                 <chr> "bad-practice", "bad-practice", "convention", ~
> $ IS_TEMPLATE                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
> $ DESCRIPTION_FORMAT          <chr> "HTML", "HTML", "HTML", "HTML", "HTML", "HTML"~
> $ TYPE                        <chr> "CODE_SMELL", "CODE_SMELL", "CODE_SMELL", "COD~