[R] Error with text analysis data

Wed Apr 13 12:17:47 CEST 2022

Are you sure that read.csv is reading the data correctly? Look at the data frame and compare that result to your interpretation of the data file. Consider converting strings to dates. The lubridate package is one option.
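
A minimal sketch of that kind of check, in base R. The inline CSV text is a made-up stand-in for SONAR_RULES.csv (only the column names are borrowed from the original post), and the names n_distinct and suspect are hypothetical. The idea is to look at what read.csv actually produced and to list columns that carry fewer than two distinct values, since such columns can trigger the "contrasts can be applied only to factors with 2 or more levels" error once they are used in a model formula.

```r
# Made-up sample standing in for the real file (an assumption, not the poster's data)
csv_text <- "PLUGIN_NAME,SEVERITY,DESCRIPTION_FORMAT,REMEDIATION_GAP_MULT,TYPE
common-java,MAJOR,HTML,,CODE_SMELL
common-java,MINOR,HTML,,BUG
squid,MAJOR,HTML,,VULNERABILITY"

d <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Compare the column types with what you expect from the file
str(d)

# Count distinct non-missing values in each column
n_distinct <- vapply(d, function(col) length(unique(col[!is.na(col)])), integer(1))
n_distinct

# Columns with fewer than two distinct values cannot form a usable factor
suspect <- names(n_distinct)[n_distinct < 2]
suspect
```

In this made-up sample the constant column and the all-NA column are the ones flagged. Whether the same holds for the real data can only be confirmed by running the check on it.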
Tim

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Jim Lemon
Sent: Wednesday, April 13, 2022 5:25 AM
To: Neha gupta <neha.bologna90 using gmail.com>
Cc: r-help mailing list <r-help using r-project.org>
Subject: Re: [R] Error with text analysis data

Hi Neha,
The suggestion I made was to try stringsAsFactors=TRUE, although I will be surprised if it solves your problem.
CSV means "Comma-Separated Values". The following examples are valid CSV formats:

Date,Temperature,Humidity
13/04/2022,18,87

Country,PrimeMinister,Party
Australia,Morrison,Liberal

You could read in the second example as character OR factor type, depending on the setting of stringsAsFactors.
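
As a rough illustration of that difference, the second example can be read from an inline string with both settings. The variable names csv_text, as_char and as_fact are hypothetical.

```r
# The second example above, supplied as inline text instead of a file
csv_text <- "Country,PrimeMinister,Party
Australia,Morrison,Liberal"

as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fact <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Column classes under each setting
sapply(as_char, class)
sapply(as_fact, class)

# With a single data row, each factor column has exactly one level
nlevels(as_fact$Party)
```

Note that converting to factor does not by itself give a column two or more levels. A column holding one distinct value still yields a one-level factor.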

Jim

On Wed, Apr 13, 2022 at 7:05 PM Neha gupta <neha.bologna90 using gmail.com> wrote:
>
> Thank you Jim
>
> So what solution do you suggest? The features are text, so it doesn't look like CSV format.
>
> Best regards
>
> On Wednesday, April 13, 2022, Jim Lemon <drjimlemon using gmail.com> wrote:
>>
>> Hi Neha,
>> The error message is about not having _factors_ with two or more 
>> levels. Apart from using stringsAsFactors=FALSE (meaning that you 
>> probably won't get any factors in "d"), your sample data doesn't look 
>> like CSV format. Perhaps the lines have been truncated. You may get 
>> something with stringsAsFactors=TRUE, but I don't know whether it 
>> will be any more sensible.
>>
>> Jim
>>
>> On Wed, Apr 13, 2022 at 8:12 AM Neha gupta <neha.bologna90 using gmail.com> wrote:
>> >
>> > Hello everyone, I have text data whose output variable has three subgroups.
>> > I am using the following code but getting the error message (see 
>> > error after the code).
>> >
>> > d=read.csv("SONAR_RULES.csv", stringsAsFactors = FALSE)
>> > d$REMEDIATION_FUNCTION=NULL
>> > d$DEF_REMEDIATION_GAP_MULT=NULL
>> > d$REMEDIATION_BASE_EFFORT=NULL
>> >
>> > index <- createDataPartition(d$TYPE, p = .70,list = FALSE)
>> > tr <- d[index, ]
>> > ts <- d[-index, ]
>> >
>> > ctrl <- trainControl(method = "cv",number=3, index = index, 
>> > classProbs = TRUE, summaryFunction = multiClassSummary)
>> >
>> > ran <- train(TYPE ~ ., data = tr,
>> >                     method = "rpart",
>> >                     ## Will create 48 parameter combinations
>> >                     tuneLength = 3,
>> >                     na.action= na.pass,
>> >                     metric = "Accuracy",
>> >                     preProc = c("center", "scale", "nzv"),
>> >                     trControl = ctrl)
>> > getTrainPerf(ran)
>> >
>> > *It gives me this error:*
>> >
>> >
>> > *Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>> > contrasts can be applied only to factors with 2 or more levels*
>> >
>> >
>> > *My data is as follows*
>> >
>> > Rows: 1,819
>> > Columns: 14
>> > $ PLUGIN_RULE_KEY             <chr> "InsufficientBranchCoverage",
>> > "InsufficientLin~
>> > $ PLUGIN_CONFIG_KEY           <chr> "", "", "", "", "", "", "", "", "", "",
>> > "S1120~
>> > $ PLUGIN_NAME                 <chr> "common-java", "common-java",
>> > "common-java", "~
>> > $ DESCRIPTION                 <chr> "An issue is created on a file as soon
>> > as the ~
>> > $ SEVERITY                    <chr> "MAJOR", "MAJOR", "MAJOR", "MAJOR",
>> > "MAJOR", "~
>> > $ NAME                        <chr> "Branches should have sufficient
>> > coverage by t~
>> > $ DEF_REMEDIATION_FUNCTION    <chr> "LINEAR", "LINEAR", "LINEAR",
>> > "LINEAR_OFFSET",~
>> > $ REMEDIATION_GAP_MULT        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > NA, NA~
>> > $ DEF_REMEDIATION_BASE_EFFORT <chr> "", "", "", "10min", "", "", 
>> > "5min", "5min", "~
>> > $ GAP_DESCRIPTION             <chr> "number of uncovered conditions",
>> > "number of l~
>> > $ SYSTEM_TAGS                 <chr> "bad-practice", "bad-practice",
>> > "convention", ~
>> > $ IS_TEMPLATE                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
>> > 0, 0, 0~
>> > $ DESCRIPTION_FORMAT          <chr> "HTML", "HTML", "HTML", "HTML", "HTML",
>> > "HTML"~
>> > $ TYPE                        <chr> "CODE_SMELL", "CODE_SMELL",
>> > "CODE_SMELL", "COD~
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
