[R] Error with text analysis data

Bill Dunlap <williamwdunlap using gmail.com>
Wed Apr 13 18:57:23 CEST 2022


> I would always suggest working until the model works, no errors and no
> NA values

We agree on that.  However, the error gives you no hint about which
variables are causing the problem.  If it did, then it could only tell
about the first variable with the problem.  I think you would get to your
working model faster if you got NA's for the constant columns and then
could drop them all at once (or otherwise deal with them).
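
A minimal sketch of that "drop them all at once" step, assuming base R only. The toy data frame and the helper name is_constant are illustrative and do not come from the original data; the idea is simply to flag every column with fewer than two distinct non-NA values before fitting, whatever its type.

```r
# Toy data (illustrative): two constant predictors of different types.
d <- data.frame(
  y    = 1:5,
  sex  = rep("Female", 5),        # constant character column
  site = rep(2L, 5),              # constant integer column
  age  = c(20, 31, 25, 40, 28),   # varying numeric column
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when a column has fewer than two
# distinct non-NA values, regardless of its type.
is_constant <- function(x) length(unique(x[!is.na(x)])) < 2

# Names of all constant columns, found in one pass.
constant_cols <- names(d)[vapply(d, is_constant, logical(1))]
print(constant_cols)

# Drop every constant column at once, then fit the model.
d_clean <- d[, setdiff(names(d), constant_cols), drop = FALSE]
fit <- lm(y ~ ., data = d_clean)
print(coef(fit))
```

Because the check looks at observed values rather than declared factor levels, a factor with levels "Male" and "Female" that contains only "Female" is also flagged.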

-Bill

On Wed, Apr 13, 2022 at 9:40 AM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:

> I suspect that it is because you are looking at two types of error, both
> telling you that the model was not appropriate. In the “error in contrasts”
> there is nothing to contrast in the model. For a numerical constant the
> program calculates the standard deviation and ends with a division by zero.
> Division by zero is undefined, or NA.
>
>
>
> I would always suggest working until the model works, no errors and no NA
> values. The reason is that I can get NA in several ways and I need to
> understand why. If I just ignore the NA in my model I may be assuming the
> wrong thing.
>
>
>
> Tim
>
>
>
> *From:* Bill Dunlap <williamwdunlap using gmail.com>
> *Sent:* Wednesday, April 13, 2022 12:23 PM
> *To:* Ebert,Timothy Aaron <tebert using ufl.edu>
> *Cc:* Neha gupta <neha.bologna90 using gmail.com>; r-help mailing list <
> r-help using r-project.org>
> *Subject:* Re: [R] Error with text analysis data
>
>
>
> Constant columns can end up in the model when you do some subsetting or
> are exploring a new dataset.  My objection is that constant columns of
> numbers and logicals are fine but those of characters and factors are not.
>
>
>
> -Bill
>
>
>
> On Wed, Apr 13, 2022 at 9:15 AM Ebert,Timothy Aaron <tebert using ufl.edu>
> wrote:
>
> What is the goal of having a constant in the model? To me that seems
> pointless. Also, there is no variability in sexCode regardless of whether
> you call it integer or factor. So the model y ~ sexCode is just a strange
> way to look at the variability in y, and it would be better to do something
> like summary(y) or mean(y) if that was the goal.
>
> Tim
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Bill Dunlap
> Sent: Wednesday, April 13, 2022 9:56 AM
> To: Neha gupta <neha.bologna90 using gmail.com>
> Cc: r-help mailing list <r-help using r-project.org>
> Subject: Re: [R] Error with text analysis data
>
> This sounds like what I think is a bug in stats::model.matrix.default(): a
> numeric column with all identical entries is fine but a constant character
> or factor column is not.
>
> > d <- data.frame(y=1:5, sex=rep("Female",5))
> > d$sexFactor <- factor(d$sex, levels=c("Male","Female"))
> > d$sexCode <- as.integer(d$sexFactor)
> > d
>   y    sex sexFactor sexCode
> 1 1 Female    Female       2
> 2 2 Female    Female       2
> 3 3 Female    Female       2
> 4 4 Female    Female       2
> 5 5 Female    Female       2
> > lm(y~sex, data=d)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>   contrasts can be applied only to factors with 2 or more levels
> > lm(y~sexFactor, data=d)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>   contrasts can be applied only to factors with 2 or more levels
> > lm(y~sexCode, data=d)
>
> Call:
> lm(formula = y ~ sexCode, data = d)
>
> Coefficients:
> (Intercept)      sexCode
>           3           NA
>
> Calling traceback() after the error would clarify this.
>
> -Bill
>
>
> On Tue, Apr 12, 2022 at 3:12 PM Neha gupta <neha.bologna90 using gmail.com>
> wrote:
>
> > Hello everyone, I have text data whose output variable has three
> > subgroups. I am using the following code but get an error message (see
> > the error after the code).
> >
> > d=read.csv("SONAR_RULES.csv", stringsAsFactors = FALSE)
> > d$REMEDIATION_FUNCTION=NULL
> > d$DEF_REMEDIATION_GAP_MULT=NULL
> > d$REMEDIATION_BASE_EFFORT=NULL
> >
> > index <- createDataPartition(d$TYPE, p = .70,list = FALSE)
> > tr <- d[index, ]
> > ts <- d[-index, ]
> >
> > ctrl <- trainControl(method = "cv",number=3, index = index,
> >                      classProbs = TRUE, summaryFunction = multiClassSummary)
> >
> > ran <- train(TYPE ~ ., data = tr,
> >                     method = "rpart",
> >                     ## Will create 48 parameter combinations
> >                     tuneLength = 3,
> >                     na.action= na.pass,
> >                     metric = "Accuracy",
> >                     preProc = c("center", "scale", "nzv"),
> >                     trControl = ctrl)
> > getTrainPerf(ran)
> >
> > *It gives me error:*
> >
> >
> > *Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> > contrasts can be applied only to factors with 2 or more levels*
> >
> >
> > *My data is as follows*
> >
> > Rows: 1,819
> > Columns: 14
> > $ PLUGIN_RULE_KEY             <chr> "InsufficientBranchCoverage", "InsufficientLin~
> > $ PLUGIN_CONFIG_KEY           <chr> "", "", "", "", "", "", "", "", "", "", "S1120~
> > $ PLUGIN_NAME                 <chr> "common-java", "common-java", "common-java", "~
> > $ DESCRIPTION                 <chr> "An issue is created on a file as soon as the ~
> > $ SEVERITY                    <chr> "MAJOR", "MAJOR", "MAJOR", "MAJOR", "MAJOR", "~
> > $ NAME                        <chr> "Branches should have sufficient coverage by t~
> > $ DEF_REMEDIATION_FUNCTION    <chr> "LINEAR", "LINEAR", "LINEAR", "LINEAR_OFFSET",~
> > $ REMEDIATION_GAP_MULT        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
> > $ DEF_REMEDIATION_BASE_EFFORT <chr> "", "", "", "10min", "", "", "5min", "5min", "~
> > $ GAP_DESCRIPTION             <chr> "number of uncovered conditions", "number of l~
> > $ SYSTEM_TAGS                 <chr> "bad-practice", "bad-practice", "convention", ~
> > $ IS_TEMPLATE                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
> > $ DESCRIPTION_FORMAT          <chr> "HTML", "HTML", "HTML", "HTML", "HTML", "HTML"~
> > $ TYPE                        <chr> "CODE_SMELL", "CODE_SMELL", "CODE_SMELL", "COD~
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >