[R] Dependent Variable in Logistic Regression

William Dunlap wdunlap at tibco.com
Sat Aug 1 22:09:47 CEST 2020


I like using a logical response in cases like this, but I put its
construction in the formula so that it is unambiguous when I look at the
results later.
> d <- data.frame(Covid=c("Pos","Pos","Neg","Pos","Neg","Neg"), Age=41:46)
> glm(family=binomial, data=d, Covid=="Pos"~Age)

Call:  glm(formula = Covid == "Pos" ~ Age, family = binomial, data = d)

Coefficients:
(Intercept)          Age
     52.810       -1.214

Degrees of Freedom: 5 Total (i.e. Null);  4 Residual
Null Deviance:      8.318
Residual Deviance: 4.956        AIC: 8.956
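
To make the equivalence concrete, here is a minimal sketch (not part of the
session above) that refits the same toy data with the response coded three
ways. The column names CovidF and Covid01 and the fit_* object names are
made up for illustration. The only behaviour relied on is the documented one
that, for family=binomial, glm() treats the first level of a factor response
as failure and all other levels as success.

```r
# Same toy data as above; stringsAsFactors is set explicitly so the result
# does not depend on the default of the R version in use.
d <- data.frame(Covid = c("Pos", "Pos", "Neg", "Pos", "Neg", "Neg"),
                Age = 41:46, stringsAsFactors = FALSE)

# Logical response built inside the formula: TRUE ("Pos") is the modelled event.
fit_lgl <- glm(Covid == "Pos" ~ Age, family = binomial, data = d)

# Factor response: the first level is treated as failure, so "Neg" goes first.
d$CovidF <- factor(d$Covid, levels = c("Neg", "Pos"))
fit_fac <- glm(CovidF ~ Age, family = binomial, data = d)

# Numeric 0/1 response (hypothetical column name).
d$Covid01 <- as.integer(d$Covid == "Pos")
fit_num <- glm(Covid01 ~ Age, family = binomial, data = d)

# Compare the coefficient vectors of the three fits.
print(all.equal(coef(fit_lgl), coef(fit_fac)))
print(all.equal(coef(fit_lgl), coef(fit_num)))
```

Reversing the factor levels (levels = c("Pos", "Neg")) would instead model the
probability of "Neg", which flips the sign of both coefficients. That is the
ambiguity that writing Covid == "Pos" in the formula avoids.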


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, Aug 1, 2020 at 12:21 PM John Fox <jfox using mcmaster.ca> wrote:
>
> Dear Paul,
>
> I think that this thread has gotten unnecessarily complicated. The
> answer, as is easily demonstrated, is that a binary response for a
> binomial GLM in glm() may be a factor, a numeric variable, or a logical
> variable, with identical results; for example:
>
> --------------- snip -------------
>
>  > set.seed(123)
>
>  > head(x <- rnorm(100))
> [1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499
>
>  > head(y <- rbinom(100, 1, 1/(1 + exp(-x))))
> [1] 0 1 1 1 1 0
>
>  > head(yf <- as.factor(y))
> [1] 0 1 1 1 1 0
> Levels: 0 1
>
>  > head(yl <- y == 1)
> [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
>
>  > glm(y ~ x, family=binomial)
>
> Call:  glm(formula = y ~ x, family = binomial)
>
> Coefficients:
> (Intercept)            x
>       0.3995       1.1670
>
> Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
> Null Deviance:      134.6
> Residual Deviance: 114.9        AIC: 118.9
>
>  > glm(yf ~ x, family=binomial)
>
> Call:  glm(formula = yf ~ x, family = binomial)
>
> Coefficients:
> (Intercept)            x
>       0.3995       1.1670
>
> Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
> Null Deviance:      134.6
> Residual Deviance: 114.9        AIC: 118.9
>
>  > glm(yl ~ x, family=binomial)
>
> Call:  glm(formula = yl ~ x, family = binomial)
>
> Coefficients:
> (Intercept)            x
>       0.3995       1.1670
>
> Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
> Null Deviance:      134.6
> Residual Deviance: 114.9        AIC: 118.9
>
> --------------- snip -------------
>
> The original poster claimed to have encountered an error with a 0/1
> numeric response, but didn't show any data or even a command. I suspect
> that the response was a character variable, but of course can't really
> know that.
>
> Best,
>   John
>
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
>
> On 2020-08-01 2:25 p.m., Paul Bernal wrote:
> > Dear friend,
> >
> > I am aware that I have a binomial dependent variable, which is covid status
> > (1 if covid positive, and 0 otherwise).
> >
> > My question was whether R requires a binomial response variable to be
> > turned into a factor, that's all.
> >
> > Cheers,
> >
> > Paul
> >
> > On Sat, Aug 1, 2020 at 1:22 PM, Bert Gunter <bgunter.4567 using gmail.com>
> > wrote:
> >
> >> ... yes, but so does lm() for a categorical **INdependent** variable with
> >> more than 2 numerically labeled levels. n levels  = (n-1) df for a
> >> categorical covariate, but 1 for a continuous one (unless more complex
> >> models are explicitly specified of course). As I said, the OP seems
> >> confused about whether he is referring to the response or covariates. Or
> >> maybe he just made the same typo I did.
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along and
> >> sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Sat, Aug 1, 2020 at 11:15 AM Patrick (Malone Quantitative) <
> >> malone using malonequantitative.com> wrote:
> >>
> >>> No, R does not. glm() does in order to do logistic regression.
> >>>
> >>> On Sat, Aug 1, 2020 at 2:11 PM Paul Bernal <paulbernal07 using gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Bert,
> >>>>
> >>>> Thank you for the kind reply.
> >>>>
> >>>> But what if I don't turn the variable into a factor? Let's say that in
> >>>> Excel I just coded the variable as 1s and 0s, imported the dataset into
> >>>> R, and fitted the logistic regression without turning any categorical
> >>>> variable or dummy variable into a factor.
> >>>>
> >>>> Does R require every dummy variable to be treated as a factor?
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Paul
> >>>>
> >>>> On Sat, Aug 1, 2020 at 12:59 PM, Bert Gunter <
> >>>> bgunter.4567 using gmail.com> wrote:
> >>>>
> >>>>> x <- factor(0:1)
> >>>>> x <- factor(c("yes","no"))
> >>>>>
> >>>>> will produce identical results up to labeling.
> >>>>>
> >>>>>
> >>>>> Bert Gunter
> >>>>>
> >>>>> "The trouble with having an open mind is that people keep coming along
> >>>>> and sticking things into it."
> >>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>>>
> >>>>>
> >>>>> On Sat, Aug 1, 2020 at 10:40 AM Paul Bernal <paulbernal07 using gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Dear friends,
> >>>>>>
> >>>>>> Hope you are doing great. I want to fit a logistic regression in R,
> >>>>>> where the dependent variable is the covid status (I used 1 for covid
> >>>>>> positives and 0 for covid negatives), but when I ran the glm, R
> >>>>>> complained that I should make the dependent variable a factor.
> >>>>>>
> >>>>>> Which would be more advisable: to keep the dependent variable as 1s
> >>>>>> and 0s, or to code it as yes/no and then make it a factor?
> >>>>>>
> >>>>>> Any guidance will be greatly appreciated,
> >>>>>>
> >>>>>> Best regards,
> >>>>>>
> >>>>>> Paul
> >>>>>>
> >>>>>>          [[alternative HTML version deleted]]
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>> PLEASE do read the posting guide
> >>>>>> http://www.R-project.org/posting-guide.html
> >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Patrick S. Malone, Ph.D., Malone Quantitative
> >>> NEW Service Models: http://malonequantitative.com
> >>>
> >>> He/Him/His
> >>>
> >>
> >
>