[R] test logistic regression model

Mitchell Maltenfort mmalten using gmail.com
Sun Nov 20 19:38:18 CET 2022


Agreed on the ranking of (1) vs (2)
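
For anyone finding this thread later, here is a rough sketch of the two options quoted below, in base R. It is only an illustration under stated assumptions: the data frame is made up (only the column names book_state and TG_KraftF5 come from the original question), the 70/30 split and the min_n threshold are arbitrary, and as Bert notes, what counts as "sparse" is a subjective call.

```r
set.seed(42)

# Hypothetical data: binary outcome plus a factor whose level "d" is rare.
dat <- data.frame(
  book_state = rep(c(0, 1), length.out = 200),
  TG_KraftF5 = factor(rep(c("a", "b", "c", "d"), times = c(80, 60, 50, 10)))
)

# Option (1): sample the training rows separately within each factor level,
# so every level with at least 2 rows shows up in both train and test.
rows_by_level <- split(seq_len(nrow(dat)), dat$TG_KraftF5)
train_idx <- unlist(lapply(rows_by_level, function(idx) {
  # about 70% per level, but always leave at least one row for the test set
  n_train <- min(max(1, round(0.7 * length(idx))), length(idx) - 1)
  idx[sample.int(length(idx), n_train)]
}))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

fit  <- glm(book_state ~ TG_KraftF5, data = train, family = "binomial")
pred <- predict(fit, newdata = test, type = "response")  # no "new levels" error

# Option (2): pool levels with fewer than min_n rows into "other".
min_n  <- 20  # arbitrary threshold, chosen only for this example
counts <- table(dat$TG_KraftF5)
rare   <- names(counts)[counts < min_n]
pooled <- as.character(dat$TG_KraftF5)
pooled[pooled %in% rare] <- "other"
dat$TG_pooled <- factor(pooled)
```

With option (1) a level that has only a single row still cannot be in both sets, so such levels would have to be dropped, pooled, or set to NA before predicting.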



On Sun, Nov 20, 2022 at 1:30 PM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:

> I like option 1. Option 2 may cause problems if you are pooling groups
> that do not go together. This is especially a problem if you know that the
> data are missing some groups. I would consider dropping rare groups, or
> comparing results between the pooling and dropping options. If the answer is
> the same in both cases, then use the approach that makes your life easier
> with reviewers/clients. If the answer is different, then I would go with
> dropping rare categories, or present both and highlight the difference in
> outcome. A third option is to gather more data.
>
> Tim
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Bert Gunter
> Sent: Sunday, November 20, 2022 1:06 PM
> To: Mitchell Maltenfort <mmalten using gmail.com>
> Cc: R-help <R-help using r-project.org>
> Subject: Re: [R] test logistic regression model
>
> I think (2) might be a bad idea if one of the "sparse" categories has high
> predictive power. You'll lose it when you pool, will you not?
> Also, there is the problem of subjectively defining "sparse."
>
> However, (1) seems quite sensible to me. But IANAE.
>
> -- Bert
>
> On Sun, Nov 20, 2022 at 9:49 AM Mitchell Maltenfort <mmalten using gmail.com>
> wrote:
> >
> > Two possible fixes occur to me:
> >
> > 1) Redo the test/training split, but within levels of the factor, so you
> > have the same split within each level and each level is accounted for in
> > both training and testing.
> >
> > 2) If you have a lot of levels, and perhaps sparse representation in a
> > few, consider recoding levels to pool the rare ones into an "other"
> > category.
> >
> > On Sun, Nov 20, 2022 at 11:41 AM Bert Gunter <bgunter.4567 using gmail.com>
> wrote:
> >>
> >> small reprex:
> >>
> >> set.seed(5)
> >> dat <- data.frame(f = rep(c('r','g'),4), y = runif(8))
> >> newdat <- data.frame(f = rep(c('r','g','b'),2))
> >> ## convert values in newdat not seen in dat to NA
> >> is.na(newdat$f) <- !(newdat$f %in% dat$f)
> >> lmfit <- lm(y~f, data = dat)
> >>
> >> ##Result:
> >> > predict(lmfit,newdat)
> >>         1         2         3         4         5         6
> >> 0.4374251 0.6196527        NA 0.4374251 0.6196527        NA
> >>
> >> If this does not suffice, as Rui said, we need details of what you did.
> >> (predict.glm works like predict.lm)
> >>
> >>
> >> -- Bert
> >>
> >>
> >> On Sun, Nov 20, 2022 at 7:46 AM Rui Barradas <ruipbarradas using sapo.pt>
> wrote:
> >> >
> >> > At 15:29 on 20/11/2022, Gábor Malomsoki wrote:
> >> > > Dear Bert,
> >> > >
> >> > > Yes, I was trying to fill the nonexistent categories with NAs, but
> >> > > the solutions suggested on stackoverflow.com unfortunately did not
> >> > > work.
> >> > >
> >> > > Best regards
> >> > > Gabor
> >> > >
> >> > >
> >> > > Bert Gunter <bgunter.4567 using gmail.com> wrote on Sun, 20 Nov 2022, 16:20:
> >> > >
> >> > >> You can't predict results for categories that you've not seen
> >> > >> before (think about it). You will need to remove those cases
> >> > >> from your test set (or convert them to NA and predict them as NA).
> >> > >>
> >> > >> -- Bert
> >> > >>
> >> > >> On Sun, Nov 20, 2022 at 7:02 AM Gábor Malomsoki
> >> > >> <gmalomsoki1980 using gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >>> Dear all,
> >> > >>>
> >> > >>> I have created a logistic regression model
> >> > >>> on the train df:
> >> > >>> mymodel1 <- glm(book_state ~ TG_KraftF5, data = train, family =
> >> > >>> "binomial")
> >> > >>>
> >> > >>> Then I try to predict with the test df:
> >> > >>> Predict <- predict(mymodel1, newdata = test, type = "response")
> >> > >>> and I get this error message:
> >> > >>> Error in model.frame.default(Terms, newdata, na.action =
> >> > >>> na.action, xlev =
> >> > >>> object$xlevels)
> >> > >>> Factor  "TG_KraftF5" has new levels
> >> > >>>
> >> > >>> I have tried different proposals from stackoverflow, but
> >> > >>> unfortunately they did not solve the problem.
> >> > >>> Do you have any idea how to test a logistic regression model
> >> > >>> when you have different levels in the train and test df?
> >> > >>>
> >> > >>> Thank you in advance
> >> > >>> Regards,
> >> > >>> Gabor
> >> > >>>
> >> > >>> ______________________________________________
> >> > >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
> >> > >>> see https://stat.ethz.ch/mailman/listinfo/r-help
> >> > >>> PLEASE do read the posting guide
> >> > >>> http://www.r-project.org/posting-guide.html
> >> > >>> and provide commented, minimal, self-contained, reproducible code.
> >> > >>>
> >> > >>
> >> > >
> >> >
> >> > Hello,
> >> >
> >> > What exactly didn't work? You say you have tried the solutions
> >> > found in stackoverflow but without a link, we don't know which
> >> > answers to which questions you are talking about.
> >> > Like Bert said, if you assign NA to the new levels, present only in
> >> > test, it should work.
> >> >
> >> > Can you post links to what you have tried?
> >> >
> >> > Hope this helps,
> >> >
> >> > Rui Barradas
> >>
>