[R] poisson regression

Mitchell Maltenfort mmalten at gmail.com
Wed Nov 10 13:21:38 CET 2010


The other possibilities are:

(1) you're missing a necessary interaction term
(2) one of the variables affecting output just isn't in your data set.
(3) you need to transform 'hosp_days' or 'age' -- are those the only
two continuous variables?  Might be worth trying to plot 'em versus #
of hospitalizations just to see if a pattern emerges that gives you a
hint.

"Lack of fit" just means that your model isn't as good as the model
created by taking each combination of factors as a separate variable.
I'm skeptical whether that means you necessarily have a bad model.


On Wed, Nov 10, 2010 at 2:43 AM, avsha38 <AVSHALO2 at post.tau.ac.il> wrote:
>
> Hello All,
>
> I have a question regarding using Poission Regression, I would like to Model
> the number of hospitalizations by a set of covariates.
>
> The issue I ran into is "lack of fit" even after I tried to solve the
> "overdispersion" problem with negetive binomial.
>
> What would you suggest?
> 1. Is the Poission Regression the right Model?
> 2. is there another Model that is better for my needs?
>
> Any ideas will be more the welcome.
> Thank you,
> Avi
>
> The File is attached and the code is below:
> Please note that the "offset" is for dealingwith different followup times of
> the subjects.
>
> pr2<-read.csv("c:/rfiles/pr/pr_na.csv",header=TRUE)
> head(pr2)
> # distribution of the Response variable "zb44" , number of hospitalization.
> tab <- table(pr2$zb44)
> tab
> x <- as.numeric(names(tab))
> plot(x, tab, type='h', xlab='Number of hospitalizations', ylab='Frequency',
> main="Distribution of hospitalization")
> points(x, tab, pch=16)
> mean(pr2$zb44)
> # find the "best model" according to AIC
> fit.pr0<- glm(zb44 ~1+ offset(log(timem)),data=pr2, family=poisson)
> fit.full<- glm(zb44 ~
> ESHKOL+factor(Phys_activity)+factor(PTCAwin45d)+factor(White_collar)+factor
> (Asia_Africa)+factor(Sex)+Age+ factor(SteadyPartner)
>         + factor(RelIncome)+ factor(Employment)+
> Education+factor(CA_blockers)+Health_Prob+factor(PerceivHealth)+factor(Diuretics)+
> factor(ACE)
>         + factor(DiabTreat)+ factor(Insulin)+ factor(LLD)+ factor(Aspirin)+
> factor(Smoking)+factor(CHD_duration)+ factor(HTN)+ factor(Diabetes)
>         + factor(Hyperchol)+ factor(Intens_care)+ factor(Thromb)+
> factor(AP)+ factor(QWave)+factor(AntMI)+hosp_days+factor(COPD)+
> factor(B_blockers)
>         + factor(Cancer)+factor(Ulcer)+factor(kilip2)+ factor(HF_index)+
> factor(CVA)+Charlson_cat
>         + factor(Cardiac_death)+
> factor(CABGwin45d)+offset(log(timem)),data=pr2,family=poisson)
> Forw <- step(fit.pr0, scope = list(lower =fit.pr0, upper=
> fit.full),direction = "both")
> summary(Forw)
> anova.glm(Forw ,test = "Chisq") # display significate factors
> # checking model fit
> 1-pchisq(Forw$dev, Forw$df.residual)
> # p value is nearly zero, There is a lack of fit!!!
>
> # Residuals plots
> dev.new()
> par(mfrow=c(2,2))
> plot(Forw , pch=23, bg='red', cex=2)
>
> # Examine (over) dispersion
> # Pearson’s residuals
> (dp <- sum(resid(Forw , type = "pearson")^2)/Forw$df.resid)
> summary(Forw,dispersion=dp)
> # Deviance
> (dp1<-Forw$deviance/Forw$df.resid)
> summary(Forw,dispersion=dp1)
> # Model is over dispersed
>
> # Consider a negative binomial  to deal with overdispersion
>
> NB.model<- step(glm.nb(hosp_total~
> ESHKOL+factor(Phys_activity)+factor(PTCAwin45d)+factor(White_collar)+factor(Asia_Africa)+factor(Sex)+Age+
> factor(SteadyPartner)
>         + factor(RelIncome)+ factor(Employment)+
> Education+factor(CA_blockers)+Health_Prob+factor(PerceivHealth)+factor(Diuretics)+
> factor(ACE)
>         + factor(DiabTreat)+ factor(Insulin)+ factor(LLD)+ factor(Aspirin)+
> factor(Smoking)+factor(CHD_duration)+ factor(HTN)+ factor(Diabetes)
>         + factor(Hyperchol)+ factor(Intens_care)+ factor(Thromb)+
> factor(AP)+ factor(QWave)+factor(AntMI)+hosp_days+factor(COPD)+
> factor(B_blockers)
>         + factor(Cancer)+factor(Ulcer)+factor(kilip2)+ factor(HF_index)+
> factor(CVA)+Charlson_cat
>         + factor(Cardiac_death)+
> factor(CABGwin45d)+offset(log(timem)),data=pr2))
> summary(NB.model)
> 1 - pchisq(NB.model$dev, NB.model$df.residual)
> par(mfrow=c(2,2))
> plot(NB.model, pch=23, bg='red', cex=2)
> # p value is nearly zero, There is still a lack of fit!!!
> http://r.789695.n4.nabble.com/file/n3035613/pr_na.csv pr_na.csv
> --
> View this message in context: http://r.789695.n4.nabble.com/poisson-regression-tp3035613p3035613.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list