[R] poisson regression

avsha38 AVSHALO2 at POST.TAU.AC.IL
Wed Nov 10 08:43:23 CET 2010


Hello All,

I have a question regarding using Poission Regression, I would like to Model
the number of hospitalizations by a set of covariates. 

The issue I ran into is "lack of fit" even after I tried to solve the
"overdispersion" problem with negetive binomial. 

What would you suggest?
1. Is the Poission Regression the right Model?
2. is there another Model that is better for my needs?

Any ideas will be more the welcome.
Thank you,
Avi

The File is attached and the code is below:
Please note that the "offset" is for dealingwith different followup times of
the subjects.

pr2<-read.csv("c:/rfiles/pr/pr_na.csv",header=TRUE)
head(pr2)
# distribution of the Response variable "zb44" , number of hospitalization. 
tab <- table(pr2$zb44)
tab
x <- as.numeric(names(tab))
plot(x, tab, type='h', xlab='Number of hospitalizations', ylab='Frequency',
main="Distribution of hospitalization")
points(x, tab, pch=16)
mean(pr2$zb44)
# find the "best model" according to AIC
fit.pr0<- glm(zb44 ~1+ offset(log(timem)),data=pr2, family=poisson) 
fit.full<- glm(zb44 ~
ESHKOL+factor(Phys_activity)+factor(PTCAwin45d)+factor(White_collar)+factor
(Asia_Africa)+factor(Sex)+Age+ factor(SteadyPartner)
         + factor(RelIncome)+ factor(Employment)+
Education+factor(CA_blockers)+Health_Prob+factor(PerceivHealth)+factor(Diuretics)+
factor(ACE)
         + factor(DiabTreat)+ factor(Insulin)+ factor(LLD)+ factor(Aspirin)+
factor(Smoking)+factor(CHD_duration)+ factor(HTN)+ factor(Diabetes)
         + factor(Hyperchol)+ factor(Intens_care)+ factor(Thromb)+
factor(AP)+ factor(QWave)+factor(AntMI)+hosp_days+factor(COPD)+
factor(B_blockers)
         + factor(Cancer)+factor(Ulcer)+factor(kilip2)+ factor(HF_index)+
factor(CVA)+Charlson_cat
         + factor(Cardiac_death)+
factor(CABGwin45d)+offset(log(timem)),data=pr2,family=poisson) 
Forw <- step(fit.pr0, scope = list(lower =fit.pr0, upper=
fit.full),direction = "both")
summary(Forw)
anova.glm(Forw ,test = "Chisq") # display significate factors
# checking model fit
1-pchisq(Forw$dev, Forw$df.residual)
# p value is nearly zero, There is a lack of fit!!!

# Residuals plots
dev.new()
par(mfrow=c(2,2))
plot(Forw , pch=23, bg='red', cex=2)

# Examine (over) dispersion
# Pearson’s residuals
(dp <- sum(resid(Forw , type = "pearson")^2)/Forw$df.resid)
summary(Forw,dispersion=dp)
# Deviance
(dp1<-Forw$deviance/Forw$df.resid)
summary(Forw,dispersion=dp1)
# Model is over dispersed

# Consider a negative binomial  to deal with overdispersion

NB.model<- step(glm.nb(hosp_total~
ESHKOL+factor(Phys_activity)+factor(PTCAwin45d)+factor(White_collar)+factor(Asia_Africa)+factor(Sex)+Age+
factor(SteadyPartner)
         + factor(RelIncome)+ factor(Employment)+
Education+factor(CA_blockers)+Health_Prob+factor(PerceivHealth)+factor(Diuretics)+
factor(ACE)
         + factor(DiabTreat)+ factor(Insulin)+ factor(LLD)+ factor(Aspirin)+
factor(Smoking)+factor(CHD_duration)+ factor(HTN)+ factor(Diabetes)
         + factor(Hyperchol)+ factor(Intens_care)+ factor(Thromb)+
factor(AP)+ factor(QWave)+factor(AntMI)+hosp_days+factor(COPD)+
factor(B_blockers)
         + factor(Cancer)+factor(Ulcer)+factor(kilip2)+ factor(HF_index)+
factor(CVA)+Charlson_cat
         + factor(Cardiac_death)+
factor(CABGwin45d)+offset(log(timem)),data=pr2)) 
summary(NB.model)
1 - pchisq(NB.model$dev, NB.model$df.residual)
par(mfrow=c(2,2))
plot(NB.model, pch=23, bg='red', cex=2)
# p value is nearly zero, There is still a lack of fit!!!
http://r.789695.n4.nabble.com/file/n3035613/pr_na.csv pr_na.csv 
-- 
View this message in context: http://r.789695.n4.nabble.com/poisson-regression-tp3035613p3035613.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list