# [R] GLM: include observations with missing explanatory variables

Frank van Berkum frankieboytje at hotmail.com
Tue Dec 29 14:33:33 CET 2015

Hi all,
My problem is the following.
Suppose I have a dataset with observations Y and explanatory variables X1, ..., Xn, and suppose one of these explanatory variables is geographical area (of which there are ten, j=1,...,10). For some observations I know the area, but for others it is unknown and therefore recorded as NA.
I want to estimate a model of the form Y[i] ~ Poisson( lambda[i] ) with log(lambda[i]) = constant + \sum_j I[!is.na(area[i])] * I[area[i]==j] * beta[j]
In words: we estimate a constant for all observations and a factor for each area. If it is unknown what the area is, we only include the constant.
When I estimate this model using glm(), the records with is.na(area[i]) are dropped from the dataset, which I do not want. I had hoped that the model described above could be estimated using the function I() ("as is"), but so far my attempts have not succeeded.
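To make the specification concrete, here is a minimal sketch of one way the model above could be written, not necessarily the best one. It builds one 0/1 indicator column per area by hand, so that a record with unknown area gets zeros in all ten columns and contributes only through the constant. The data are simulated, and all names (dat, X, area1, ..., area10) and the parameter values are assumptions made up for illustration.

```r
# Simulated data: every value and name below is an illustrative assumption.
set.seed(1)
n <- 500
area <- sample(c(1:10, NA), n, replace = TRUE)   # NA = area unknown
beta <- seq(-0.5, 0.4, length.out = 10)          # hypothetical area effects
eta  <- 0.3 + ifelse(is.na(area), 0, beta[area]) # constant only when area is NA
y    <- rpois(n, exp(eta))

# One indicator column per area: I[!is.na(area[i])] * I[area[i] == j].
# FALSE & NA evaluates to FALSE, so rows with unknown area are all zeros.
X <- sapply(1:10, function(j) as.numeric(!is.na(area) & area == j))
colnames(X) <- paste0("area", 1:10)

# No NA remains in the model frame, so glm() has nothing to drop.
dat <- data.frame(y = y, X)
fit <- glm(y ~ ., family = poisson(link = "log"), data = dat)

nobs(fit)   # equals n: all records are used
coef(fit)   # intercept plus one coefficient per area
```

In this parameterisation the intercept is the log mean of the records with unknown area, and each beta[j] is measured relative to that group. The same fit should result from recoding NA as an explicit factor level and making it the reference level.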
Any help on how to approach this would be greatly appreciated.
Kind regards,
Frank van Berkum