[R] GLM: include observations with missing explanatory variables

Rolf Turner r.turner at auckland.ac.nz
Tue Dec 29 21:54:30 CET 2015


On 30/12/15 02:33, Frank van Berkum wrote:

> Hi all, My problem is the following. Suppose I have a dataset with
> observations Y and explanatory variables X1, ..., Xn, and suppose one
> of these explanatory variables is geographical area (of which there
> are ten, j=1,...,10).  For some observations I know the area, but for
> others it is unknown and therefore record as NA. I want to estimate a
> model of the form Y[i] ~ Poisson( lambda[i] ) with log(lambda[i]) =
> constant + \sum_j I[!is.na(area[i])] * I[area[i]==j] * beta[j] In
> words: we estimate a constant for all observations and a factor for
> each area. If it is unknown what the area is, we only include the
> constant. When estimating this model using glm(), the records with
> is.na(area[i]) are 'deleted' from the dataset, and this I do not
> want. I had hoped that the model as described above could be
> estimated using the function I() (interpret as), but so far my
> attempts have not succeeded. Any help on how to approach this is
> kindly appreciated.

As Deep Thought was heard to remark: "Hmmmmm.  Tricky."

After pondering for a while it seems to me that you want to make NA the 
reference level of the factor "geographical area".

I.e. convert the NA values to a level of that factor and make it the 
*first* level.

E.g.:

set.seed(42)
GA <- factor(sample(LETTERS[1:10],200,TRUE))
GA[sample(1:200,10)] <- NA
tmp <- as.character(GA)
tmp[is.na(tmp)] <- "unknown"
GA <- factor(tmp,levels=c("unknown",LETTERS[1:10]))

y <- rpois(200,20) # Artificial response.
fit <- glm(y ~ GA,family=poisson)

Note that if you set

     z <- predict(fit)

(this gives predictions on the scale of the linear predictor)
then the entries of z corresponding to "unknown" are equal to the 
intercept coefficient of fit.

I can't quite get my head around whether this is *really* accomplishing 
what you want --- but it's a thought.

cheers,

Rolf Turner

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



More information about the R-help mailing list