[R] predict nbinomial glm

Sundar Dorai-Raj sundar.dorai-raj at pdf.com
Tue Aug 16 16:33:47 CEST 2005


Katharina,

I agree with Prof. Ripley's assessment. However, one thing you may
have overlooked is that subset.data.frame does not remove unused factor levels. So,

 > subset_of_dataframe = subset(data_frame, (b > 80 & c < 190))
 > levels(subset_of_dataframe$d)
[1] "q" "r" "s" "t"
 > table(subset_of_dataframe$d)
  q  r  s  t
  0 20 50 10

Even though the level "q" does not appear, it is still a level of "d".
Perhaps you need to do the following after taking the subset:

subset_of_dataframe[] <- lapply(subset_of_dataframe, "[", drop = TRUE)

which drops all unused levels from factors.
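In later R releases (droplevels() arrived around R 2.12.0, well after this thread) the same cleanup can be done with droplevels(). The sketch below rebuilds the example data with a fixed seed (an assumption, only for reproducibility) and creates d as a factor explicitly, because since R 4.0.0 data.frame() no longer turns character columns into factors. It checks that the lapply idiom above and droplevels() leave only the levels actually present in the subset.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
data_frame <- data.frame(a = rnbinom(200, 1, 0.5),
                         b = 1:200,
                         c = 30:229,
                         d = factor(rep(c("q", "r", "s", "t"), rep(50, 4))))

subset_of_dataframe <- subset(data_frame, b > 80 & c < 190)
levels(subset_of_dataframe$d)   # "q" "r" "s" "t": q is kept although unused

# The idiom from this message: call "[" with drop = TRUE on every column.
# Factors lose their unused levels; other columns come back unchanged.
cleaned_1 <- subset_of_dataframe
cleaned_1[] <- lapply(cleaned_1, "[", drop = TRUE)

# droplevels() applied to a data frame drops unused levels of every factor column.
cleaned_2 <- droplevels(subset_of_dataframe)

levels(cleaned_1$d)                                # "r" "s" "t"
identical(levels(cleaned_1$d), levels(cleaned_2$d))  # TRUE
```

Either form works; droplevels() only touches factor columns, so it states the intent more directly than indexing every column.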

I'm not sure whether your problem is statistical in nature or simply a
misunderstanding of the software; I'm only attempting to answer the
latter. As Prof. Ripley suggests, discuss any statistical problem (i.e.,
predicting on missing levels) with your advisor.
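To see why predicting on a missing level is refused, here is a minimal sketch that uses glm() with a Poisson family in place of glm.nb() (a substitution; Prof. Ripley notes that lm and glm behave the same way). The fitted model records the factor levels it saw in its xlevels component, and predict() stops when newdata contains a level outside that set. The seed and the Poisson family are assumptions for illustration, and the exact wording of the error differs between R versions.

```r
set.seed(1)  # assumption: fixed seed for reproducibility
dat <- data.frame(a = rnbinom(200, 1, 0.5),
                  b = 1:200,
                  d = factor(rep(c("q", "r", "s", "t"), rep(50, 4))))

# Fit only on rows 81..160, where level "q" never occurs.
fit_rows <- dat$b > 80 & dat$b < 161
fit <- glm(a ~ b + d, family = poisson, data = dat[fit_rows, ])

# glm() drops unused levels when building its model frame,
# so the model knows only about "r", "s" and "t".
fit$xlevels$d

# Rows 1..5 all have d == "q": there is no coefficient for q,
# so predict() raises an error instead of guessing.
res <- tryCatch(predict(fit, newdata = dat[1:5, ], type = "response"),
                error = function(e) conditionMessage(e))
res
```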

HTH,

--sundar

P.S. Also, update R. It's free.

Prof Brian Ripley wrote:
> This seems to be an unstated repeat of much of an earlier and
> unanswered post
> 
>  	https://stat.ethz.ch/pipermail/r-help/2005-August/075914.html
> 
> entitled
> 
>  	[R] error in predict glm (new levels cause problems)
> 
> It is nothing to do with `nbinomial glm' (sic): all model-fitting
> functions, including lm and glm, do this.  The reason you did not get
> even one reply to your first post is that you seemed not to have done
> your homework.  (One thing the posting guide does ask is for you to try
> the current version of R, and yours is three versions old.)
> 
> The code is protecting you from an attempt at statistical nonsense. 
> (Indeed, the check was added to catch such misuses.)  Your email address 
> seems to be that of a student, so please seek the help of your advisor. 
> You seem surprised that you are not allowed to make predictions about 
> levels for which you have supplied no relevant data.
> 
> 
> On Tue, 16 Aug 2005, K. Steinmann wrote:
> 
> 
>>Dear R-helpers,
>>
>>let us assume that I have the following dataset:
>>
>>a <- rnbinom(200, 1, 0.5)
>>b <- (1:200)
>>c <- (30:229)
>>d <- rep(c("q", "r", "s", "t"), rep(50,4))
>>data_frame <- data.frame(a,b,c,d)
>>
>>In a first step, I fit a glm.nb (full code is given at the end of this mail)
>>and predict my response variable a.
>>In a second step, I would like to fit a glm.nb to a subset of
>>data_frame. As soon as I try to predict the response variable a, I get
>>the following error message:
>>"Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
>>       factor d has new level(s) q"
>>
>>Does anybody have a solution to this problem?
>>
>>Thank you in advance,
>>K. Steinmann (working with R 2.0.0)
>>
>>
>>Code:
>>
>>library(MASS)
>>
>>a <- rnbinom(200, 1, 0.5)
>>b <- (1:200)
>>c <- (30:229)
>>d <- rep(c("q", "r", "s", "t"), rep(50,4))
>>
>>data_frame <- data.frame(a,b,c,d)
>>
>>model_1 = glm.nb(a ~ b + d , data = data_frame)
>>
>>pred_model_1 = predict(model_1, newdata = data_frame, type = "response",
>>                       se.fit = FALSE, dispersion = NULL, terms = NULL)
>>
>>subset_of_dataframe = subset(data_frame, (b > 80 & c < 190 ))
>>
>>model_2 = glm.nb(a ~ b + d , data = subset_of_dataframe)
>>pred_model_2 = predict(model_2, newdata = subset_of_dataframe,
>>                       type = "response", se.fit = FALSE,
>>                       dispersion = NULL, terms = NULL)
> 
>
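For reference, a minimal sketch of the poster's second step adjusted to run on a current R: d is created as a factor explicitly, and droplevels() is applied to the subset before fitting, so the model and newdata agree on the levels r, s and t. This only removes the software message for data that never contained q; it does not make predictions for level q possible, which is the statistical point made in this thread. The seed is an assumption for reproducibility, and glm.nb() may emit convergence warnings on some random draws.

```r
library(MASS)  # provides glm.nb()

set.seed(1)  # assumption: fixed seed so the example is reproducible
data_frame <- data.frame(a = rnbinom(200, 1, 0.5),
                         b = 1:200,
                         c = 30:229,
                         d = factor(rep(c("q", "r", "s", "t"), rep(50, 4))))

# Subset as in the original post, then drop the now-empty level "q".
subset_of_dataframe <- droplevels(subset(data_frame, b > 80 & c < 190))

model_2 <- glm.nb(a ~ b + d, data = subset_of_dataframe)
pred_model_2 <- predict(model_2, newdata = subset_of_dataframe,
                        type = "response")

# The model and the new data now share the same levels of d.
model_2$xlevels$d
levels(subset_of_dataframe$d)
```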



