[R] predict() works with the design matrix but throws error with some rows of that matrix

peter dalgaard pdalgd at gmail.com
Fri Nov 20 10:57:15 CET 2015


On 20 Nov 2015, at 10:07 , peter dalgaard <pdalgd at gmail.com> wrote:

>> 
>> On 20 Nov 2015, at 04:53 , Damir Cosic <damir.cosic at gmail.com> wrote:
>> 
>> Hello,
>> 
>> I am having problems with predict() after a multinomial logit regression by
>> multinom(). I generate a design matrix with model.matrix() and use it to
>> estimate the model. Then, if I pass the entire design matrix to predict(),
>> it returns the same output as fitted(), which is expected. But if I pass
>> only a few rows of the design matrix, it throws this error:
>> 
>> Error in model.frame.default(Terms, newdata, na.action = na.omit, xlev
>> = object$xlevels) :    variable lengths differ (found for 'z') In addition:
>> 
>> Warning message: 'newdata' had 6 rows but variables found have 15 rows
> 
> Offhand (sorry, no time for testing things this morning) I suspect that you are mixing paradigms. You can _either_ multiply coefficients with a design matrix _or_ look up variables in a data frame, and I think you are trying to look up variables in a matrix. In particular, I don't expect mm to have a column called "z".

A little further thought later: The crux is that matrices will double as data frames in the newdata argument, but it will only work for numerical variables. Your model contains a numeric and a factor variable so it won't work, for two reasons:

> head(mm)
           x ztall
1 -2.4963581     0
2  0.9895450     1
3  1.8755237     1
4  0.8911458     0
5 -2.1458457     1
6  0.6294571     1

I.e., there is no "z" column, but even if there were, it would mismatch with the model

> colnames(mm)[2] <- "z"
> p2<-predict(m,head(mm),"probs")
Error: variable 'z' was fitted with type "factor" but type "numeric" was supplied
In addition: Warning message:
In model.frame.default(Terms, newdata, na.action = na.omit, xlev = object$xlevels) :
  variable 'z' is not a factor

At this point, newdata=mm will fail too, as I predicted. (Pun almost unintended.) 

Notice that this works fine:

> p2<-predict(m,head(df),"probs")
> p2
          good          bad         ugly
1 9.998923e-01 8.171733e-05 2.598999e-05
2 2.804424e-05 4.170214e-01 5.829506e-01
3 2.892377e-05 3.255878e-01 6.743832e-01
4 9.999315e-01 2.818836e-05 4.031584e-05
5 1.863116e-05 7.420142e-01 2.579672e-01
6 2.740359e-05 4.563101e-01 5.436625e-01
>

-pd

> 
> Accordingly, neither of your examples actually work, both cases find z (and x?) in the global environment, it is just only in the latter example that the inconsistency is discovered. I think you want either to use model.frame or an explicit mm %*% coef(model) (or thereabouts).
> 
> 
>> 
>> This is a minimal example:
>> 
>> require(nnet)
>> 
>> y<-factor(rep(c(1,2,3),5), levels=1:3, labels=c("good","bad","ugly"))
>> x<-rnorm(15)+.2*rep(1:3,5)
>> z<-factor(rep(c(1,2,2),5), levels=1:2, labels=c("short","tall"))
>> 
>> df<-data.frame(y=y, x=x, z=z)
>> mm<-model.matrix(~x+z, data=df)[,2:3]
>> m<-multinom(y ~ x+z, data=df)
>> 
>> p1<-predict(m,mm,"probs")
>> 
>> p2<-predict(m,head(mm),"probs")
>> 
>> My actual goal is out-of-sample prediction, but I could not make it work
>> and, while debugging it, I reduced it to this problem.
>> 
>> Best,
>> 
>> Damir
>> 
>> 	[[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



More information about the R-help mailing list