[R] predict() works with the design matrix but throws error with some rows of that matrix

Damir Cosic damir.cosic at gmail.com
Fri Nov 20 17:17:01 CET 2015


Peter, this is extremely helpful. It is my first time using model.matrix()
and I was not very clear about it.Thank you for clarifying it!

It also helped me figure out how to do out-of-sample prediction. I add it
here in case someone else needs help with this.

oos.df<-data.frame(x=c(0,1), z=factor(c(1,2),levels=1:2,
labels=c("short","tall")))
p4<-predict(m, oos.df, "probs")

To follow up on your comment how design matrix can be used for multiplying
the coefficients and not for looking up variables, I tried to do just that,
but I am getting different results from those returned by predict().

For example,

head(predict(m, df, "probs"))

returns

          good          bad         ugly
1 9.999256e-01 3.419782e-05 4.018257e-05
2 1.064469e-05 5.163435e-01 4.836458e-01
3 8.623258e-06 4.523593e-01 5.476321e-01
4 9.999418e-01 3.116868e-05 2.702482e-05
5 1.289231e-05 5.789110e-01 4.210761e-01
6 9.213303e-06 4.718692e-01 5.281215e-01

To do the same with matrix multiplication, I use the expression
1/(1+exp(-xb)):

head(1/(1+exp(-(cbind(c(1), mm) %*% coef(m)[2,]))))

which should produce the third column above, but it doesn't:

          [,1]
1 4.018394e-05
2 9.999780e-01
3 9.999843e-01
4 2.702566e-05
5 9.999694e-01
6 9.999826e-01




On Fri, Nov 20, 2015 at 4:57 AM, peter dalgaard <pdalgd at gmail.com> wrote:

>
> On 20 Nov 2015, at 10:07 , peter dalgaard <pdalgd at gmail.com> wrote:
>
> >>
> >> On 20 Nov 2015, at 04:53 , Damir Cosic <damir.cosic at gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I am having problems with predict() after a multinomial logit
> regression by
> >> multinom(). I generate a design matrix with model.matrix() and use it to
> >> estimate the model. Then, if I pass the entire design matrix to
> predict(),
> >> it returns the same output as fitted(), which is expected. But if I pass
> >> only a few rows of the design matrix, it throws this error:
> >>
> >> Error in model.frame.default(Terms, newdata, na.action = na.omit, xlev
> >> = object$xlevels) :    variable lengths differ (found for 'z') In
> addition:
> >>
> >> Warning message: 'newdata' had 6 rows but variables found have 15 rows
> >
> > Offhand (sorry, no time for testing things this morning) I suspect that
> you are mixing paradigms. You can _either_ multiply coefficients with a
> design matrix _or_ look up variables in a data frame, and I think you are
> trying to look up variables in a matrix. In particular, I don't expect mm
> to have a column called "z".
>
> A little further thought later: The crux is that matrices will double as
> data frames in the newdata argument, but it will only work for numerical
> variables. Your model contains a numeric and a factor variable so it won't
> work, for two reasons:
>
> > head(mm)
>            x ztall
> 1 -2.4963581     0
> 2  0.9895450     1
> 3  1.8755237     1
> 4  0.8911458     0
> 5 -2.1458457     1
> 6  0.6294571     1
>
> I.e., there is no "z" column, but even if there were, it would mismatch
> with the model
>
> > colnames(mm)[2] <- "z"
> > p2<-predict(m,head(mm),"probs")
> Error: variable 'z' was fitted with type "factor" but type "numeric" was
> supplied
> In addition: Warning message:
> In model.frame.default(Terms, newdata, na.action = na.omit, xlev =
> object$xlevels) :
>   variable 'z' is not a factor
>
> At this point, newdata=mm will fail too, as I predicted. (Pun almost
> unintended.)
>
> Notice that this works fine:
>
> > p2<-predict(m,head(df),"probs")
> > p2
>           good          bad         ugly
> 1 9.998923e-01 8.171733e-05 2.598999e-05
> 2 2.804424e-05 4.170214e-01 5.829506e-01
> 3 2.892377e-05 3.255878e-01 6.743832e-01
> 4 9.999315e-01 2.818836e-05 4.031584e-05
> 5 1.863116e-05 7.420142e-01 2.579672e-01
> 6 2.740359e-05 4.563101e-01 5.436625e-01
> >
>
> -pd
>
> >
> > Accordingly, neither of your examples actually work, both cases find z
> (and x?) in the global environment, it is just only in the latter example
> that the inconsistency is discovered. I think you want either to use
> model.frame or an explicit mm %*% coef(model) (or thereabouts).
> >
> >
> >>
> >> This is a minimal example:
> >>
> >> require(nnet)
> >>
> >> y<-factor(rep(c(1,2,3),5), levels=1:3, labels=c("good","bad","ugly"))
> >> x<-rnorm(15)+.2*rep(1:3,5)
> >> z<-factor(rep(c(1,2,2),5), levels=1:2, labels=c("short","tall"))
> >>
> >> df<-data.frame(y=y, x=x, z=z)
> >> mm<-model.matrix(~x+z, data=df)[,2:3]
> >> m<-multinom(y ~ x+z, data=df)
> >>
> >> p1<-predict(m,mm,"probs")
> >>
> >> p2<-predict(m,head(mm),"probs")
> >>
> >> My actual goal is out-of-sample prediction, but I could not make it work
> >> and, while debugging it, I reduced it to this problem.
> >>
> >> Best,
> >>
> >> Damir
> >>
> >>      [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>
>
>
>
>
>
>
>
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list