[R] Earth (MARS) package with categorical predictors

Chris Wilkinson kinsham at verizon.net
Mon Nov 11 17:58:38 CET 2013


Steve, thanks for your reply. Here is what I get.

pkg is a 4-level categorical vector.

> is.factor(pkg)
[1] TRUE
> summary(pkg)
BGA PGA QCC QFP 
225  36  19 178 
>
> dat <- earth(lifetime ~ pkg+pins+volts+temp+doi+logspd, degree=3)
> ## The other vars are continuous.
> s <- 243
> pr <- c(pkg[s],pins[s],volts[s],temp[s],doi[s],logspd[s])
> pkg[s]
[1] BGA
Levels: BGA PGA QCC QFP
> pr
[1]    1.000000  256.000000    3.300000  125.000000 2002.258105    4.890349
> pred <- predict(dat, newdata=pr)
Error : variable 'pkg' was fitted with type "factor" but type "numeric" was supplied
Forging on regardless, first few rows of x are
  pkg pins volts temp      doi   logspd
1   1  256   3.3  125 2002.258 4.890349
Error: get.earth.x from model.matrix.earth from predict.earth: the number 6 of columns of x
(after factor expansion) does not match the number 8 of columns of the earth object
    expanded x:  pkg pins volts temp doi logspd
    object$dirs: pkgPGA pkgQCC pkgQFP pins volts temp doi logspd
Possible remedy: check factors in the input data
>

pkg is being passed as the numeric value 1 rather than as a factor. I'm
unsure how to specify pkg correctly for predict. In the example you gave,
does the data include a categorical predictor?
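
A possible explanation, sketched on made-up toy data (the names toy_pkg,
toy_pins and newrow are illustrative and not part of the real data set):
c() builds a plain atomic vector, so a factor combined with numbers is
reduced to its integer level code, whereas a one-row data frame keeps each
column's type. This is only a base-R sketch of the coercion, not a tested
fix for the model above.

```r
# Toy stand-ins for the real variables (hypothetical values).
toy_pkg  <- factor(c("BGA", "PGA", "QCC", "QFP", "BGA"))
toy_pins <- c(256, 64, 32, 100, 256)

s <- 1

# c() returns an atomic vector, so the factor is reduced to its level code.
pr <- c(toy_pkg[s], toy_pins[s])
print(pr)
print(is.factor(pr))

# A one-row data frame keeps the factor, including all of its levels.
newrow <- data.frame(pkg = toy_pkg[s], pins = toy_pins[s])
str(newrow)
```

If that is indeed the cause, then with the real variables something along
the lines of predict(dat, newdata = data.frame(pkg=pkg[s], pins=pins[s],
volts=volts[s], temp=temp[s], doi=doi[s], logspd=logspd[s])) might behave
better, since the column names then match the formula. That call has not
been run here.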

Chris

-----Original Message-----
From: Stephen Milborrow [mailto:milbo at sonic.net] 
Sent: Monday, November 11, 2013 7:21 AM
To: kinsham at verizon.net
Subject: [R] Earth (MARS) package with categorical predictors

See if you can provide a simple reproducible example.  It's not clear 
exactly what the issue is from your question.  The following simple example 
gives the correct response:

data(etitanic)
a <- earth(survived~., data=etitanic)
predict(a, newdata=etitanic[1,])
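
For what it is worth, etitanic does contain factor columns (pclass and
sex), and etitanic[1,] keeps them as factors because it is a row of a data
frame. The sketch below spells this out and also builds a row by hand; it
assumes the earth package is installed, and the hand-built passenger
(newrow and its values) is hypothetical.

```r
# Assumes the earth package is installed; the example is skipped otherwise.
has_earth <- requireNamespace("earth", quietly = TRUE)

if (has_earth) {
  library(earth)
  data(etitanic)

  # Show the column classes; pclass and sex are factors.
  print(sapply(etitanic, class))

  a <- earth(survived ~ ., data = etitanic)

  # Row-subsetting a data frame keeps the factor columns and their levels.
  row1 <- etitanic[1, ]
  print(predict(a, newdata = row1))

  # A hand-built row (hypothetical passenger) reusing the fitted levels.
  newrow <- data.frame(
    pclass = factor("1st", levels = levels(etitanic$pclass)),
    sex    = factor("female", levels = levels(etitanic$sex)),
    age    = 30, sibsp = 0, parch = 0
  )
  print(predict(a, newdata = newrow))
}
```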

Regards,
Steve

Message: 42
Date: Thu, 07 Nov 2013 23:16:18 -0500
From: Chris Wilkinson <kinsham at verizon.net>
To: r-help at r-project.org, Chris Wilkinson <kinsham at verizon.net>
Subject: [R] Earth (MARS) package with categorical predictors
Message-ID: <ml99syxejec3ep0u4h0je78h.1383884178002 at email.android.com>
Content-Type: text/plain; charset=utf-8

It appears to be legitimate to include multi-level categorical and
continuous variables in defining the model for earth (e.g. y ~ cat +
cont1 + cont2) but is it also then possible to use categoricals in the
predict method using the earth result? I tried but it returns an error
which is not very informative.

Thanks

Chris


