[R] How make a x,y dataset from a formula based entry

Gabor Grothendieck ggrothendieck at gmail.com
Fri Sep 23 15:32:03 CEST 2011


On Thu, Sep 22, 2011 at 2:54 PM, trekvana <trekvana at aol.com> wrote:
> Hello all,
>
> So I am using the (formula entry) method for randomForest:
>
> randomForest(y~x1+x2+...+x39+x40,data=xxx,...) but the issue is that some of
> the items in that package don't take a formula entry - you have to explicitly
> state the y and x vector:
>
> randomForest(x=xxx[,c('x1','x2',...,'x40')],y=xxx[,'y'],...)
>
> Now my question is whether there is a function/way to tell R to take a
> formula and make the two corresponding datasets [x,y] (that way I don't have
> to create the x dataset manually with all 40 variables I have).
>
> There must be a more elegant way to do this than
> x=xxx[,c('x1','x2',...,'x40')]

We assume that the formula is of the form:

fo <- y ~ x1 + x2 + x3

Now if we set:

v <- all.vars(fo)

and if DF is our data frame, then DF[, v[1]] is the response and
DF[v[-1]] holds the predictors.  (Depending on what you intend to do
next, you may need to add an intercept column to the predictors and
convert them from a data frame to a matrix.)
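
As a minimal sketch of the above, here is the split applied to a small
made-up data frame. The values in DF and the extra column `unused` are
assumptions for illustration only, not the poster's data. The second half
shows base R's model.frame/model.matrix as one possible way to get a
matrix with an intercept column; it is an alternative route, not
something the all.vars approach requires.

```r
# Hypothetical toy data standing in for the poster's `xxx`.
set.seed(1)
DF <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10),
                 x3 = rnorm(10), unused = rnorm(10))

fo <- y ~ x1 + x2 + x3
v <- all.vars(fo)        # variable names in the formula, response first

y <- DF[, v[1]]          # response as a plain numeric vector
x <- DF[v[-1]]           # predictors as a data frame (x1, x2, x3 only)

# Optional: a numeric matrix with an intercept column, built by base R.
mf <- model.frame(fo, data = DF)
y2 <- model.response(mf)          # same values as y, with row names attached
X  <- model.matrix(fo, data = mf) # first column is "(Intercept)"
```

The pieces can then be passed on as, for example, randomForest(x = x, y = y).
Note that all.vars returns bare variable names, so a formula such as
y ~ . or one with transformed terms would need the model.frame route
instead.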

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com