[R] subset of factors in a regression

David Winsemius dwinsemius at comcast.net
Tue Jul 2 16:01:07 CEST 2013

On Jul 1, 2013, at 9:39 PM, Ben Bolker wrote:

> Philip A. Viton <viton.1 <at> osu.edu> writes:
>> suppose "state" is a variable in a dataframe containing abbreviations 
>> of the US states, as a factor. What I'd like to do is to include 
>> dummy variables for a few of the states, (say, CA and MA) among the 
>> independent variables in my regression formula. (This would be the 
>> equivalent of creating, e.g., ca <- state=="CA" and then including 
>> that.) I know I can create all the necessary dummy variables by using 
>> the "outer" function on the factor and then renaming them 
>> appropriately; but is there a solution that's more direct, ie that 
>> doesn't involve a lot of new variables?
>> Thanks!
>  You could use model.matrix(~state-1) and select the columns
> you want, e.g.
> state <- state.abb; m <- model.matrix(~state-1)
> m[,colnames(m) %in% c("stateCA","stateMA")]
> -- but this will actually create a bunch of vectors you
> don't want before throwing them away.
> more compactly:
> cstates <- c("CA","MA")
> m <- sapply(cstates,"==",state)
> storage.mode(m) <- "numeric"
> ## or m[] <- as.numeric(m)

Couldn't this be achieved with I()?

lm(Y ~ I(state=="CA") + I(state=="MA") + covariates, data=dfrm)
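
A minimal sketch of how that call might behave, on made-up data. The data frame dfrm, the response Y, and the single covariate x are placeholders invented for illustration, not anything from the original question. It compares the inline I() terms against explicitly created dummy columns, which should give the same fit apart from the coefficient names.

```r
# Illustrative data only: dfrm, Y and x are hypothetical names
set.seed(1)
dfrm <- data.frame(
  state = factor(rep(state.abb, 4)),   # each of the 50 states appears 4 times
  x     = rnorm(200)
)
dfrm$Y <- 1 + 2 * (dfrm$state == "CA") - 1 * (dfrm$state == "MA") +
  0.5 * dfrm$x + rnorm(200)

# Indicators built inline in the formula; no new columns are stored
fit_inline <- lm(Y ~ I(state == "CA") + I(state == "MA") + x, data = dfrm)

# The same model using explicitly created dummy variables
dfrm$ca <- as.numeric(dfrm$state == "CA")
dfrm$ma <- as.numeric(dfrm$state == "MA")
fit_dummy <- lm(Y ~ ca + ma + x, data = dfrm)

# Compare the estimates, ignoring the differing coefficient names
same_fit <- all.equal(unname(coef(fit_inline)), unname(coef(fit_dummy)))
print(same_fit)
```

The I() wrapper keeps the comparison from being interpreted as formula syntax, and the resulting logical term enters the model as a single 0/1 column, with all other states pooled into the reference level.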

David Winsemius
Alameda, CA, USA
