[R] Basic Dummy Variable Creation

Douglas Bates bates at stat.wisc.edu
Fri Sep 5 18:12:27 CEST 2003


"Francisco J. Bido" <bido at mac.com> writes:

> Hi There,
> 
> While looking through the mailing list archive, I did not come across
> a simple minded example regarding the creation of dummy variables.
> The Gauss language provides the command "y = dummydn(x,v,p)" for
> creating dummy variables.
> 
> Here:
> 
> x = Nx1 vector of data to be broken up into dummy variables.
> v = (K-1)x1 vector specifying the K-1 breakpoints
> p = positive integer in the range [1,K], specifying which column
> should be dropped in the matrix of dummy variables.
> 
> y = Nx(K-1) matrix containing the K-1 dummy variables.
> 
> My recent mailing list archive inquiry has led me to examine R's
> "model.matrix" but it has so many options that I'm not seeing the
> forest because of the trees.  Is that really the easiest way? or is
> there something similar to the dummydn command described above?
> 
> 
> To provide a concrete scenario, please consider the following.  Using
> the above notation, say, I had:
> 
> 
> x <- c(1:10)      # data to be broken up into dummy variables
> v <- c(3,5,7)     # breakpoints
> p <- 1            # drop this column to avoid dummy variable trap
> 
> How can I get a matrix "y" that has the associated dummy variables for
> columns?

Don't.

Consider why you want the dummy variables.  You probably want to use
them in the specification of a statistical model and R's model
specification language automatically expands a factor variable into a
set of contrasts.

Try

data(PlantGrowth)
fm <- lm(weight ~ group, data = PlantGrowth)   # group is a three-level factor
summary(fm)

and you will see that the `group' factor has been expanded into two of
the three possible indicator variables, one each for trt1 and trt2,
with ctrl serving as the baseline.  This is what the default setting
for contrasts gives; other possibilities exist.

You can check explicitly how the model matrix is created with

model.matrix(fm)
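
As a rough illustration of the "other possibilities" for contrasts, the
sketch below builds the model matrix for the same formula three ways.
It assumes the session still has R's default contrasts option; the
names mm_default, mm_sum and mm_full are made up for this example.

```r
data(PlantGrowth)

# Default treatment contrasts: an intercept plus indicators for trt1 and trt2.
mm_default <- model.matrix(weight ~ group, data = PlantGrowth)
colnames(mm_default)

# Sum-to-zero contrasts, requested for this one factor only.
mm_sum <- model.matrix(weight ~ group, data = PlantGrowth,
                       contrasts.arg = list(group = "contr.sum"))
colnames(mm_sum)

# Dropping the intercept from the formula keeps all three indicator columns.
mm_full <- model.matrix(weight ~ group - 1, data = PlantGrowth)
colnames(mm_full)
```

In each case the factor stays a factor in the data; only the formula or
the contrasts argument changes.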

The model specification facilities in R are much more flexible than
those in most other languages, so you almost never need to create
indicators explicitly.
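
For the concrete scenario in the question, one possible approach is to
turn x into a factor with cut() and let the formula machinery do the
expansion.  This is only a sketch: it assumes each interval is closed
on the right (cut()'s default), which may or may not match dummydn's
convention, and the name f is introduced here for illustration.

```r
x <- 1:10                  # data from the question
v <- c(3, 5, 7)            # the K-1 = 3 breakpoints from the question

# cut() returns a factor with K = 4 levels.
# Assumption: right-closed intervals such as (3,5].
f <- cut(x, breaks = c(-Inf, v, Inf))

# In a model you would simply write  response ~ f.
# To look at the indicators themselves, build the model matrix and drop
# the intercept column; the first level is the baseline, which plays the
# role of p = 1 in the question.
y <- model.matrix(~ f)[, -1]
dim(y)
```

To use a different baseline level, relevel(f, ref = levels(f)[p]) before
calling model.matrix() should have the same effect as choosing another p.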



