[R] Basic Dummy Variable Creation
bates at stat.wisc.edu
Fri Sep 5 18:12:27 CEST 2003
"Francisco J. Bido" <bido at mac.com> writes:
> Hi There,
> While looking through the mailing list archive, I did not come across
> a simple minded example regarding the creation of dummy variables.
> The Gauss language provides the command "y = dummydn(x,v,p)" for
> creating dummy variables.
> x = Nx1 vector of data to be broken up into dummy variables.
> v = (K-1)x1 vector specifying the K-1 breakpoints
> p = positive integer in the range [1,K], specifying which column
> should be dropped in the matrix of dummy variables.
> y = Nx(K-1) matrix containing the K-1 dummy variables.
> My recent mailing list archive inquiry has led me to examine R's
> "model.matrix" but it has so many options that I'm not seeing the
> forest because of the trees. Is that really the easiest way? or is
> there something similar to the dummydn command described above?
> To provide a concrete scenario, please consider the following. Using
> the above notation, say, I had:
> x <- c(1:10) #data to be broken up into dummy variables
> v <- c(3,5,7) #breakpoints
> p = 1 #drop this column to avoid dummy variable trap
> How can I get a matrix "y" that has the associated dummy variables for
Consider why you want the dummy variables. You probably want to use
them in the specification of a statistical model and R's model
specification language automatically expands a factor variable into a
set of contrasts. For example, fit
fm = lm(weight ~ group, data = PlantGrowth)
and look at summary(fm); you will see that the `group' factor has been expanded to two of
the three indicator variables (if you use the default setting for
contrasts - other possibilities exist).
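As a minimal sketch of that expansion (assuming the default contrasts option, contr.treatment for unordered factors, has not been changed), the coefficient names and the contrast matrix of the factor show which indicators were generated:

```r
# PlantGrowth ships with R; `group` is a factor with levels ctrl, trt1, trt2
fm <- lm(weight ~ group, data = PlantGrowth)

# One intercept plus one coefficient per non-baseline level
coef_names <- names(coef(fm))
print(coef_names)

# The coding used for the factor: rows are levels, columns are indicators
group_contrasts <- contrasts(PlantGrowth$group)
print(group_contrasts)
```

The first level, ctrl, acts as the baseline and gets no column of its own.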
You can check explicitly how the model matrix is created with
model.matrix(fm)
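Calling model.matrix() on the fitted object is one way to do this. A short sketch, refitting the PlantGrowth model so the block runs on its own:

```r
fm <- lm(weight ~ group, data = PlantGrowth)

# The design matrix actually used in the fit
X <- model.matrix(fm)

print(dim(X))       # observations by columns
print(colnames(X))  # intercept plus the generated indicator columns
print(unique(X))    # one distinct row pattern per level of `group`
```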
The model specification facilities in R are much more flexible than
those of most other languages and you almost never need to create
indicators explicitly.
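If you do want the explicit matrix for the scenario in the question, one possible sketch is to bin x with cut() and let model.matrix() build the indicators. This assumes right-closed intervals (x <= 3, 3 < x <= 5, 5 < x <= 7, x > 7), which is my reading of the Gauss behaviour and not something I have verified. The name `full` is just illustrative.

```r
x <- 1:10          # data to be broken up into dummy variables
v <- c(3, 5, 7)    # breakpoints
p <- 1             # column to drop

# Bin x into K = length(v) + 1 intervals; cut() closes intervals on the right by default
f <- cut(x, breaks = c(-Inf, v, Inf))

# "- 1" removes the intercept, so every level gets its own indicator column
full <- model.matrix(~ f - 1)

# Drop column p to avoid the dummy variable trap; drop = FALSE keeps a matrix
y <- full[, -p, drop = FALSE]
print(y)
```

In a model you would normally skip this and write the formula with f directly. relevel(f, ref = ...) then chooses the baseline level, which plays the role of p.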