# [R] Basic Dummy Variable Creation

John Fox jfox at mcmaster.ca
Fri Sep 5 17:50:59 CEST 2003

Dear Francisco,

At 08:31 AM 9/5/2003 -0500, Francisco J. Bido wrote:
>Hi There,
>
>While looking through the mailing list archive, I did not come across a
>simple minded example regarding the creation of dummy variables.  The
>Gauss language provides the command "y = dummydn(x,v,p)" for creating
>dummy variables.
>Here:
>
>x = Nx1 vector of data to be broken up into dummy variables.
>v = Kx1 vector specifying the K-1 breakpoints
>p = positive integer in the range [1,K], specifying which column should be
>dropped in the matrix of dummy variables.
>y = Nx(K-1) matrix containing the K-1 dummy variables.
>
>My recent mailing list archive inquiry has led me to examine R's
>"model.matrix" but it has so many options that I'm not seeing the forest
>because of the trees.  Is that really the easiest way? or is there
>something similar to the dummydn command described above?
>
>To provide a concrete scenario, please consider the following.  Using the
>
>x <- c(1:10)      #data to be broken up into dummy variables
>v <- c(3,5,7)     #breakpoints
>p =  1                #drop this column to avoid dummy variable trap
>
>How can I get a matrix "y" that has the associated dummy variables for
>columns?
>Thank You,
>-Francisco

My first question would be: why do you want to do this? Statistical-model
formulas in R implicitly generate dummy variables (and other contrasts)
directly from factors, so if this is the context that you had in mind,
there's no need to generate the dummy variables explicitly.
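
As a minimal sketch of what that looks like (the data and the names
"group" and "y" are made up purely for illustration), a factor can go
straight into a model formula:

```r
# Hypothetical data: a three-level factor and a numeric response.
group <- factor(c("a", "a", "b", "b", "c", "c"))
y <- c(1.0, 1.2, 2.1, 1.9, 3.0, 3.2)

# lm() expands 'group' into 0/1 dummy regressors on its own;
# the first level ("a") is treated as the baseline category.
fit <- lm(y ~ group)

# One coefficient per non-baseline level, plus the intercept.
coef_names <- names(coef(fit))
print(coef_names)
```

The coefficient names show one dummy regressor for each level other
than the baseline, without any dummy matrix having been built by hand.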

If you really do want the matrix of dummy regressors, say for a factor
named "factor", then you can use model.matrix() to get them. Because the
default contrast type for unordered factors is "contr.treatment", which
corresponds to 0/1 dummy regressors, you can get the dummy variables as
model.matrix(~factor)[,-1]. Here I've removed the initial column of ones
returned by model.matrix(). Alternatively, model.matrix(~ factor - 1) gives
you a complete set of dummy regressors; you could then drop whichever
column you wanted to.
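
Applied to the x, v, and p in your message, a rough sketch might look
like the following. Note the assumptions: I use cut() to turn x into a
factor first, I take the intervals to be closed on the right, and I
add open-ended intervals at both ends; none of this is guaranteed to
match what dummydn does in Gauss.

```r
x <- 1:10          # data to be broken up into dummy variables
v <- c(3, 5, 7)    # breakpoints
p <- 1             # column to drop

# Assumption: intervals are (-Inf,3], (3,5], (5,7], (7,Inf].
f <- cut(x, breaks = c(-Inf, v, Inf))

# Complete set of dummy regressors, one column per interval.
full <- model.matrix(~ f - 1)

# Drop column p to avoid the dummy-variable trap.
y <- full[, -p, drop = FALSE]

# Same values as treatment contrasts with the intercept column removed.
treat <- model.matrix(~ f)[, -1]
same <- all(unname(treat) == unname(full[, -1]))

print(y)
```

With three breakpoints this gives four intervals, so "full" has four
columns and "y" has three. If you want a different baseline, change p.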

More generally, if you haven't already done so, you might look at how
linear-model formulas work in R. All of the introductions to R
cover this topic. I think that this is one of the strengths of the S
language, by the way.

I hope that this helps,
John
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University