[R] simplify code for dummy coding of factors

John Fox jfox at mcmaster.ca
Wed Dec 31 00:56:13 CET 2014


Hi Michael,

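At first I thought that as.numeric() would do it, but it drops the dim
attribute and so loses the matrix structure. Here are two solutions; I think
that I prefer the second, although it names the columns after the factor
levels rather than h1, ..., h4 and carries "assign" and "contrasts"
attributes.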

----------- snip --------------------

> (dummy.hair <- outer(haireye.df$Hair,
+     levels(haireye.df$Hair), function(x, y) as.numeric(x == y)))
      [,1] [,2] [,3] [,4]
 [1,]    1    0    0    0
 [2,]    0    1    0    0
 [3,]    0    0    1    0
 [4,]    0    0    0    1
 [5,]    1    0    0    0
 [6,]    0    1    0    0
 [7,]    0    0    1    0
 [8,]    0    0    0    1
 [9,]    1    0    0    0
[10,]    0    1    0    0
[11,]    0    0    1    0
[12,]    0    0    0    1
[13,]    1    0    0    0
[14,]    0    1    0    0
[15,]    0    0    1    0
[16,]    0    0    0    1
 
> (dummy.hair <- model.matrix(~ -1 + Hair, data=haireye.df))
   HairBlack HairBrown HairRed HairBlond
1          1         0       0         0
2          0         1       0         0
3          0         0       1         0
4          0         0       0         1
5          1         0       0         0
6          0         1       0         0
7          0         0       1         0
8          0         0       0         1
9          1         0       0         0
10         0         1       0         0
11         0         0       1         0
12         0         0       0         1
13         1         0       0         0
14         0         1       0         0
15         0         0       1         0
16         0         0       0         1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$Hair
[1] "contr.treatment"

----------- snip --------------------
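
To get the same data frame as in the original post from the second
solution, something along the following lines should work. This is only a
sketch: it assumes that renaming the model.matrix() columns to h1..h4 and
e1..e4 is acceptable, and the name haireye.dummy is made up here so that
haireye.df is not overwritten.

```r
# Hair x Eye frequency table from the original post (16 rows)
haireye <- margin.table(HairEyeColor, 1:2)
haireye.df <- as.data.frame(haireye)

# Full dummy coding: dropping the intercept gives one indicator per level
dummy.hair <- model.matrix(~ -1 + Hair, data = haireye.df)
dummy.eye  <- model.matrix(~ -1 + Eye,  data = haireye.df)

# Replace the level-based names (HairBlack, ...) with h1..h4 and e1..e4
colnames(dummy.hair) <- paste0("h", seq_len(ncol(dummy.hair)))
colnames(dummy.eye)  <- paste0("e", seq_len(ncol(dummy.eye)))

# data.frame() keeps only the columns, so the "assign" and "contrasts"
# attributes of the model matrices do not carry over
# (haireye.dummy is a hypothetical name used for this illustration)
haireye.dummy <- data.frame(haireye.df, dummy.hair, dummy.eye)
haireye.dummy
```

The indicator columns follow the order of levels(haireye.df$Hair) and
levels(haireye.df$Eye), just as in the outer() version.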

I hope this helps,
 John

> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Michael
> Friendly
> Sent: Tuesday, December 30, 2014 6:05 PM
> To: R-help
> Subject: [R] simplify code for dummy coding of factors
> 
> In a manuscript, I have the following code to illustrate dummy coding of
> two factors in a contingency table.
> 
> It works, but is surely obscured by the method I used, involving outer()
> to find equalities and 0+outer()
> to convert to numeric.  Can someone help simplify this code to be more
> comprehensible and give the
> *same* result? I'd prefer a solution that uses base R.
> 
> haireye <- margin.table(HairEyeColor, 1:2)
> 
> haireye.df <- as.data.frame(haireye)
> dummy.hair <-  0+outer(haireye.df$Hair, levels(haireye.df$Hair), `==`)
> colnames(dummy.hair)  <- paste0('h', 1:4)
> dummy.eye <-  0+outer(haireye.df$Eye, levels(haireye.df$Eye), `==`)
> colnames(dummy.eye)  <- paste0('e', 1:4)
> 
> haireye.df <- data.frame(haireye.df, dummy.hair, dummy.eye)
> haireye.df
>      Hair   Eye Freq h1 h2 h3 h4 e1 e2 e3 e4
> 1  Black Brown   68  1  0  0  0  1  0  0  0
> 2  Brown Brown  119  0  1  0  0  1  0  0  0
> 3    Red Brown   26  0  0  1  0  1  0  0  0
> 4  Blond Brown    7  0  0  0  1  1  0  0  0
> 5  Black  Blue   20  1  0  0  0  0  1  0  0
> 6  Brown  Blue   84  0  1  0  0  0  1  0  0
> 7    Red  Blue   17  0  0  1  0  0  1  0  0
> 8  Blond  Blue   94  0  0  0  1  0  1  0  0
> 9  Black Hazel   15  1  0  0  0  0  0  1  0
> 10 Brown Hazel   54  0  1  0  0  0  0  1  0
> 11   Red Hazel   14  0  0  1  0  0  0  1  0
> 12 Blond Hazel   10  0  0  0  1  0  0  1  0
> 13 Black Green    5  1  0  0  0  0  0  0  1
> 14 Brown Green   29  0  1  0  0  0  0  0  1
> 15   Red Green   14  0  0  1  0  0  0  0  1
> 16 Blond Green   16  0  0  0  1  0  0  0  1
> 
> --
> Michael Friendly     Email: friendly AT yorku DOT ca
> Professor, Psychology Dept. & Chair, Quantitative Methods
> York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
> 4700 Keele Street    Web:http://www.datavis.ca
> Toronto, ONT  M3J 1P3 CANADA


