[R] A question on dummy variable

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jan 12 01:17:04 CET 2011


On Tue, Jan 11, 2011 at 3:18 PM, Christofer Bogaso
<bogaso.christofer at gmail.com> wrote:
> Dear all, I would like to ask one question related to statistics, more
> specifically on defining dummy variables. So far I have come across 3
> different kinds of dummy variables (assuming I am working with seasonal
> dummies, and the number of seasons is 4):
>
>> dummy1 <- diag(4)
>> for(i in 1:3) dummy1 <- rbind(dummy1, diag(4))
>> dummy1 <- dummy1[,-4]
>>
>> dummy2 <- dummy1
>> dummy2[dummy2 == 0] = -1/(4-1)
>>
>> dummy3 <- dummy1 - 1/4
>>
>> head(dummy1)
>     [,1] [,2] [,3]
> [1,]    1    0    0
> [2,]    0    1    0
> [3,]    0    0    1
> [4,]    0    0    0
> [5,]    1    0    0
> [6,]    0    1    0
>> head(dummy2)
>           [,1]       [,2]       [,3]
> [1,]  1.0000000 -0.3333333 -0.3333333
> [2,] -0.3333333  1.0000000 -0.3333333
> [3,] -0.3333333 -0.3333333  1.0000000
> [4,] -0.3333333 -0.3333333 -0.3333333
> [5,]  1.0000000 -0.3333333 -0.3333333
> [6,] -0.3333333  1.0000000 -0.3333333
>> head(dummy3)
>      [,1]  [,2]  [,3]
> [1,]  0.75 -0.25 -0.25
> [2,] -0.25  0.75 -0.25
> [3,] -0.25 -0.25  0.75
> [4,] -0.25 -0.25 -0.25
> [5,]  0.75 -0.25 -0.25
> [6,] -0.25  0.75 -0.25
> Now I want to know which type of dummy definition is called a centered dummy
> and why it is called so. Is it equivalent to use any of the above
> definitions (at least the 2nd and 3rd)? It would really be very helpful if
> somebody could offer any suggestion or clarification.
>

Your dummy1 matrix corresponds to contr.SAS contrasts in R.
(The default contrasts in R for unordered factors are contr.treatment,
which are the same as contr.SAS except that contr.SAS uses the last level
as the base whereas treatment contrasts use the first level as the base.)

   options(contrasts = c("contr.SAS", "contr.poly"))
   f <- gl(4, 1, 16)
   M <- model.matrix( ~ f )
   all( M[, -1] == dummy1) # TRUE
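
As a side note, the difference in base level can be seen by printing the
two coding matrices directly. The sketch below rebuilds dummy1 as in the
question and, as an alternative to changing options(), passes the contrast
through the contrasts.arg argument of model.matrix. The variable name
`same` is only illustrative.

```r
# Coding matrices for a 4-level factor.
contr.SAS(4)        # row 4 is all zeros: the last level is the base
contr.treatment(4)  # row 1 is all zeros: the first level is the base

# Rebuild dummy1 exactly as in the question.
dummy1 <- diag(4)
for (i in 1:3) dummy1 <- rbind(dummy1, diag(4))
dummy1 <- dummy1[, -4]

# Request SAS contrasts for f only, leaving the global options untouched.
f <- gl(4, 1, 16)
M <- model.matrix(~ f, contrasts.arg = list(f = "contr.SAS"))

# Drop the intercept column and compare with dummy1.
same <- all(M[, -1] == dummy1)
same
```

Using contrasts.arg keeps the change local to this one model matrix, which
may be preferable if other code in the session relies on the default
contrasts.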

Centered contrasts are ones which have been centered -- i.e. the mean
of each column has been subtracted from that column.  This is
equivalent to saying that the column sums are zero.

The means of the three columns of dummy1 are c(1/4, 1/4, 1/4) so if we
subtract 1/4 from dummy1 we get a centered contrasts matrix. That is
precisely what you did to get dummy3.  We can check that dummy3 is
centered:

   colSums(dummy3) # 0 0 0
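
The centering step can also be written generically, without hard-coding
the 1/4. This is a minimal sketch that rebuilds dummy1 as in the question;
the names `col_means`, `centered` and `matches` are illustrative and not
part of the original code.

```r
# Rebuild dummy1 and dummy3 exactly as in the question.
dummy1 <- diag(4)
for (i in 1:3) dummy1 <- rbind(dummy1, diag(4))
dummy1 <- dummy1[, -4]
dummy3 <- dummy1 - 1/4

# Column means of dummy1: each column has 4 ones in 16 rows.
col_means <- colMeans(dummy1)   # 0.25 0.25 0.25

# Subtract each column's own mean from that column.
centered <- sweep(dummy1, 2, col_means)

# The generic centering reproduces dummy3, and the columns sum to zero.
matches <- isTRUE(all.equal(centered, dummy3))
matches
colSums(centered)
```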

dummy2 is just a scaled version of dummy3.  In fact dummy2 equals
dummy3 / .75, so it's not fundamentally different.  Its columns still
sum to zero, so it's still centered.

   all( dummy2 == dummy3 / .75) # TRUE
   colSums(dummy2) # 0 0 0 except for floating point error
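
On the question of whether the three definitions are equivalent in use:
in a linear model that includes an intercept, all three codings span the
same column space, so they should give the same fitted values and differ
only in how the coefficients are parameterised. The sketch below
illustrates this with a simulated response; y is made-up data used purely
for illustration, and the fit1/fit2/fit3 names are not from the original
post. It also uses all.equal, which compares with a tolerance and so is
more robust than == for floating point values.

```r
# Rebuild the three matrices exactly as in the question.
dummy1 <- diag(4)
for (i in 1:3) dummy1 <- rbind(dummy1, diag(4))
dummy1 <- dummy1[, -4]
dummy2 <- dummy1
dummy2[dummy2 == 0] <- -1/(4 - 1)
dummy3 <- dummy1 - 1/4

# Tolerance-based check that dummy2 is dummy3 rescaled by 1/0.75.
same_coding <- isTRUE(all.equal(dummy2, dummy3 / 0.75))

# Simulated response (illustrative only, not real data).
set.seed(1)
y <- rnorm(16)

fit1 <- lm(y ~ dummy1)
fit2 <- lm(y ~ dummy2)
fit3 <- lm(y ~ dummy3)

# The fitted values agree across all three codings.
same_fit <- isTRUE(all.equal(fitted(fit1), fitted(fit2))) &&
  isTRUE(all.equal(fitted(fit1), fitted(fit3)))

# Centering only shifts the intercept: slopes of fit1 and fit3 agree.
same_slopes <- isTRUE(all.equal(unname(coef(fit1)[-1]),
                                unname(coef(fit3)[-1])))

# Rescaling rescales the slopes: fit2 slopes are 0.75 times fit3 slopes.
slope_ratio <- unname(coef(fit2)[-1] / coef(fit3)[-1])

same_coding; same_fit; same_slopes; slope_ratio
```

So the choice among the three mainly affects how the intercept and slopes
are interpreted, not the fit itself.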

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


