[R] A question on dummy variable

Bogaso Christofer bogaso.christofer at gmail.com
Wed Jan 12 19:52:12 CET 2011


Thanks to Gabor and the others for their input. I admit that I should have
posted some reproducible code showing what I wanted. It was in my mind to do
so, but I held back because the query was not really about R but about
statistics in general.

Here I am using dummy variables in a ***time series context***. Please
consider the following artificial time series along with its quarterly dummies:

library(zoo)
# my time series: 26 quarterly observations starting in 2005 Q1
MyTimeSeries <- zooreg(101:126, start=as.yearqtr(as.Date("2005-01-01")),
                       frequency=4)
# creation of quarterly dummies
### dummy1: standard 0/1 dummies (7 stacked copies of diag(4), trimmed to the series)
dummy1 <- zooreg(Reduce("rbind", rep(list(diag(4)), 7)),
                 start=as.yearqtr(as.Date("2005-01-01")), frequency=4)
dummy1 <- merge(dummy1, MyTimeSeries, all=FALSE)[,1:4]
colnames(dummy1) <- paste("dummy", 1:4, sep="")
### dummy2: dummy1 shifted by 1/4
dummy2 <- dummy1 - 1/4
### dummy3: zeros of dummy1 replaced by -1/3
dummy3 <- dummy1
dummy3[dummy3 == 0] <- -1/(4-1)
# Time series with quarterly dummy
TS_with_dummy1 <- cbind(MyTimeSeries, dummy1[,-4])
TS_with_dummy2 <- cbind(MyTimeSeries, dummy2[,-4])
TS_with_dummy3 <- cbind(MyTimeSeries, dummy3[,-4])
TS_with_dummy1
TS_with_dummy2
TS_with_dummy3

Here you see, as in my previous post, there are 3 types of quarterly dummies:
dummy1, dummy2, and dummy3 (numbered here so that dummy2 and dummy3 are
swapped relative to that post). I have been using the dummy1 definition for
all my time series analysis. However, I later noticed the 2nd type of
definition in the "vars" package, and I came across the 3rd definition
somewhere on the net (I can't recall where just now). My question was: which
is the centred dummy variable (according to the help page of the vars
package, the 2nd one is the centred dummy)?
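
As a side check on the series above, here is a minimal base R sketch (no zoo; the names `n` and `quarter` are only illustrative). It assumes 26 quarterly observations starting in the first quarter, as in MyTimeSeries. Because 26 is not a multiple of 4, subtracting 1/4 does not give columns that sum exactly to zero in this sample, whereas subtracting the actual column means does.

```r
n <- 26                                   # same length as MyTimeSeries
quarter <- rep(1:4, length.out = n)       # quarter index, starting in Q1
dummy1 <- diag(4)[quarter, ]              # 26 x 4 matrix of 0/1 dummies
dummy2 <- dummy1 - 1/4                    # shifted by 1/4, as in the post

obs_per_quarter <- colSums(dummy1)        # 7 7 6 6
shifted_sums    <- colSums(dummy2)        # 0.5 0.5 -0.5 -0.5

# Subtracting the sample column means instead gives exact centering
exactly_centered <- scale(dummy1, center = TRUE, scale = FALSE)
colSums(exactly_centered)                 # zero up to floating point error
```

With a whole number of years (for example 24 or 28 observations) the two ways of centering coincide.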

However, I am looking for the definition of centred dummy variables in the
time series context. So I would like to know: why is the 2nd one called a
centred dummy? And why do people prefer it to the standard dummy definition
(i.e. dummy1)?

Can you please explain?

Thanks and regards,

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com] 
Sent: 12 January 2011 05:47
To: Christofer Bogaso
Cc: r-help at r-project.org
Subject: Re: [R] A question on dummy variable

On Tue, Jan 11, 2011 at 3:18 PM, Christofer Bogaso
<bogaso.christofer at gmail.com> wrote:
> Dear all, I would like to ask one question related to statistics, more
> specifically on defining dummy variables. As of now, I have come
> across 3 different kinds of dummy variables (assuming I am working with
> seasonal dummies, and the number of seasons is 4):
>
>> dummy1 <- diag(4)
>> for(i in 1:3) dummy1 <- rbind(dummy1, diag(4))
>> dummy1 <- dummy1[,-4]
>>
>> dummy2 <- dummy1
>> dummy2[dummy2 == 0] = -1/(4-1)
>>
>> dummy3 <- dummy1 - 1/4
>>
>> head(dummy1)
>     [,1] [,2] [,3]
> [1,]    1    0    0
> [2,]    0    1    0
> [3,]    0    0    1
> [4,]    0    0    0
> [5,]    1    0    0
> [6,]    0    1    0
>> head(dummy2)
>           [,1]       [,2]       [,3]
> [1,]  1.0000000 -0.3333333 -0.3333333
> [2,] -0.3333333  1.0000000 -0.3333333
> [3,] -0.3333333 -0.3333333  1.0000000
> [4,] -0.3333333 -0.3333333 -0.3333333
> [5,]  1.0000000 -0.3333333 -0.3333333
> [6,] -0.3333333  1.0000000 -0.3333333
>> head(dummy3)
>      [,1]  [,2]  [,3]
> [1,]  0.75 -0.25 -0.25
> [2,] -0.25  0.75 -0.25
> [3,] -0.25 -0.25  0.75
> [4,] -0.25 -0.25 -0.25
> [5,]  0.75 -0.25 -0.25
> [6,] -0.25  0.75 -0.25
> Now I want to know which type of dummy definition is called a centered
> dummy, and why it is called so. Is it equivalent to use any of the
> above definitions (at least the 2nd and 3rd)? It would really be very
> helpful if somebody could offer any suggestion or clarification.
>

The contrasts of your dummy1 matrix are contr.SAS contrasts in R.
(The default contrasts in R are contr.treatment which are the same as
contr.SAS except contr.SAS uses the last level as the base whereas treatment
contrasts use the first level as the base.)

   options(contrasts = c("contr.SAS", "contr.poly"))
   f <- gl(4, 1, 16)
   M <- model.matrix( ~ f )
   all( M[, -1] == dummy1) # TRUE
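
To make the difference in base level concrete, the following small sketch prints both contrast matrices for a 4-level factor. It uses only `contr.SAS` and `contr.treatment` from the stats package; the variable names are just for illustration.

```r
cs <- contr.SAS(4)                        # last level (4) is the base: its row is all zeros
ct <- contr.treatment(4)                  # first level (1) is the base: its row is all zeros
ct_last <- contr.treatment(4, base = 4)   # treatment contrasts with the base moved to level 4

print(cs)
print(ct)

same_as_sas <- all(cs == ct_last)         # TRUE
```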

Centered contrasts are ones which have been centered -- i.e. the mean of
each column has been subtracted from that column.  This is equivalent to
saying that the column sums are zero.

The means of the three columns of dummy1 are c(1/4, 1/4, 1/4) so if we
subtract 1/4 from dummy1 we get a centered contrasts matrix. That is
precisely what you did to get dummy3.  We can check that dummy3 is
centered:

   colSums(dummy3) # 0 0 0
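
As a sketch of the same centering done generically, `scale()` with `scale = FALSE` subtracts each column's mean. Here dummy1 is rebuilt as the 16 x 3 matrix from the quoted post so that the snippet is self-contained; `centered` and `col_means` are illustrative names.

```r
# 16 x 3 dummy matrix: diag(4) stacked four times, last column dropped
dummy1 <- diag(4)[rep(1:4, times = 4), -4]

col_means <- colMeans(dummy1)                            # 0.25 0.25 0.25
centered  <- scale(dummy1, center = TRUE, scale = FALSE) # subtract column means

all(centered == dummy1 - 1/4)   # TRUE
colSums(centered)               # 0 0 0
```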

dummy2 is just a scaled version of dummy3.  In fact dummy2 equals
dummy3 / .75 so it's not fundamentally different.  Its columns still sum to
zero so it's still centered.

   all( dummy2 == dummy3 / .75) # TRUE
   colSums(dummy2) # 0 0 0 except for floating point error
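
On the question of whether the definitions are equivalent in use, here is a hedged sketch with ordinary least squares. The response `y` is simulated and purely hypothetical, and the numbering follows the quoted post (dummy2 uses -1/3, dummy3 is dummy1 - 1/4). With an intercept in the model, all three codings give the same fitted values; only the coefficients are parameterized differently.

```r
set.seed(1)                                          # arbitrary seed for the made-up data
dummy1 <- diag(4)[rep(1:4, times = 4), -4]           # 16 x 3, as in the quoted post
dummy2 <- dummy1; dummy2[dummy2 == 0] <- -1/(4 - 1)
dummy3 <- dummy1 - 1/4
y <- rnorm(16) + rep(c(2, 0, -1, 1), times = 4)      # hypothetical seasonal response

fit1 <- lm(y ~ dummy1)
fit2 <- lm(y ~ dummy2)
fit3 <- lm(y ~ dummy3)

# Same fitted values under all three codings
same_fit_2 <- isTRUE(all.equal(fitted(fit1), fitted(fit2)))
same_fit_3 <- isTRUE(all.equal(fitted(fit1), fitted(fit3)))

# Seasonal coefficients: unchanged by centering, rescaled by .75 for dummy2
b1 <- unname(coef(fit1)[-1])
same_slopes_3   <- isTRUE(all.equal(b1, unname(coef(fit3)[-1])))
scaled_slopes_2 <- isTRUE(all.equal(0.75 * b1, unname(coef(fit2)[-1])))

# With centered dummies the intercept is the overall mean of y
intercept_is_mean <- isTRUE(all.equal(unname(coef(fit3)[1]), mean(y)))
```

So in this setting centering changes only the meaning of the intercept: with dummy1 it is the level of the base quarter, while with the centered versions it is the overall mean.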

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


