[R] A question on dummy variable

John Sorkin jsorkin at grecc.umaryland.edu
Wed Jan 12 04:38:27 CET 2011


Christofer,
I am not sure I understand how you are using your dummy variables. Generally, if you have n categories you need n-1 dummy variables. Thus, if you have three categories (low, medium, high) and want to compare two of the levels to a reference level (a coding scheme sometimes called reference cell coding), you could use the following coding, which compares medium and high to the reference level, low:

level    dummy1  dummy2
low         0       0
medium      0       1
high        1       0

You will notice that for three categories, my dummy variables form a 3 by 2 matrix. In general, the dummy variable matrix for n categories will be an n by n-1 matrix. You say you have four seasons. I would expect your dummy variable matrix to be of size 4 by 3. Your matrices are 6 by 3. Am I not understanding what you are trying to do?
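
For illustration, here is a minimal R sketch of reference cell coding for a three-level factor. The factor `level` and the object names are made up for this example; `contr.treatment()` and `model.matrix()` are base R functions. The column order R produces may differ from the hand-written table, but the idea is the same: the reference level is the row of zeros.

```r
# Hypothetical three-level factor; "low" is listed first, so it is the reference level.
level <- factor(c("low", "medium", "high"),
                levels = c("low", "medium", "high"))

# Treatment (reference cell) contrasts: n levels give an n by n-1 matrix.
coding <- contr.treatment(levels(level))
print(coding)
print(dim(coding))

# model.matrix() builds the design matrix: an intercept column
# plus the n-1 dummy columns.
X <- model.matrix(~ level)
print(X)
```

With a real data set the design matrix has one row per observation rather than one row per category, so its row count will generally be larger than n.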
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
jsorkin at grecc.umaryland.edu

>>> Christofer Bogaso <bogaso.christofer at gmail.com> 1/11/2011 3:18 PM >>>
Dear all, I would like to ask one question related to statistics,
specifically on defining dummy variables. As of now, I have come across three
different kinds of dummy variables (assuming I am working with seasonal
dummies, and the number of seasons is 4):

> dummy1 <- diag(4)
> for(i in 1:3) dummy1 <- rbind(dummy1, diag(4))
> dummy1 <- dummy1[,-4]
>
> dummy2 <- dummy1
> dummy2[dummy2 == 0] = -1/(4-1)
>
> dummy3 <- dummy1 - 1/4
>
> head(dummy1)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
[4,]    0    0    0
[5,]    1    0    0
[6,]    0    1    0
> head(dummy2)
           [,1]       [,2]       [,3]
[1,]  1.0000000 -0.3333333 -0.3333333
[2,] -0.3333333  1.0000000 -0.3333333
[3,] -0.3333333 -0.3333333  1.0000000
[4,] -0.3333333 -0.3333333 -0.3333333
[5,]  1.0000000 -0.3333333 -0.3333333
[6,] -0.3333333  1.0000000 -0.3333333
> head(dummy3)
      [,1]  [,2]  [,3]
[1,]  0.75 -0.25 -0.25
[2,] -0.25  0.75 -0.25
[3,] -0.25 -0.25  0.75
[4,] -0.25 -0.25 -0.25
[5,]  0.75 -0.25 -0.25
[6,] -0.25  0.75 -0.25
Now I want to know which type of dummy definition is called a centered dummy,
and why it is called so. Is it equivalent to use any of the above
definitions (at least the 2nd and 3rd)? It would be very helpful if
somebody could offer any suggestions or clarification.
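
One way to explore the question numerically is sketched below. It rebuilds the three matrices from the code above and checks their column means, whether the 2nd and 3rd differ only by a scale factor, and whether all three give the same fitted values in a regression with an intercept. The response `y` is simulated purely for illustration, and "centered" is assumed here to mean that each column has mean zero. The zero means depend on the layout being balanced (each season appearing equally often), as it is in this example.

```r
# Rebuild the three codings: 4 seasons repeated 4 times (16 rows), last column dropped.
dummy1 <- do.call(rbind, replicate(4, diag(4), simplify = FALSE))[, -4]
dummy2 <- dummy1
dummy2[dummy2 == 0] <- -1 / (4 - 1)
dummy3 <- dummy1 - 1 / 4

# Column means: dummy1 is not centered; dummy2 and dummy3 are (balanced layout).
means1 <- colMeans(dummy1)
means2 <- colMeans(dummy2)
means3 <- colMeans(dummy3)
print(rbind(means1, means2, means3))

# dummy2 is dummy3 rescaled by 4/3, so the two differ only in scale.
same_up_to_scale <- isTRUE(all.equal(dummy2, (4 / 3) * dummy3))
print(same_up_to_scale)

# Hypothetical response, simulated only to compare the fits.
set.seed(1)
y <- rnorm(nrow(dummy1))
fit1 <- lm(y ~ dummy1)
fit2 <- lm(y ~ dummy2)
fit3 <- lm(y ~ dummy3)

# With an intercept, the three codings span the same column space,
# so the fitted values agree; only the coefficients are parameterised differently.
same_fit <- isTRUE(all.equal(fitted(fit1), fitted(fit2))) &&
  isTRUE(all.equal(fitted(fit1), fitted(fit3)))
print(same_fit)
```

Under these assumptions the choice of coding changes the interpretation of the intercept and the slopes, not the fit itself.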

Thanks and regards,
