[R] Seeking to Dummify Categorical Variables

Bert Gunter bgunter.4567 at gmail.com
Mon Apr 3 02:14:21 CEST 2017


Just to be clear...

I can think of no reason to ever "dummify" categorical variables in R.
i.e. **Do not do this.**

Corollary 1: Learn how R's modeling functionality works: ?formula

Corollary 2: Do not try to do it as is done in SAS or SPSS or whatever
(as David already said)
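
To illustrate Corollary 1, here is a minimal sketch of a factor going straight into a model formula. The data frame `d` and the response `y` are made up for illustration and are not from this thread; the point is only that lm() does the indicator coding itself.

```r
# Hypothetical toy data (not from the original thread)
d <- data.frame(
  Gender = factor(c("male", "female", "male", "female", "male", "female")),
  y      = c(5.1, 3.9, 5.4, 4.2, 4.8, 4.0)
)

# The factor is used as-is in the formula; lm() builds the indicator
# column internally, using treatment contrasts by default.
fit <- lm(y ~ Gender, data = d)

# Coefficient names show the internally generated coding:
# an intercept (baseline level "female") and an effect for "male".
names(coef(fit))
```

No Gender_male or Gender_female columns were created by hand; the coefficient for the non-baseline level is the difference in group means.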

(possible exception: packages that aren't smart enough to use
model.matrix etc. to do this by themselves. Also, do note that this is
intimately related to the issue of contrasts in linear models. See
?contrasts, ?C)
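
For the exception above, a rough sketch of what model.matrix() and contrasts() produce. The data are modeled on the example in this thread, but as an assumption the third value is a real missing value (NA) rather than the string "NA", and the default na.action (na.omit) is assumed to be in effect.

```r
# Hypothetical data modeled on the thread's example; the third Gender
# value is a true NA here, not the character string "NA".
GENDER <- data.frame(
  ID     = 1:3,
  Gender = factor(c("male", "female", NA))
)

# Default treatment contrasts: an intercept plus one indicator column.
# The row with NA is dropped by the default na.action (na.omit).
mm <- model.matrix(~ Gender, data = GENDER)
colnames(mm)

# Removing the intercept yields one indicator column per factor level.
mm0 <- model.matrix(~ Gender + 0, data = GENDER)
colnames(mm0)

# The coding itself is determined by the factor's contrasts.
contrasts(GENDER$Gender)
```

If a package really does need explicit indicator columns, the output of model.matrix() is what to hand it; a different coding can be requested through contrasts() or C().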

[nb: I would very much appreciate correction or "adjustment" on my
statement(s) if I am wrong]

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Sun, Apr 2, 2017 at 3:58 PM, BR_email <br at dmstat1.com> wrote:
> Rui:
> I tried your suggestion, which was not fruitful.
> Another R-helper suggested the code below, which worked perfectly.
> Thanks for your suggestion and time spent.
>
> Regards,
> Bruce
>
> obj <- model.matrix( ID ~ Gender+0, data=GENDER )
> cbind(GENDER[ , 1, drop=FALSE], obj[,-3] )
>
>
> Bruce Ratner, Ph.D.
> The Significant Statistician™
> (516) 791-3544
> Statistical Predictive Analytics -- www.DMSTAT1.com
> Machine-Learning Data Mining and Modeling -- www.GenIQ.net
>
> Rui Barradas wrote:
>>
>> Hello,
>>
>> Try the following.
>>
>> GENDER$Gender_male <- as.integer(GENDER$Gender == "male")
>> GENDER$Gender_female <- as.integer(GENDER$Gender == "female")
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Em 02-04-2017 19:48, BR_email escreveu:
>>>
>>> Hi R'ers:
>>> I need a jump start to obtain my objective.
>>> Assistance is greatly appreciated.
>>> Bruce
>>>
>>> *******
>>> #Given Gender Dataset
>>> r1       <- c( 1, 2, 3)
>>> c1       <- c( "male", "female", "NA")
>>> GENDER <- data.frame(r1,c1)
>>> names(GENDER) <- c("ID","Gender")
>>> GENDER
>>> --------------
>>> _OBJECTIVE_: To dummify GENDER,
>>> i.e., to generate two new numeric columns,
>>>          Gender_male and Gender_female,
>>> such that:
>>> when Gender="male"   then Gender_male=1 and Gender_female=0
>>> when Gender="female" then Gender_male=0 and Gender_female=1
>>> when Gender="NA"     then Gender_male=0 and Gender_female=0
>>>
>>> So, with the given dataset, the resultant dataset would be as follows:
>>> Desired Extended Gender Dataset
>>> ID  Gender  Gender_male  Gender_female
>>> 1   male    1            0
>>> 2   female  0            1
>>> 3   NA      0            0
>>>
>>
>>
>>
>


