[R] Seeking to Dummify Categorical Variables

BR_email br at dmstat1.com
Mon Apr 3 00:49:45 CEST 2017


David:
Thank you. It's perfect.
FYI: regarding your comment about "NA," yes, I filled it in just for the 
example.

Again, thanks for your professional and polite reply.
Bruce

Bruce Ratner, Ph.D.
The Significant Statistician™
(516) 791-3544
Statistical Predictive Analytics -- www.DMSTAT1.com
Machine-Learning Data Mining and Modeling -- www.GenIQ.net
  

David Winsemius wrote:
>> On Apr 2, 2017, at 11:48 AM, BR_email <br at dmstat1.com> wrote:
>>
>> Hi R'ers:
>> I need a jump start to obtain my objective.
>> Assistance is greatly appreciated.
>> Bruce
>>
>> *******
>> #Given Gender Dataset
>> r1       <- c( 1, 2, 3)
>> c1       <- c( "male", "female", "NA")
>> GENDER <- data.frame(r1,c1)
>> names(d1_3) <- c("ID","Gender")
> #ITYM:
> names(GENDER) <- c("ID","Gender")
>
>> GENDER
>> --------------
>> _OBJECTIVE_: To dummify GENDER,
>> i.e., to generate two new numeric columns,
>>         Gender_male and Gender_female,
>> such that:
>> when Gender="male"   then Gender_male=1 and Gender_female=0
>> when Gender="female" then Gender_male=0 and Gender_female=1
>> when Gender="NA"     then Gender_male=0 and Gender_female=0
>>
>> So, with the given dataset, the resultant dataset would be as follows:
>> Desired Extended Gender Dataset
>> ID Gender Gender_male Gender_female
>> 1      male              1                   0
>> 2   female              0                   1
>> 3       NA               0                   0
> With that correction I think you might want:
>
>> model.matrix( ID ~ Gender+0, data=GENDER )
>    Genderfemale Gendermale GenderNA
> 1            0          1        0
> 2            1          0        0
> 3            0          0        1
> attr(,"assign")
> [1] 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$Gender
> [1] "contr.treatment"
>
> If you assigned that to an object name, say "obj", you could get your desired result with:
>
>> obj <- model.matrix( ID ~ Gender+0, data=GENDER )
>> cbind(GENDER[ , 1, drop=FALSE], obj[,-3] )
>    ID Genderfemale Gendermale
> 1  1            0          1
> 2  2            1          0
> 3  3            0          0
>
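
The positional drop `obj[,-3]` above relies on "NA" sorting last among the factor levels, which can vary with the locale's collation order. A minimal sketch, assuming the same three-row example, that selects the indicator columns by name and renames them to the Gender_male and Gender_female columns requested in the original post (the names `dummies` and `result` are illustrative only, not from the thread):

```r
# Rebuild the example data; the third Gender value is the literal string "NA".
GENDER <- data.frame(ID = c(1, 2, 3),
                     Gender = c("male", "female", "NA"),
                     stringsAsFactors = TRUE)

# One indicator column per factor level, no intercept column.
obj <- model.matrix(~ Gender + 0, data = GENDER)

# Pick the wanted columns by name rather than by position,
# then rename them to match the requested layout.
dummies <- obj[, c("Gendermale", "Genderfemale"), drop = FALSE]
colnames(dummies) <- c("Gender_male", "Gender_female")

# Keep ID and Gender, and append the two indicator columns.
result <- cbind(GENDER, dummies)
result
```

Selecting by name also fixes the column order (male first, then female) regardless of how the levels happen to be sorted.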
>
> I get the sense that you are trying to replicate a workflow that you developed in some other language, and I think it would be more efficient for you to actually learn R rather than trying to write SAS or SPSS in R. If you like getting "into the weeds" of the language, then I suggest trying to read the code in the `lm` function. It might help to refer back to Venables and Ripley's "S Programming" or to read Wickham's "Advanced R" pages on the web.
>
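
Since the "NA" in the example was filled in by hand, real data would presumably hold a true missing value (NA) in that position. In that case model.matrix() would, under R's default na.action option (na.omit), drop the incomplete row instead of producing a GenderNA column. A minimal sketch of one alternative, assuming a true NA in the third row, that builds the two indicators by direct comparison so every row is kept:

```r
# Same example, but with a genuine missing value rather than the string "NA".
GENDER <- data.frame(ID = c(1, 2, 3),
                     Gender = c("male", "female", NA))

# %in% yields FALSE (never NA) for a missing value,
# so the NA row gets 0 in both indicator columns.
GENDER$Gender_male   <- as.integer(GENDER$Gender %in% "male")
GENDER$Gender_female <- as.integer(GENDER$Gender %in% "female")
GENDER
```

This avoids the model-frame machinery entirely, which may be simpler when only a couple of indicator columns are needed; for many levels, model.matrix() remains the more compact route.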