[R] categorical variables

David Winsemius dwinsemius at comcast.net
Mon Dec 12 22:01:15 CET 2011


On Dec 12, 2011, at 3:38 PM, Uwe Ligges wrote:

>
>
> On 12.12.2011 19:36, Brian Jensvold wrote:
>> I am doing a logistic regression, and by accident I included a field
>> which has the 2-digit abbreviation for all 50 states, labeled "st". I
>> was surprised to see that the glm did not come up with an error
>> message but instead appears to have automatically broken down this
>> field into individual fields (stAK and stAL).  Does R really know to
>> turn all categorical variables into binary dummy variables?
>
> Yes.
>
>> I have tried answering
>> the question on my own and have found:
>>
>>
>>
>> When including categorical variables in a regression, the default in
>> R is to set the first level as the base.  Is there an option to
>> specify a different level as the base?
>
> Well, reorder the levels of the factor and use the most appropriate
> base level as the first one. This simplifies life since it is from
> now on the base level for all the models you try to fit.
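
One way to do the reordering described above is relevel(). This is only a
minimal sketch: the toy vector below is a hypothetical stand-in for the
poster's "st" column, not his data.

```r
# Hypothetical toy factor standing in for the poster's "st" column
st <- factor(c("AK", "AL", "CA", "NY", "CA", "AL"))
levels(st)                      # levels are sorted, so "AK" comes first

# relevel() moves the chosen level to the front of the level set;
# the observations themselves are unchanged
st2 <- relevel(st, ref = "CA")
levels(st2)                     # "CA" is now the first (base) level
```

Storing the releveled factor back into the data frame means every model
fitted afterwards uses the new base level.
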
>
>
>> My next/same question is: what does it mean to "set the first level
>> as the base"? Does this mean it turns each value into a unique binary
>> result?
>
> What is a "unique binary result"?
>
> Actually, the base level is included in the intercept of your model
> and you see the differences for the other levels.

Just to expand a bit on Uwe's efforts, for which we are all in his
debt. You should see that one state level is missing from the list of
coefficients; that level is the reference level, and it is absorbed
into the intercept. I would have thought it to be "AK", but apparently
you do see that abbreviation. Factor variables get handled
auto-magically by regression functions.
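
To see what the regression functions build from a factor, a small sketch
may help. The data here are made up for illustration, and lm() is used
instead of glm() only so the arithmetic is easy to check by hand; the
dummy coding is the same under the default treatment contrasts.

```r
# Made-up data: three state codes, two observations each
st <- factor(c("AK", "AL", "CA", "AK", "AL", "CA"))
y  <- c(1, 2, 3, 3, 4, 5)

# The design matrix has an intercept plus one 0/1 column per
# non-reference level; the first level ("AK") gets no column
mm <- model.matrix(~ st)
colnames(mm)

# The intercept is the mean of the base level; the other coefficients
# are differences from that base
fit <- lm(y ~ st)
coef(fit)
```

Here the "AK" group has mean 2, so the intercept is 2, and stAL and stCA
are each 1, the difference between that group's mean and the "AK" mean.
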
>
> Uwe Ligges
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


