[R] Response Variable Coding

Rolf Turner r.turner at auckland.ac.nz
Thu Aug 20 06:11:07 CEST 2015


On 20/08/15 09:43, Abraham Mathew wrote:
> Very simple question that I want to confirm.
>
> Let's say that I have a response variable. What are the appropriate ways
> that it can be coded for a logistic regression model?
>
> 1. It can be 0/1 and a factor
> 2. It can be 1/2 and a factor
> 3. It can be characters and a factor, where the level that sorts second
> alphabetically is coded as 1 (bad/good becomes 0/1).
> 4. ?
> 5. ?
>
>
> My question is: are 1, 2, and 3 all correct, and are there other coding
> schemes that glm can take?

When in doubt, RTFM! :-)

From ?binomial:

> For the binomial and quasibinomial families the response can be
> specified in one of three ways:
>
> As a factor: ‘success’ is interpreted as the factor not having the first
> level (and hence usually of having the second level).
>
> As a numerical vector with values between 0 and 1, interpreted as the
> proportion of successful cases (with the total number of cases given by
> the weights).
>
> As a two-column integer matrix: the first column gives the number of
> successes and the second the number of failures.
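A quick sketch of those three forms, in case it helps. The dose/response
numbers and all object names below are made up purely for illustration;
they are not from the help page or from the original post. The same data
are fed to glm() in each of the three ways, and the fitted coefficients
should agree up to numerical tolerance.

```r
# Made-up grouped data: 'succ' successes out of 'n' trials at each dose.
dose <- c(1, 2, 3, 4, 5)
n    <- c(10, 10, 10, 10, 10)
succ <- c(1, 3, 5, 8, 9)

# Two-column matrix: first column successes, second column failures.
fit_mat <- glm(cbind(succ, n - succ) ~ dose, family = binomial)

# Proportion of successes, with the number of cases given as weights.
fit_prop <- glm(succ / n ~ dose, family = binomial, weights = n)

# Factor response, one row per case. "bad" is the first level, so
# "good" (not the first level) is interpreted as 'success'.
long_dose <- rep(rep(dose, 2), times = c(succ, n - succ))
long_y    <- factor(rep(c("good", "bad"),
                        times = c(sum(succ), sum(n - succ))),
                    levels = c("bad", "good"))
fit_fac <- glm(long_y ~ long_dose, family = binomial)

# Compare the three sets of coefficients.
print(coef(fit_mat))
print(coef(fit_prop))
print(coef(fit_fac))
```

Note that in the proportion form the weights argument has to carry the
number of trials; leaving it out would treat each proportion as if it
came from a single case.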

That pretty well says it all.  One thing to note:  If the response is a
*numeric* vector of 0's and 1's, it will produce the same result as it
would if it were converted to a factor (with 0 as the first level).
(This is because the default weights are all 1.)
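
A small check of the numeric-versus-factor point, using made-up binary
data (the vectors x and y and the fit_* names are only for
illustration). It also shows, as a side note, what I would expect when
the factor levels are reversed: 'success' and 'failure' swap, so the
coefficients change sign.

```r
# Made-up binary data, for illustration only.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 1, 0, 1, 0, 1, 1)

fit_num <- glm(y ~ x, family = binomial)          # numeric 0/1 response
fit_fac <- glm(factor(y) ~ x, family = binomial)  # factor, levels "0", "1"

print(coef(fit_num))
print(coef(fit_fac))

# Reversed level order: 1 is now the first level, so 0 counts as 'success'.
fit_rev <- glm(factor(y, levels = c(1, 0)) ~ x, family = binomial)
print(coef(fit_rev))
```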

HTH

cheers,

Rolf Turner


-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


