[R] reference category for factor in regression

Jos Elkink jos.elkink at ucd.ie
Mon Jan 19 16:52:00 CET 2009


Hi Thierry,

Thanks for your quick answer. The problem is not so much the LABOUR
variable, however, but the AGE variable, which has about 5 categories
and for which I indeed do not create separate dummy variables.
But R does not behave as expected when deciding which AGE category to
use as the reference category ...
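
A minimal sketch of one way to make the reference category explicit. It
assumes R's default treatment contrasts, under which the first level of a
factor is the reference. The data are simulated, and the age labels, the
LAB factor and the data frame d are hypothetical names, not from my real
data set. It also recodes LABOUR as a two-level factor, so this is a
different parameterisation from the model in my original post, not a
diagnosis of it:

```r
# Hypothetical data: variable names follow the thread, values are simulated.
set.seed(1)
ages <- c("18-24", "25-34", "35-44", "45-64", "65+")
d <- expand.grid(AGE = ages, LABOUR = c(0, 1), rep = 1:20)
d$AGE <- factor(d$AGE, levels = ages)
d$VOTE <- rbinom(nrow(d), 1, 0.5)

# With treatment contrasts, the reference category is the first level.
levels(d$AGE)[1]

# Pick the reference category explicitly instead of relying on level order.
d$AGE <- relevel(d$AGE, ref = "35-44")

# Code LABOUR as one two-level factor rather than two numeric dummies.
d$LAB <- factor(d$LABOUR, levels = c(0, 1),
                labels = c("nonlabour", "labour"))

# Nested parameterisation: one intercept per LAB group, plus AGE
# contrasts (relative to the chosen reference) within each group.
X <- model.matrix(~ 0 + LAB + LAB:AGE, data = d)
colnames(X)

fit <- glm(VOTE ~ 0 + LAB + LAB:AGE,
           family = binomial(link = "logit"), data = d)
coef(fit)
```

Inspecting colnames(X) before fitting shows which AGE level was left out
in each group, without having to read it off the glm output.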

Jos

On Mon, Jan 19, 2009 at 2:37 PM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:
> Dear Jos,
>
> In R you don't need to create your own dummy variables. Just create a
> factor variable LABOUR (with two levels) and rerun your model. Then you
> should be able to calculate all coefficients.
>
> HTH,
>
> Thierry
>
> ----------------------------------------------------------------------------
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> and Forest
> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
> methodology and quality assurance
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
> tel. + 32 54/436 185
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to
> say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of
> data.
> ~ John Tukey
>
> -----Original message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On behalf of Jos Elkink
> Sent: Monday 19 January 2009 15:16
> To: r-help at r-project.org
> Subject: [R] reference category for factor in regression
>
> Hi all,
>
> I am struggling with a strange issue in R that I have not encountered
> before and I am not sure how to resolve this.
>
> The model looks like this, with all irrelevant variables left out:
>
> LABOUR - a dummy variable
> NONLABOUR = 1 - LABOUR
> AGE - a categorical variable / factor
> VOTE - a dummy variable
>
> glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR : AGE + NONLABOUR : AGE,
> family=binomial(link="logit"))
>
> In other words, a standard interaction model, but I want to know the
> intercepts and coefficients for each of the two cases (LABOUR and
> NONLABOUR), instead of getting coefficients for the differences as in
> a normal interaction model.
>
> But the strange thing is, for the two occurrences of the AGE variable,
> it makes a different choice as to which AGE category to leave out of
> the regression. The cross-table of AGE with LABOUR does not have empty
> cells.
>
> Anyone any idea what might be going wrong? Or what I could do about
> this?
>
> Thanks in advance for any help!
>
> Regards,
>
> Jos
>
> --
> Johan A. Elkink
> Lecturer
> School of Politics and International Relations & CHS Graduate School
> University College Dublin
> Ph. +353 1 716 7026  |  Library Building, Rm 512
> http://jaeweb.cantr.net
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> The views expressed in this message and any annex are purely those of the
> writer and may not be regarded as stating an official position of INBO, as
> long as the message is not confirmed by a duly signed document.
>



-- 
Johan A. Elkink
Lecturer
School of Politics and International Relations & CHS Graduate School
University College Dublin
Ph. +353 1 716 7026  |  Library Building, Rm 512
http://jaeweb.cantr.net



