[R] reference category for factor in regression

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Mon Jan 19 15:37:02 CET 2009


Dear Jos,

In R you don't need to create your own dummy variables. Just create a
factor variable LABOUR (with two levels) and rerun your model. Then you
should be able to calculate all coefficients.
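
A minimal sketch of what that could look like. The data below are
simulated and purely hypothetical; only the variable names (VOTE, LABOUR,
AGE) come from the original post, and the level labels "no"/"yes" and
"young"/"middle"/"old" are assumptions made for illustration.

```r
# Hypothetical data: names mirror the original post, values are simulated.
set.seed(1)
d <- expand.grid(LABOUR = c(0, 1),
                 AGE = c("young", "middle", "old"),
                 rep = 1:100)
d$VOTE <- rbinom(nrow(d), size = 1, prob = 0.3 + 0.2 * d$LABOUR)

# Turn the 0/1 dummy into a two-level factor; no NONLABOUR variable is needed.
d$LABOUR <- factor(d$LABOUR, levels = c(0, 1), labels = c("no", "yes"))
# Fix the level order explicitly so "young" is the reference category.
d$AGE <- factor(d$AGE, levels = c("young", "middle", "old"))

# One intercept per LABOUR level, plus AGE effects within each LABOUR level.
fit <- glm(VOTE ~ 0 + LABOUR + LABOUR:AGE,
           family = binomial(link = "logit"), data = d)
coef(fit)
```

With both variables stored as factors, the same AGE level (here the first
one, "young") should serve as the reference within each LABOUR group. If
another reference category is preferred, relevel() can be applied to AGE
before fitting.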

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Jos Elkink
Sent: Monday, 19 January 2009 15:16
To: r-help at r-project.org
Subject: [R] reference category for factor in regression

Hi all,

I am struggling with a strange issue in R that I have not encountered
before and I am not sure how to resolve this.

The model looks like this, with all irrelevant variables left out:

LABOUR - a dummy variable
NONLABOUR = 1 - LABOUR
AGE - a categorical variable / factor
VOTE - a dummy variable

glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR : AGE + NONLABOUR : AGE,
family=binomial(link="logit"))

In other words, a standard interaction model, but I want to know the
intercepts and coefficients for each of the two cases (LABOUR and
NONLABOUR), instead of getting coefficients for the differences as in
a normal interaction model.

But the strange thing is, for the two occurrences of the AGE variable,
it makes a different choice as to which AGE category to leave out of
the regression. The cross-table of AGE with LABOUR does not have empty
cells.

Anyone any idea what might be going wrong? Or what I could do about
this?

Thanks in advance for any help!

Regards,

Jos

-- 
Johan A. Elkink
Lecturer
School of Politics and International Relations & CHS Graduate School
University College Dublin
Ph. +353 1 716 7026  |  Library Building, Rm 512
http://jaeweb.cantr.net

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as
long as the message is not confirmed by a duly signed document.



