[R] Dummy variables or factors?

andrew andrewjohnroyal at gmail.com
Wed Oct 21 05:58:05 CEST 2009


Sorry for this third posting. The second method is the same as the
first after all: the coefficients of the first linear model *are* a
linear transformation of those of the second, since the two fits only
differ in their baseline level.  I just got confused with the pasting,
that's all.
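
To check this, here is a minimal sketch. The seed and the names
fit_factor, fit_dummy, D, means_factor and means_dummy are my own
additions for illustration and were not in the earlier posts. It refits
both models on the same simulated data and compares the fitted values and
the implied level means:

```r
set.seed(1)                                  # arbitrary seed, only for reproducibility
xF  <- factor(1:10)
N   <- 1000
xFs <- sample(xF, N, replace = TRUE)
yFs <- rnorm(N, mean = as.numeric(xFs))

# Factor version: treatment contrasts, level 1 is the baseline
fit_factor <- lm(yFs ~ xFs)

# Hand-made 0/1 dummies for levels 1..9, so level 10 is the baseline
D <- diag(10)[as.integer(xFs), 1:9]
fit_dummy <- lm(yFs ~ D)

# Both design matrices span the same column space, so the fits agree
all.equal(unname(fitted(fit_factor)), unname(fitted(fit_dummy)))

# Recover the ten level means from each parameterisation
b1 <- coef(fit_factor)
b2 <- coef(fit_dummy)
means_factor <- unname(b1[1] + c(0, b1[-1]))   # baseline mean + offsets
means_dummy  <- unname(b2[1] + c(b2[-1], 0))   # offsets + baseline mean
all.equal(means_factor, means_dummy)
```

The intercept of the dummy fit is the mean of level 10, and that of the
factor fit is the mean of level 1, which is why the two printed
coefficient vectors look so different.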


On Oct 21, 2:51 pm, andrew <andrewjohnro... at gmail.com> wrote:
> Oh dear, that doesn't look right at all.  I shall have a think about
> what I did wrong and maybe follow my own advice and consult the doco
> myself!
>
> On Oct 21, 2:45 pm, andrew <andrewjohnro... at gmail.com> wrote:
>
>
>
> > The following is *significantly* easier than adding dummy variables
> > by hand. The dummy variable approach gives exactly the same fit as
> > the factor method, but possibly with a different baseline.
>
> > Basically, you might want to search the lm help and possibly consult a
> > stats book for information about how the design matrix is constructed
> > in both cases.
>
> > > xF <- factor(1:10)
> > > N <- 1000
> > > xFs <- sample(x = xF, N, replace = TRUE)
> > > yFs <- rnorm(N, mean = as.numeric(xFs))
> > > lm(yFs ~ xFs)
>
> > Call:
> > lm(formula = yFs ~ xFs)
>
> > Coefficients:
> > (Intercept)         xFs2         xFs3         xFs4         xFs5
> >      0.7845       1.1620       2.1474       3.1391       4.2183
> >        xFs6         xFs7         xFs8         xFs9        xFs10
> >      5.2621       6.0814       7.4170       8.2193       9.2987
>
> > > lm(yFs ~ diag(10)[,1:9][xFs,])
>
> > Call:
> > lm(formula = yFs ~ diag(10)[, 1:9][xFs, ])
>
> > Coefficients:
> >             (Intercept)  diag(10)[, 1:9][xFs, ]1
> >                  10.083                   -9.299
> > diag(10)[, 1:9][xFs, ]2  diag(10)[, 1:9][xFs, ]3
> >                  -8.137                   -7.151
> > diag(10)[, 1:9][xFs, ]4  diag(10)[, 1:9][xFs, ]5
> >                  -6.160                   -5.080
> > diag(10)[, 1:9][xFs, ]6  diag(10)[, 1:9][xFs, ]7
> >                  -4.037                   -3.217
> > diag(10)[, 1:9][xFs, ]8  diag(10)[, 1:9][xFs, ]9
> >                  -1.882                   -1.079
>
> > On Oct 21, 9:44 am, David Winsemius <dwinsem... at comcast.net> wrote:
>
> > > On Oct 20, 2009, at 4:00 PM, Luciano La Sala wrote:
>
> > > > Dear R-people,
>
> > > > I am analyzing epidemiological data with a GLMM, using lmer from
> > > > the lme4 package. I usually explore the assumption of linearity of continuous
> > > > variables in the logit of the outcome by creating 4 categories of  
> > > > the variable, performing a bivariate logistic regression, and then  
> > > > plotting the coefficients of each category against their mid points.  
> > > > That gives me a pretty good idea about the linearity assumption and  
> > > > possible departures from it.
>
> > > > I know of people who create 0/1 dummy variables in order to relax
> > > > the linearity assumption. However, I've read that dummy variables
> > > > are never needed (nor are they desirable) in R! Instead, one should
> > > > use factor variables. These are much easier to work with than dummy
> > > > variables, and the model itself will create the necessary dummy
> > > > variables.
>
> > > > Having said that, if my data violate the linearity assumption, does
> > > > the use of a factor for the variable in question help overcome the
> > > > lack of linearity?
>
> > > No. If done by dividing into a small number of categories after
> > > looking at the data, it merely creates other (and probably more
> > > severe) problems. If you are in the unusual (although desirable)
> > > position of having a large number of events across the range of the  
> > > covariates in your data, you may be able to cut your variable into  
> > > quintiles or deciles and analyze the resulting factor, but the  
> > > preferred approach would be to fit a regression spline of sufficient  
> > > complexity.
>
> > > > Thanks in advance.
>
> > > --
>
> > > David Winsemius, MD
> > > Heritage Laboratories
> > > West Hartford, CT
>
> > > > ______________________________________________
> > > > R-h... at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
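
PS: for what it's worth, the regression spline David suggests above could
be sketched roughly as follows. This is only an illustration on simulated
data with a plain glm rather than lmer; the covariate name age, the
simulated effect and the choice of df = 4 are all assumptions of mine, not
anything from the original question.

```r
library(splines)                        # ships with R; provides ns()

set.seed(42)                            # arbitrary seed
n   <- 2000
age <- runif(n, 20, 80)                 # hypothetical continuous covariate

# Simulated binary outcome whose log-odds are deliberately non-linear in age
p <- plogis(-2 + 0.002 * (age - 50)^2)
y <- rbinom(n, 1, p)

# Straight-line effect on the logit scale versus a natural cubic spline
fit_lin <- glm(y ~ age,             family = binomial)
fit_ns  <- glm(y ~ ns(age, df = 4), family = binomial)

# The linear model is nested in the spline model, so a likelihood-ratio
# test gives a rough check of the linearity assumption
anova(fit_lin, fit_ns, test = "Chisq")
```

Unlike cutting the variable into a handful of categories, the spline keeps
the covariate continuous, and the df argument controls how much curvature
is allowed.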



