[R] factor() in lm

Bert Gunter gunter.berton at gene.com
Sun Dec 1 19:27:54 CET 2013


You may wish to talk to a local statistician or read up on linear
models, as you appear not to understand some basics. Anyway, either

1. You have other covariates in your model that you haven't shown, and
your model is overdetermined (the model matrix is not of full rank, so
some coefficients cannot be estimated); or
2. You have NA's in your data that cause 1) to occur.

As an example of the above:

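x <- rep(letters[1:3], each = 5)    # 5 each of "a", "b", "c"
y <- factor(rep(1:3, c(5, 8, 2)))   # levels 1, 2, 3 with 5, 8, 2 cases
summary(lm(rnorm(15) ~ x + y))      # random response, so estimates vary by run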

Call:
lm(formula = rnorm(15) ~ x + y)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6768 -0.3865 -0.1108  0.3090  1.9632

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.04138    0.47160   0.088    0.932
xb           1.59259    1.17111   1.360    0.201
xc           0.36822    0.88228   0.417    0.684
y2          -1.58517    0.96264  -1.647    0.128
y3                NA         NA      NA       NA
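
One way to see which terms are tied together is to compare the rank of
the model matrix with its number of columns, and to call alias() on
the fit. The following is only a sketch of a situation resembling the
one in the question: all variable names and values (msa_pop, tract_x,
the three MSA codes) are made up for illustration. It assumes a
covariate that takes a single value per MSA, which is then a linear
combination of the intercept and the MSA dummies.

```r
# Hypothetical data: 3 MSAs with 50 tracts each (names/values are assumptions).
set.seed(1)
msa_code <- rep(c("12420", "12580", "14460"), each = 50)
msa_pop  <- rep(c(1.7, 0.8, 4.5), each = 50)  # one value per MSA (assumed covariate)
tract_x  <- rnorm(150)                        # tract-level covariate
y        <- rnorm(150)                        # random response, for illustration only

fit <- lm(y ~ tract_x + msa_pop + factor(msa_code))

coef(fit)                                # one MSA dummy comes back as NA

X <- model.matrix(fit)
c(columns = ncol(X), rank = qr(X)$rank)  # rank is smaller than the column count

alias(fit)                               # lists the linear dependencies among terms
```

Dropping the MSA-level covariate (or the fixed effects) removes the
dependency; which one to drop is a modeling decision, not something R
can decide for you.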


Incidentally, I was surprised to find in R 3.0.2 that if some levels of
a factor are missing, either due to NA's in the response or otherwise,
R estimates the coefficients for the remaining factor levels quite
nicely. I expected it to complain, but it did not. Maybe it has always
been so nicely behaved -- I don't fit overdetermined models and take
care that my factor levels are actually present, so don't run into
trouble. But if this is newish behavior and you are using an oldish
version, you might try upgrading to the current version. Or (more
likely) both clauses of this conditional are false and should be
ignored, and I should preemptively apologize for my foolishness.
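
A small sketch of the behavior described above, assuming a factor g
whose level "c" loses all of its observations to NA's in the response
(the names g and y are illustrative). In current versions of R, lm()
drops the rows with NA's and then drops the unused factor level, so
the level simply does not appear among the coefficients.

```r
set.seed(2)
g <- factor(rep(c("a", "b", "c"), each = 5))
y <- rnorm(15)
y[g == "c"] <- NA     # every observation in level "c" has a missing response

fit <- lm(y ~ g)

names(coef(fit))      # no "gc" term: the empty level was dropped
nobs(fit)             # only the rows for levels "a" and "b" were used
anyNA(coef(fit))      # no NA coefficients from the empty level
```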

Cheers,
Bert

On Sun, Dec 1, 2013 at 9:48 AM, Gary Dong <pdxgary163 at gmail.com> wrote:
> Dear R users,
>
> I am running a linear regression in R. My observations are Census Tracts in
> several metropolitan areas (MSAs). In my data set, each MSA has at least 50
> observations. I use factor(msa_code) in the lm formula to control for
> metropolitan fixed effects. But I kept getting something like this:
>
> .....
> factor(msa_code)12420  4.910e-01  1.517e-01   3.237 0.001221 **
> factor(msa_code)12580  1.966e-01  6.861e-02   2.865 0.004194 **
> factor(msa_code)14460 -3.892e-02  1.653e-02  -2.355 0.018601 *
> factor(msa_code)16980 -2.873e-01  3.278e-02  -8.764  < 2e-16 ***
> factor(msa_code)17140  1.088e-01  6.771e-02   1.607 0.108127
> factor(msa_code)17460 -1.173e-01  4.380e-02  -2.678 0.007441 **
> factor(msa_code)19100  1.368e-01  5.550e-02   2.465 0.013753 *
> factor(msa_code)19740  5.819e-01  1.173e-01   4.962 7.33e-07 ***
> factor(msa_code)19820 -4.214e-01  6.641e-02  -6.346 2.51e-10 ***
> factor(msa_code)26420  1.258e-01  7.541e-02   1.668 0.095486 .
> factor(msa_code)28140  2.010e-01  3.847e-02   5.224 1.85e-07 ***
> factor(msa_code)29820  7.102e-02  6.593e-02   1.077 0.281435
> factor(msa_code)31100 -4.832e-01  1.088e-01  -4.440 9.28e-06 ***
> factor(msa_code)33100 -2.534e-01  6.391e-02  -3.965 7.49e-05 ***
> factor(msa_code)33460  5.229e-02  7.891e-02   0.663 0.507609
> factor(msa_code)35620 -3.197e-01  7.565e-02  -4.225 2.45e-05 ***
> factor(msa_code)36740  1.269e-01  6.948e-02   1.826 0.067868 .
> factor(msa_code)37980  1.394e-01  4.388e-02   3.178 0.001497 **
> factor(msa_code)38060 -6.935e-02  6.124e-02  -1.132 0.257540
> factor(msa_code)38300  1.647e-01  3.986e-02   4.133 3.67e-05 ***
> factor(msa_code)38900  2.605e-01  1.420e-01   1.835 0.066664 .
> factor(msa_code)39300 -9.612e-02  4.704e-02  -2.043 0.041103 *
> factor(msa_code)40140 -2.353e-01  3.562e-02  -6.605 4.59e-11 ***
> factor(msa_code)40900         NA         NA      NA       NA
> factor(msa_code)41740         NA         NA      NA       NA
> factor(msa_code)41860         NA         NA      NA       NA
> factor(msa_code)42660         NA         NA      NA       NA
> factor(msa_code)45300         NA         NA      NA       NA
> factor(msa_code)47900         NA         NA      NA       NA
>
> I wonder why I kept getting those "NAs". Thank you!
>
> Gary



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

(650) 467-7374


