[R] lm and levels

Daniel Malter daniel at umd.edu
Wed Nov 11 07:41:36 CET 2009


Hi, your problem is that you run a regression on three observations with
two independent variables. This cannot give a meaningful result, as two
independent variables plus an intercept MUST perfectly explain all variation
in three observations. You can see this from the fact that the R-squared in
the regression is 1 and that the standard errors are NAs. If you include
more observations, as in the example below, you don't get that.

x <- c(2,4,2,4,3,6)
y <- c(4,9,5,5,10,7)
z <- factor(c(1,1,1,2,2,NA))
summary(lm(y ~ x + z))
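
To see the saturation in the original four-row data directly, one can compare
the number of usable observations with the number of estimated coefficients.
This is only a minimal sketch using the data from the question quoted further
down; the names d and fit are illustrative and not part of the original post.

```r
# Data from the original question: the NA in z drops the fourth row.
d <- data.frame(x = c(2, 4, 3, 6),
                y = c(4, 9, 5, 10),
                z = factor(c(1, 1, 2, NA)))
fit <- lm(y ~ x + z, data = d)

nobs(fit)          # observations actually used in the fit
length(coef(fit))  # estimated coefficients: intercept, x, z2
df.residual(fit)   # residual degrees of freedom
```

With three usable rows and three coefficients, no residual degrees of freedom
are left, so the fit reproduces y exactly and no standard errors can be
computed.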

If you want to retain the observations with NAs (provided that you have more
data points than above, because otherwise the analysis remains bogus), you
can use what is called "missing variable coding." Code all NAs in z as 0 and
create an indicator variable, say z.miss, that is 1 where z is NA and 0
otherwise.

Below is an example where z2 is the recoded z and z.miss is the indicator
that z is missing.

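set.seed(1)  # arbitrary seed, only so that the example is reproducible
x <- rnorm(100, 0, 1)
zee <- c(rep(0, 50), rep(1, 50))   # true z, fully observed
z <- c(zee[1:80], rep(NA, 20))     # observed z: the last 20 values are missing
e <- rnorm(100, 0, 2)

y <- x + 2*zee + e

# complete-case regression: the 20 rows with NA in z are dropped
reg1 <- lm(y ~ x + z)
summary(reg1)

z.miss <- ifelse(is.na(z), 1, 0)   # 1 where z is missing, 0 otherwise
z2 <- ifelse(is.na(z), 0, z)       # z with NAs recoded to 0

data.frame(y, x, z, z2, z.miss)

# regression with recoded z and missingness indicator: all 100 rows are used
reg2 <- lm(y ~ x + z2 + z.miss)
summary(reg2)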
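
As a quick check that the recoding really keeps the rows that lm would
otherwise drop, the two fits can be compared by the number of observations
they use. This is a self-contained sketch under the same setup as above; the
seed is arbitrary and the labels complete_case and recoded are only
illustrative.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x   <- rnorm(100)
zee <- rep(c(0, 1), each = 50)       # true z
z   <- c(zee[1:80], rep(NA, 20))     # observed z with 20 missing values
y   <- x + 2 * zee + rnorm(100, 0, 2)

z.miss <- as.integer(is.na(z))       # 1 where z is missing
z2     <- ifelse(is.na(z), 0, z)     # NAs recoded to 0

reg1 <- lm(y ~ x + z)                # default na.action drops the NA rows
reg2 <- lm(y ~ x + z2 + z.miss)      # recoded z plus missingness indicator

c(complete_case = nobs(reg1), recoded = nobs(reg2))
```

The first model is fitted on the 80 complete rows only, while the second uses
all 100.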

HTH,
Daniel

-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Chuck White
Sent: Wednesday, November 11, 2009 12:28 AM
To: r-help at r-project.org
Subject: [R] lm and levels

Consider the following example:

x <- c(2,4,3,6)
y <- c(4,9,5,10)
z <- factor(c(1,1,2,2))
summary(lm("y ~ x + z"))

The above works fine.

Suppose I change z so that
x <- c(2,4,3,6)
y <- c(4,9,5,10)
z <- factor(c(1,1,2,NA))
summary(lm("y ~ x + z"))

the last row/observation is not considered in the regression. I would like
this to be treated as z with two levels "1" and "2" which are both in the
regression model (rather than dropping one of them which would be the case
in the first example). The last row would have 0 for z1 and z2.  How can
that be achieved?  THANKS.




