[R] Linear Model and Missing Data in Predictors

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue Mar 15 17:36:28 CET 2016


IMHO this is not a question about R; it is a question about statistics, whether or not R is involved. As such, a forum like stats.stackexchange.com would be better suited to address it.

FWIW I happen to think that expecting R to solve this for you is unreasonable. 
-- 
Sent from my phone. Please excuse my brevity.

On March 15, 2016 8:14:42 AM PDT, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
>Dear All,
>Here is a situation that surely comes up very often. Suppose you have
>the following data:
>
>set.seed(1235)
>x1 <- seq(30)
># x2 is missing at both ends, x3 is missing at the tail
>x2 <- c(rep(NA, 9), rnorm(19) + 9, rep(NA, 2))
>x3 <- c(rnorm(17) - 2, rep(NA, 13))
>
>y <- exp(seq(1, 5, length.out = 30))
>
>mm <- lm(y ~ x1 + x2 + x3)
>
>i.e. you fit a linear regression with multiple regressors, some of
>which have missing values.
>This is what happens to me when I work with time series that I use as
>regressors and whose missing values are padded with NAs.
>By default, lm discards every incomplete observation and therefore
>drops quite a lot of data.
>Is there any way to circumvent this? I mean, is there a way to somehow
>come up with a piecewise linear regression where, whenever possible,
>all the 3 regressors are used but we switch to 1 or 2 when there are
>missing data?
>I say this because it is totally infeasible to try to figure out the
>values of the missing data in my regressors, but at the same time I
>cannot restrict my model to the intersection of the non-NA values in
>the 3 regressors. If this makes sense, do I have to code it myself or
>is there a package that already implements this?
>Any suggestion is appreciated.
>Cheers
>
>Lorenzo
>
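
For reference, here is a minimal sketch of what the question describes, using only base R. The first part shows how many rows lm() keeps under its default na.action. The second part fits one separate lm() per missingness pattern; the names d, optional, pattern and fits are made up for this illustration and come from no package. This is only an assumption about what "switching to 1 or 2 regressors" could mean. The sub-models have different coefficients and are estimated on different rows, so whether combining them is statistically sensible is exactly the kind of question the reply above points to.

```r
# Reproduce the data from the original post.
set.seed(1235)
x1 <- seq(30)
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))
x3 <- c(rnorm(17) - 2, rep(NA, 13))
y  <- exp(seq(1, 5, length.out = 30))
d  <- data.frame(y, x1, x2, x3)

# Default behaviour: only rows with no NA in any model variable are used.
mm <- lm(y ~ x1 + x2 + x3, data = d)
nobs(mm)                 # rows used in the fit: 8
sum(complete.cases(d))   # the same count, computed directly
length(na.action(mm))    # rows dropped: 22

# Illustrative sketch (hypothetical, not an established method):
# label each row by which of the optional regressors are observed ...
optional <- c("x2", "x3")
avail    <- !is.na(d[optional])   # logical matrix, one column per regressor
pattern  <- apply(avail, 1, function(r) paste(optional[r], collapse = "+"))
pattern[pattern == ""] <- "none"
table(pattern)

# ... then fit one model per pattern, using x1 plus whatever is observed.
fits <- lapply(split(d, pattern), function(sub) {
  observed <- optional[colSums(!is.na(sub[optional])) > 0]
  lm(reformulate(c("x1", observed), response = "y"), data = sub)
})

sapply(fits, nobs)                 # every row now belongs to some sub-model
fitted_all <- unsplit(lapply(fits, fitted), pattern)  # back in original row order
```

Note that the "none" group here has only two rows, so its sub-model fits them exactly and has no residual degrees of freedom. Small groups like that are one reason this kind of split needs care.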
