[R] Linear Model and Missing Data in Predictors
wdunlap at tibco.com
Tue Mar 15 17:47:07 CET 2016
One technique for dealing with this is called 'multiple imputation'.
Google for 'multiple imputation in R' to find R packages that implement
it (e.g., the 'mi' package).
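
To make the idea concrete, here is a rough base-R sketch of it, using the example data from the quoted message below. It is not how the 'mi' package (or any other package) works internally. The helper impute_once(), the seed, the choice of m = 20 and the imputation models (each incomplete regressor regressed on x1 and y) are all assumptions made for illustration. It also skips the step in which a proper implementation draws the imputation-model parameters from their posterior, so the pooled standard errors will be somewhat too small. For real work, use a dedicated package.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Example data from the quoted message
x1 <- seq(30)
x2 <- c(rep(NA, 9), rnorm(19) + 9, c(NA, NA))
x3 <- c(rnorm(17) - 2, rep(NA, 13))
y  <- exp(seq(1, 5, length = 30))
dat <- data.frame(y, x1, x2, x3)

# lm() drops incomplete rows by default (na.action = na.omit),
# so this fit only uses the rows where x2 and x3 are both observed
cc_fit <- lm(y ~ x1 + x2 + x3, data = dat)
n_complete <- nobs(cc_fit)

# Hypothetical helper: fill the NAs of `target` with predictions from a
# regression on fully observed columns, plus noise at the residual scale
impute_once <- function(d, target, formula) {
  fit  <- lm(formula, data = d)  # fitted on rows where target is observed
  miss <- is.na(d[[target]])
  pred <- predict(fit, newdata = d[miss, , drop = FALSE])
  d[[target]][miss] <- pred + rnorm(sum(miss), sd = summary(fit)$sigma)
  d
}

# Build m completed data sets and fit the model of interest on each
m <- 20
fits <- lapply(seq_len(m), function(i) {
  d <- impute_once(dat, "x2", x2 ~ x1 + y)
  d <- impute_once(d,   "x3", x3 ~ x1 + y)
  lm(y ~ x1 + x2 + x3, data = d)
})

# Pool with Rubin's rules: average the estimates, and combine the
# within-imputation and between-imputation variances
est     <- sapply(fits, coef)                        # coefficients, one column per fit
within  <- rowMeans(sapply(fits, function(f) diag(vcov(f))))
between <- apply(est, 1, var)
pooled_est <- rowMeans(est)
pooled_se  <- sqrt(within + (1 + 1 / m) * between)

cbind(estimate = pooled_est, std.error = pooled_se)
```

Each of the m fits uses all 30 rows, whereas the default complete-case fit keeps only the rows counted in n_complete. The spread of the estimates across the imputed data sets (the "between" term) is what carries the extra uncertainty due to the missing values.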
On Tue, Mar 15, 2016 at 8:14 AM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
> Dear All,
> Here is a situation that surely comes up very often. Suppose you have
> the following data
> x1 <- seq(30)
> x2 <- c(rep(NA, 9), rnorm(19)+9, c(NA, NA))
> x3 <- c(rnorm(17)-2, rep(NA, 13))
> y <- exp(seq(1,5, length=30))
> i.e. you try a linear regression of y on several regressors,
> some of which have missing values.
> This is what happens to me while working with some time series which I
> use as regressors and whose missing values are padded with NAs.
> lm, by default, disregards incomplete observations and
> therefore drops quite a lot of data.
> Is there any way to circumvent this? I mean, is there a way to somehow
> come up with a piecewise linear regression where, whenever possible,
> all the 3 regressors are used but we switch to 1 or 2 when there are
> missing data?
> I say this because it is totally unfeasible to try to figure out the
> values of the missing data in my regressors, but at the same time I
> cannot restrict my model to the intersection of the non-NA values in
> the 3 regressors. If this makes sense, do I have to code it myself or
> is there a package which already implements this?
> Any suggestion is appreciated.