[R] Predictably puzzled.

Rolf Turner
Sat Nov 20 03:12:14 CET 2021


Consider the following toy example:

    set.seed(42)
    y <- rnorm(20)
    x <- rnorm(20)
    y[c(3,5,14,15)] <- NA
    fit <- lm(y~x)
    predict(fit)

For some reason that escapes me, this does not provide predicted
values for the observations in which the response/dependent variable
is missing, even though there are no missing values in the
predictor/independent variable.
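A minimal sketch of what seems to be going on, assuming the default
setting options(na.action = "na.omit"): lm() drops the four incomplete
rows before fitting, and predict() without newdata only returns fitted
values for the rows that were kept.  The name p is introduced here
purely for illustration.

```r
set.seed(42)
y <- rnorm(20)
x <- rnorm(20)
y[c(3, 5, 14, 15)] <- NA
fit <- lm(y ~ x)

# With no newdata, predict() returns the fitted values of the
# rows actually used in the fit.
p <- predict(fit)
length(p)        # 16, not 20
names(p)         # row labels; "3", "5", "14" and "15" are absent

# The fit records which rows were dropped, and how.
fit$na.action    # indices 3, 5, 14, 15, with class "omit"
```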

I can get predicted values for all values of x if I set

    ddd <- data.frame(y=y,x=x)

and execute

    predict(fit,newdata=ddd)

Note that y is (unnecessarily) included in ddd.  I thought that
predict() might omit any rows of the data in which there are missing
values, but not so.
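As a sketch of why the missing y values do no harm here: as far as I
understand it, predict.lm() drops the response from the model terms
before building a model frame from newdata, and its own na.action
argument defaults to na.pass.  Under that assumption, a data frame
holding only x should give the same result.  The names p_all and p_ddd
are introduced here purely for illustration.

```r
set.seed(42)
y <- rnorm(20)
x <- rnorm(20)
y[c(3, 5, 14, 15)] <- NA
fit <- lm(y ~ x)

# newdata needs only the predictor; y is not consulted.
p_all <- predict(fit, newdata = data.frame(x = x))
length(p_all)    # 20
anyNA(p_all)     # FALSE

# Same values as with the data frame that also carries y.
p_ddd <- predict(fit, newdata = data.frame(y = y, x = x))
all.equal(p_all, p_ddd)

# And the same as applying the fitted coefficients by hand.
by_hand <- coef(fit)[1] + coef(fit)[2] * x
all.equal(unname(p_all), unname(by_hand))
```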

OK.  I have a workaround which gives me the predicted values that I
want, but:

(a) Why does predict() behave in this way?  It makes no sense to me,
but I figure there *must* be a rationale.

(b) Is there a way of getting predict() to behave as I would like, by
specifying an appropriate value for na.action?  I could not find such
an appropriate value.
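For what it is worth, here is a sketch of the closest na.action
setting I know of.  Fitting with na.action = na.exclude makes predict()
return a vector of full length, but the entries for the omitted rows
are padded with NA rather than filled with predictions, so it does not
appear to give the behaviour asked for.  The names fit2 and p2 are
introduced here purely for illustration.

```r
set.seed(42)
y <- rnorm(20)
x <- rnorm(20)
y[c(3, 5, 14, 15)] <- NA

# na.exclude drops the same rows as na.omit when fitting, but
# pads residuals and predictions back to the original length.
fit2 <- lm(y ~ x, na.action = na.exclude)
p2 <- predict(fit2)
length(p2)        # 20
which(is.na(p2))  # positions 3, 5, 14, 15
```

Supplying newdata therefore still looks like the way to obtain
predictions for every value of x.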

Thanks for any enlightenment.

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


