[Rd] lm considers removed predictors when finding complete cases

David Winsemius dwinsemius at comcast.net
Wed Dec 20 01:22:57 CET 2017


> On Dec 19, 2017, at 11:12 AM, EDUARDO GARCIA PORTUGUES <edgarcia at est-econ.uc3m.es> wrote:
> 
> Dear R-devel list,
> 
> I realized that removing a predictor in lm through the "-" operator in
> formula() does not affect the complete cases that are considered. A minimal
> example is:
> 
> summary(lm(Wind ~ ., data = airquality))
> # 42 observations deleted due to missingness
> 
> summary(lm(Wind ~ . - Ozone, data = airquality))
> # still 42 observations deleted due to missingness, even if only 7 are
> # missing for the response and the rest of the predictors
> 
> summary(lm(Wind ~ ., data = subset(airquality, select = -Ozone)))
> # 7 observations deleted due to missingness
> 
> I find this behaviour somewhat striking and I was wondering whether it is
> intended, or whether it would be appropriate to document it in lm's help.

The behavior in the second instance seems consistent with a desire to compare models (full versus reduced) based on the same data. Your expectation appears to be something else, but you have not really explained your rationale for a different expectation other than to call the behavior "striking". If by "striking" you mean hitting your head and saying "Of course, I should have thought of that", then we would be in agreement.
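
To make the point concrete, here is a rough sketch using the built-in airquality data. The object names (full, reduced, subsetd, tt) are mine and purely illustrative. It checks how many rows each fit uses, and shows that a variable removed with "-" leaves the model terms but is still among the variables used to build the model frame, which is where the NA handling takes place.

```r
# Illustrative sketch; object names are hypothetical, data is datasets::airquality.
full    <- lm(Wind ~ ., data = airquality)
reduced <- lm(Wind ~ . - Ozone, data = airquality)
subsetd <- lm(Wind ~ ., data = subset(airquality, select = -Ozone))

# Number of rows actually used by each fit
c(full = nobs(full), reduced = nobs(reduced), subsetd = nobs(subsetd))

# Ozone is gone from the term labels ...
tt <- terms(Wind ~ . - Ozone, data = airquality)
attr(tt, "term.labels")

# ... but is still listed among the variables that go into the model frame,
# so rows with a missing Ozone value are dropped by the NA action.
all.vars(attr(tt, "variables"))

# Because full and reduced use the same rows, they can be compared directly.
anova(reduced, full)
```

If the goal is instead to use every row that is complete for the remaining variables, dropping the column from the data before fitting (as in subsetd) does that, at the cost of no longer being comparable with the full fit on the same rows.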

-- 
David.
> 
> Any insight on this issue is appreciated.
> 
> Best regards,
> -- 
> Eduardo García Portugués
> Assistant professor
> Department of Statistics
> Carlos III University of Madrid
> 
> Office: 7.3.J21 (Leganés)
> Phone: (+34) 91624 8836
> 

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
