[R] lm() silently drops NAs

Andrew Robinson A.Robinson at ms.unimelb.edu.au
Wed Jul 27 04:47:55 CEST 2016


Agh.  I've argued elsewhere that the default behaviour should be to
fail, and the user should take the responsibility to explicitly handle
the missing values, even if that simply means changing the na.action argument.
Probably Peter and I have different experiences with the completeness
of datasets, but anything that encourages me not to think about what
I'm doing in that realm seems like a bad idea.
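
For concreteness, here is a minimal sketch of what fail-by-default looks
like in practice. The data frame `d` and the object names are made up for
illustration; `options(na.action = ...)`, `na.fail` and `na.omit` are the
standard base R mechanisms, and the sketch assumes the session starts
with R's usual default of na.omit.

```r
# Toy data with one missing response value (illustrative only).
d <- data.frame(x = 1:5, y = c(2.1, NA, 6.2, 7.9, 10.1))

# Usual default (na.omit): the incomplete row is dropped without a message.
fit_default <- lm(y ~ x, data = d)
nobs(fit_default)            # 4 of the 5 rows were used

# Make missing values an error for the rest of the session.
old <- options(na.action = "na.fail")
res <- try(lm(y ~ x, data = d), silent = TRUE)
inherits(res, "try-error")   # TRUE: lm() now refuses to fit

# Handling the NAs then becomes an explicit, visible choice.
fit_explicit <- lm(y ~ x, data = d, na.action = na.omit)

options(old)                 # restore the previous setting
```

Setting the option once, for example in a project start-up script, may be
less tedious than adding the argument to every call, at the cost of
making scripts depend on that setting.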

Best wishes,

Andrew

On 27 July 2016 at 07:30, peter dalgaard <pdalgd at gmail.com> wrote:
>
>> On 26 Jul 2016, at 22:26 , Hadley Wickham <h.wickham at gmail.com> wrote:
>>
>> On Tue, Jul 26, 2016 at 3:24 AM, Martin Maechler
>> <maechler at stat.math.ethz.ch> wrote:
>>>
> ...
>> To me, this would be the most sensible default behaviour, but I
>> realise it's too late to change without breaking many existing
>> expectations.
>
> Probably.
>
> Re. the default choice, my recollection is that at the time the only choices available were na.omit and na.fail. S-PLUS was using na.fail for all the usual good reasons, but from a practical perspective, the consequence was that, since almost every data set has NA values, you got an error unless you added na.action=na.omit to every single lm() call. And habitually typing na.action=na.omit doesn't really solve any of the issues with different models being fit to different subsets and all that. So the rationale for doing it differently in R was that it was better to get some probably meaningful output rather than to be certain of getting nothing. And that was what the mainstream packages of the time were doing.
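
The "different models being fit to different subsets" issue can be
sketched as follows. The data and variable names are invented for
illustration, and restricting to complete cases up front is only one
possible remedy, not something proposed in the thread.

```r
# Toy data: x2 is missing in two rows, x1 in none (illustrative only).
d <- data.frame(
  y  = c(1.0, 2.3, 2.9, 4.1, 5.2, 5.8),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(0.5, NA, 1.4, 1.1, NA, 2.0)
)

fit_small <- lm(y ~ x1,      data = d, na.action = na.omit)
fit_big   <- lm(y ~ x1 + x2, data = d, na.action = na.omit)

# The two fits silently use different rows (6 versus 4 here),
# so comparing them directly is suspect.
c(small = nobs(fit_small), big = nobs(fit_big))

# One way to force a common subset: drop incomplete rows once, up front.
d_cc <- d[complete.cases(d[, c("y", "x1", "x2")]), ]
fit_small_cc <- lm(y ~ x1,      data = d_cc)
fit_big_cc   <- lm(y ~ x1 + x2, data = d_cc)
```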
>
>> On a related note, I've never really understood why it's called
>> na.exclude - from my perspective it causes the _inclusion_ of missing
>> values in the predictions/residuals.
>
> I think the notion is that you exclude them from the analysis, but keep them around for the other purposes.
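
A small sketch of that distinction, using invented data: with na.exclude
the incomplete rows are left out of the fit, but residuals and fitted
values are padded with NA so that they line up with the original rows.

```r
# Toy data with one missing response value (illustrative only).
d <- data.frame(x = 1:5, y = c(2.1, NA, 6.2, 7.9, 10.1))

fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

# Same rows are used in the fit, so the coefficients agree.
all.equal(coef(fit_omit), coef(fit_exclude))

# na.omit: residuals only for the rows used in the fit.
length(residuals(fit_omit))        # 4

# na.exclude: residuals padded with NA back to the original row count.
length(residuals(fit_exclude))     # 5
is.na(residuals(fit_exclude)[2])   # TRUE for the row with missing y
```

The padded form is convenient when the residuals are to be attached back
to the data frame, for example with `d$res <- residuals(fit_exclude)`.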
>
> -pd
>
>>
>> Thanks for the (as always!) informative response, Martin.
>>
>> Hadley
>>
>> --
>> http://hadley.nz
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com



-- 
Andrew Robinson
Deputy Director, CEBRA, School of Biosciences
Reader & Associate Professor in Applied Statistics  Tel: (+61) 0403 138 955
School of Mathematics and Statistics                        Fax: +61-3-8344 4599
University of Melbourne, VIC 3010 Australia
Email: a.robinson at ms.unimelb.edu.au
Website: http://www.ms.unimelb.edu.au/~andrewpr

MSME: http://www.crcpress.com/product/isbn/9781439858028
FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
SPuR: http://www.ms.unimelb.edu.au/spuRs/


