[R] differing behavior of mean(), median() and sd() with na.rm

Ivan Calandra calandra at rgzm.de
Thu Aug 23 08:15:35 CEST 2018


Thanks all for the enlightenment.

So, it does make sense that mean() produces NaN and median()/sd() NA, 
from a calculation point of view at least.
But I still think it also makes sense that the mean of NA is NA as well, 
if only for consistency with other functions. That's just my opinion 
of course. I can still convert NaN to NA at the end if I need to.
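
A minimal sketch of that conversion, assuming a small wrapper called
mean_na() (a hypothetical name, not part of base R) that maps the NaN
from an all-NA input back to NA:

```r
# mean_na() is a hypothetical helper, not a base R function.
# It returns NA instead of NaN when nothing is left after removing NAs.
mean_na <- function(x, na.rm = TRUE) {
  m <- mean(x, na.rm = na.rm)
  # is.nan() is TRUE only for NaN, never for NA
  if (is.nan(m)) NA_real_ else m
}

x <- c(NA, NA, NA)
mean(x, na.rm = TRUE)   # NaN
mean_na(x)              # NA
mean_na(c(1, NA, 3))    # 2
```

Note that with na.rm = FALSE this sketch would also turn a NaN that
comes from genuine NaN values in the data into NA, which may or may not
be what is wanted.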

Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 22/08/2018 18:41, Ted Harding wrote:
> I think that one can usefully look at this question from the
> point of view of what "NaN" and "NA" are abbreviations for
> (at any rate, according to the understanding I have adopted
> for many years -- maybe over-simplified).
>
> NaN: Not a Number
> NA: Not Available
>
> So NA is typically used for missing values, whereas NaN
> represents the results of numerical calculations which
> cannot give a result which is a definite number.
>
> Hence 0/0 is not a number, so NaN; similarly Inf/Inf.
>
> Thus, with your x <- c(NA, NA, NA) and mean(x, na.rm=TRUE):
> sum(x, na.rm=TRUE) = 0, since the set of values of x
> with na.rm=TRUE is empty, and the number of elements
> left in x is 0; hence mean = 0/0 = NaN.
>
> But for median(x, na.rm=TRUE), because there are no available
> elements in x with na.rm=TRUE, and the median is found by
> searching among available elements for the value which
> divides the set of values into two halves, the median
> is not available, hence NA.
>
> Best wishes to all,
> Ted.
>
> On Wed, 2018-08-22 at 11:24 -0400, Marc Schwartz via R-help wrote:
>> Hi,
>>
>> It might even be worthwhile to review this recent thread on R-Devel:
>>
>>    https://stat.ethz.ch/pipermail/r-devel/2018-July/076377.html
>>
>> which touches upon a subtly related topic vis-a-vis NaN handling.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Aug 22, 2018, at 10:55 AM, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>>
>>> ... And FWIW (not much, I agree), note that if z = numeric(0) and sum(z) =
>>> 0, then mean(z) = NaN makes sense, as length(z) = 0, so dividing 0 by 0 gives
>>> NaN. So you can see the sorts of issues you may need to consider.
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along and
>>> sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>>
>>> On Wed, Aug 22, 2018 at 7:47 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>>>
>>>> Actually, the dissonance is a bit more basic.
>>>>
>>>> After xxx(...., na.rm=TRUE) with all NA's in ... you have numeric(0). So
>>>> what you see is actually:
>>>>
>>>> > z <- numeric(0)
>>>> > mean(z)
>>>> [1] NaN
>>>> > median(z)
>>>> [1] NA
>>>> > sd(z)
>>>> [1] NA
>>>> > sum(z)
>>>> [1] 0
>>>> etc.
>>>>
>>>> I imagine that there may be more of these little inconsistencies due to
>>>> the organic way R evolved over time. What the conventions should be can be
>>>> purely a matter of personal opinion in the absence of accepted standards.
>>>> But I would look to see what accepted standards were, if any, first.
>>>>
>>>> -- Bert
>>>>
>>>>
>>>> On Wed, Aug 22, 2018 at 7:34 AM Ivan Calandra <calandra using rgzm.de> wrote:
>>>>
>>>>> Dear useRs,
>>>>>
>>>>> I have just noticed that when input is only NA with na.rm=TRUE, mean()
>>>>> results in NaN, whereas median() and sd() produce NA. Shouldn't it all
>>>>> be the same? I think NA makes more sense than NaN in that case.
>>>>>
>>>>> x <- c(NA, NA, NA)
>>>>> mean(x, na.rm=TRUE)
>>>>> [1] NaN
>>>>> median(x, na.rm=TRUE)
>>>>> [1] NA
>>>>> sd(x, na.rm=TRUE)
>>>>> [1] NA
>>>>>
>>>>> Thanks for any feedback.
>>>>>
>>>>> Best,
>>>>> Ivan
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>