[R] strange behaviour of median

Peter Ehlers ehlers at ucalgary.ca
Thu Feb 4 11:31:51 CET 2010


Petr PIKAL wrote:
> Hi
> 
> so do you think I should file a bug report? I think I would rather wait to 
> see if there is some reaction from others. Maybe there is some reason 
> behind such behaviour. These simple statistics tend to behave differently 
> when operating on data.frames, so median is not such a huge surprise.
> 
> see
> 
> sd(df1), var(df1), mean(df1), max(df1), min(df1), range(df1)
> 
> The results produced are usually clearly documented; however, for a novice
> it is rather mysterious why using those functions on a vector produces
> easily understandable results, while using them on a data.frame (which is
> the most common data structure) is far from consistent and intuitive.
> 
> But I agree with you that mean and median should, in the best case, give 
> results with a similar structure.
> 
> Regards
> Petr

Well, I don't think that it's a bug, since the documentation
for median() does not indicate that median() should work for
data frames, whereas for mean() it clearly says that a method
exists. methods('mean') and methods('median'), as well as
mean.default(df1), are informative.
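
For anyone following along, the check mentioned above can be sketched
roughly as follows. The listings depend on the R version and on which
packages are attached, so no output is shown; the variable name
has_df_method is only illustrative.

```r
# List the S3 methods currently visible for each generic.
# The exact listing depends on the R version and attached packages.
print(methods("mean"))
print(methods("median"))

# getS3method() with optional = TRUE returns NULL instead of raising
# an error when no method is found for the given class.
has_df_method <- !is.null(getS3method("median", "data.frame", optional = TRUE))
has_df_method

# The default method is what median() falls back to when no
# class-specific method exists.
is.function(getS3method("median", "default"))
```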

It seems to me to be a simple fix so I wonder what I'm
missing. Paraphrasing mean.data.frame:

median.data.frame <- function(x, ...) sapply(x, median, ...)

I think that it would be desirable to have similar behaviour
for both functions or at least a warning if median.default
is incorrectly applied to a data.frame object.
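
A minimal sketch of that one-liner in use, assuming every column is
numeric. The method is not part of R itself; defining it in the
workspace should be enough for S3 dispatch to find it. The data frame
is the 4 x 4 example from Petr's original post.

```r
# Hypothetical data.frame method for median(); not provided by R.
# Applies median() column by column, passing extra arguments
# such as na.rm through to each call.
median.data.frame <- function(x, ...) sapply(x, median, ...)

mat <- matrix(1:16, 4, 4)
df1 <- data.frame(mat)

col_medians <- median(df1)      # dispatches to median.data.frame
col_medians
#   X1   X2   X3   X4
#  2.5  6.5 10.5 14.5

# The same result without defining a method:
sapply(df1, median)
```

Columns that are not numeric (factors, for instance) would need
separate handling; this sketch does not attempt that.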

  -Peter Ehlers

> 
> r-help-bounces at r-project.org wrote on 04.02.2010 10:28:16:
> 
>> Well, I get the same as Petr with  R version 2.10.0 (2009-10-26)
>> on Linux.
>>
>> To me, this suggests that median is broken! Any user would,
>> a priori, expect that median() should operate in exactly
>> the same way as mean(). To extend Petr's example:
>>
>>   mat <- matrix(1:32, 4,8)
>>   df1 <- data.frame(mat)
>>   mean(df1)
>>   #   X1   X2   X3   X4   X5   X6   X7   X8 
>>   #  2.5  6.5 10.5 14.5 18.5 22.5 26.5 30.5 
>>   median(df1)
>>   # [1] 14.5 18.5
>>
>> so (as in Petr's original example, but more clearly) median()
>> returns the medians of the two "central" columns X4 and X5 of df1.
>>
>> But that is with an even number of columns. Now look at what
>> happens with an odd number:
>>
>>   mat <- matrix(1:28, 4,7)
>>   df1 <- data.frame(mat)
>>   mean(df1)
>>   #   X1   X2   X3   X4   X5   X6   X7 
>>   #  2.5  6.5 10.5 14.5 18.5 22.5 26.5 
>>   median(df1)
>>   #   structure(c("13", "14", "15", "16"), class = "AsIs")
>>   # 1                                                   13
>>   # 2                                                   14
>>   # 3                                                   15
>>   # 4                                                   16
>>
>> Wow!!!!!!!!!!
>>
>> This does suggest a tie-in with Petr's observation about "As.Is",
>> and there is no doubt at all that the above result is rubbish.
>> It is certainly not what a user would expect, and in the context
>> of Petr's intention to present R lessons to a class, I could
>> foresee students turning their backs on R if they came up with
>> such a result in their early encounters!
>>
>> Ted.
>>
>> On 04-Feb-10 08:59:59, Mario Valle wrote:
>>> Linux 2.9.0 gives:
>>>
>>>> median(df1)
>>> [1] 34
>>>
>>> Even stranger...
>>>               mario
>>>
>>> Petr PIKAL wrote:
>>>> During some experimentation in preparing R lessons I encountered this 
>>>> behaviour which I can not explain fully
>>>>
>>>> mat <- matrix(1:16, 4,4)
>>>> df1 <- data.frame(mat)
>>>>
>>>>> mean(df1)
>>>>   X1   X2   X3   X4 
>>>>  2.5  6.5 10.5 14.5 
>>>>
>>>> Expected, documented
>>>>
>>>>> median(df1)
>>>> [1]  6.5 10.5
>>>>
>>>> Rather weird. AFAIK there should not be an issue with data frames; at
>>>> least I did not find any in the help page. I tracked it down, probably,
>>>> to an As.Is operation on the object and subsequent sorting in
>>>> median.default.
>>>>
>>>> I know other (*apply) ways to compute the median for data frames, so I
>>>> would just like to hear an opinion about this behaviour from more
>>>> experienced people.
>>>>
>>>> Thank you
>>>> Best regards
>>>>
>>>> Petr
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> -- 
>>> Ing. Mario Valle
>>> Data Analysis and Visualization Group            | http://www.cscs.ch/~mvalle
>>> Swiss National Supercomputing Centre (CSCS)      | Tel:  +41 (91) 610.82.60
>>> v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82
>>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 04-Feb-10                                       Time: 09:28:13
>> ------------------------------ XFMail ------------------------------
>>
> 

-- 
Peter Ehlers
University of Calgary


