[R] Interquartile Range

Bert Gunter bgunter.4567 at gmail.com
Wed Apr 20 05:53:26 CEST 2016


???

IQR returns a single number.

> IQR(rnorm(10))
[1] 1.090168
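
As a rough sketch of the difference (the seed and the variable names here are only illustrative, not taken from your data): quantile() with two probabilities returns two values, while IQR() returns the single difference between them.

```r
set.seed(1)        # arbitrary seed, only so the sketch is repeatable
x <- rnorm(10)

q   <- quantile(x, c(0.25, 0.75))  # named numeric vector of length 2
iqr <- IQR(x)                      # a single number

length(q)                          # 2
length(iqr)                        # 1
all.equal(iqr, unname(q[2] - q[1]))  # IQR is the 75% quantile minus the 25% quantile
```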

To your 2nd response:
"I could have used average, min, max, they all would have returned the
same thing."

I can only respond: huh?? Are all your values identical?

You really need to provide a small reproducible example as requested
by the posting guide -- I certainly don't get it, and I'm done
guessing. Maybe others will see what I am missing and say something
useful. I clearly can't.
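
For what it's worth, a small self-contained example of the kind the posting guide asks for might look like the sketch below. It reuses the toy data and the myIqr() idea from Bill Dunlap's earlier reply, but uses base R split()/lapply() instead of plyr so that it runs without extra packages; the object name `res` is just a placeholder.

```r
# Toy data: five groups of sizes 1..5, values are powers of two
data <- data.frame(groupColumn = rep(1:5, 1:5), col1 = 2^(0:14))

# Character label "Q1-Q3" built from the rounded quartiles
myIqr <- function(x) {
  q <- round(quantile(x, c(0.25, 0.75), names = FALSE), 0)
  paste(q[1], q[2], sep = "-")
}

# One summary row per group: the label and the numeric IQR side by side
res <- do.call(rbind, lapply(split(data, data$groupColumn), function(d) {
  data.frame(groupColumn = d$groupColumn[1],
             col1_myIqr  = myIqr(d$col1),
             col1_IQR    = IQR(d$col1))
}))
print(res)
```

Each group gets its own label and its own single IQR value, so nothing repeats within a group.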

Cheers,
Bert





Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Apr 19, 2016 at 5:29 PM, Michael Artz <michaeleartz at gmail.com> wrote:
> Again, IQR returns both a .25 and a .75 value and it failed, which is
> why I didn't use it before. Also, the first function just returns the same
> value repeating.  Since they are the same, before the second call, using the
> mode function is just a way to grab one value. I could have used average,
> min, max, they all would have returned the same thing.
>
> Mike
>
> On Tue, Apr 19, 2016 at 7:24 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>>
>> Hi,
>>
>> Jumping into this thread mainly on the point of the mode of the
>> distribution, while also supporting Bert's comments below on theory.
>>
>> If the vector 'x' that is being passed to this function is an integer
>> vector, then a tabulation of the integers can yield a 'mode', presuming of
>> course that there is only one unique mode. You may have to decide how you
>> want to handle a multi-modal discrete distribution.
>>
>> If the vector 'x' is continuous (e.g. contains floating point values),
>> then a tabulation is going to be problematic for a variety of reasons.
>>
>> In that case, prior discussions on this point have yielded the following
>> estimate of the mode of a continuous distribution:
>>
>> Mode <- function(x) {
>>   D <- density(x)
>>   D$x[which.max(D$y)]
>> }
>>
>> where the second line of the function gets you the value of 'x' at the
>> maximum of the density estimate. Of course, there is still the possibility
>> of a multi-modal distribution and the nuances of which kernel is used, etc.,
>> etc.
>>
>> Food for thought.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>> > On Apr 19, 2016, at 7:07 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> >
>> > Well, instead of your functions try:
>> >
>> > Mode <- function(x) {
>> >     tabx <- table(x)
>> >     tabx[which.max(tabx)]
>> > }
>> >
>> > and use R's IQR function instead of yours.
>> >
>> > ... so I still don't get why you want to return a character string
>> > instead of a value for the IQR;
>> > and the mode of a sample defined as above is generally a bad estimator
>> > of the mode of the distribution. To say more than that would take me
>> > too far afield. Post on stats.stackexchange.com if you want to know
>> > why (if it's even relevant).
>> >
>> > Cheers,
>> > Bert
>> >
>> >
>> > On Tue, Apr 19, 2016 at 4:25 PM, Michael Artz <michaeleartz at gmail.com>
>> > wrote:
>> >> Hi,
>> >>  Here is what I am doing
>> >>
>> >> notGroupedAll <- ddply(data
>> >>                 ,~groupColumn
>> >>                 ,summarise
>> >>                 ,col1_mean=mean(col1)
>> >>                 ,col2_mode=Mode(col2) # Function I wrote for getting the mode, shown below
>> >>                 ,col3_Range=myIqr(col3)
>> >>                 )
>> >>
>> >> groupedAll <- ddply(data
>> >>                 ,~groupColumn
>> >>                 ,summarise
>> >>                 ,col1_mean=mean(col1)
>> >>                 ,col2_mode=Mode(col2) # Function I wrote for getting the mode, shown below
>> >>                 ,col3_Range=Mode(col3)
>> >>                 )
>> >>
>> >> #custom Mode function
>> >> Mode <- function(x) {
>> >>  ux <- unique(x)
>> >>  ux[which.max(tabulate(match(x, ux)))]
>> >> }
>> >>
>> >> #the range function
>> >> myIqr <- function(x) {
>> >>  paste(round(quantile(x,0.375),0),round(quantile(x,0.625),0),sep="-")
>> >> }
>> >>
>> >>
>> >> Here is what I am doing!! :)
>> >>
>> >>
>> >>
>> >> On Tue, Apr 19, 2016 at 2:57 PM, William Dunlap <wdunlap at tibco.com>
>> >> wrote:
>> >>>
>> >>> If you show us, not just tell us about, a self-contained example
>> >>> someone might show you a non-hacky way of getting the job done.
>> >>> (I don't see an argument to plyr::ddply called 'transform'.)
>> >>>
>> >>> Bill Dunlap
>> >>> TIBCO Software
>> >>> wdunlap tibco.com
>> >>>
>> >>> On Tue, Apr 19, 2016 at 12:18 PM, Michael Artz
>> >>> <michaeleartz at gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Oh thanks for that clarification Bert!  Hope you enjoyed your coffee!
>> >>>> I ended up just using the transform argument in the ddply function.  It
>> >>>> worked and it repeated, then I called a mode function in another call to
>> >>>> ddply that summarised.  Kinda hacky but oh well!
>> >>>>
>> >>>> On Tue, Apr 19, 2016 at 12:31 PM, Bert Gunter
>> >>>> <bgunter.4567 at gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> ... and I'm getting another cup of coffee...
>> >>>>>
>> >>>>> -- Bert
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Apr 19, 2016 at 10:30 AM, Bert Gunter
>> >>>>> <bgunter.4567 at gmail.com>
>> >>>>> wrote:
>> >>>>>> NO NO  -- I am wrong! The paste() expression is of course evaluated.
>> >>>>>> It's just that a character string is returned of the form
>> >>>>>> "something - something".
>> >>>>>>
>> >>>>>> I apologize for the confusion.
>> >>>>>>
>> >>>>>> -- Bert
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Apr 19, 2016 at 10:25 AM, Bert Gunter
>> >>>>>> <bgunter.4567 at gmail.com>
>> >>>>>> wrote:
>> >>>>>>> To be precise:
>> >>>>>>>
>> >>>>>>> paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
>> >>>>>>>
>> >>>>>>> is an expression that evaluates to a character string:
>> >>>>>>> "round(quantile(x,.25),0) - round(quantile(x,0.75),0)"
>> >>>>>>>
>> >>>>>>> no matter what the argument of your function, x. Hence
>> >>>>>>>
>> >>>>>>> return(paste(...)) will return this exact character string and never
>> >>>>>>> evaluates x.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Cheers,
>> >>>>>>> Bert
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Tue, Apr 19, 2016 at 8:34 AM, William Dunlap via R-help
>> >>>>>>> <r-help at r-project.org> wrote:
>> >>>>>>>>> That didn't work Jim!
>> >>>>>>>>
>> >>>>>>>> It always helps to say how the suggestion did not work.  Jim's
>> >>>>>>>> function had a typo in it - was that the problem?  Or did you not
>> >>>>>>>> change the call to ddply to use that function?  Here is something
>> >>>>>>>> that might "work" for you:
>> >>>>>>>>
>> >>>>>>>> library(plyr)
>> >>>>>>>>
>> >>>>>>>> data <- data.frame(groupColumn=rep(1:5,1:5), col1=2^(0:14))
>> >>>>>>>> myIqr <- function(x) {
>> >>>>>>>>   paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
>> >>>>>>>> }
>> >>>>>>>> ddply(data, ~groupColumn, summarise, col1_myIqr=myIqr(col1),
>> >>>>>>>> col1_IQR=stats::IQR(col1))
>> >>>>>>>> #  groupColumn col1_myIqr col1_IQR
>> >>>>>>>> #1           1        1-1        0
>> >>>>>>>> #2           2        2-4        1
>> >>>>>>>> #3           3      12-24       12
>> >>>>>>>> #4           4    112-320      208
>> >>>>>>>> #5           5  2048-8192     6144
>> >>>>>>>>
>> >>>>>>>> The important point is that
>> >>>>>>>>
>> >>>>>>>> paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
>> >>>>>>>>
>> >>>>>>>> is not a function, it is an expression.   ddply wants functions.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Bill Dunlap
>> >>>>>>>> TIBCO Software
>> >>>>>>>> wdunlap tibco.com
>> >>>>>>>>
>> >>>>>>>> On Tue, Apr 19, 2016 at 7:56 AM, Michael Artz
>> >>>>>>>> <michaeleartz at gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> That didn't work Jim!
>> >>>>>>>>>
>> >>>>>>>>> Thanks anyway
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Apr 18, 2016 at 9:02 PM, Jim Lemon
>> >>>>>>>>> <drjimlemon at gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hi Michael,
>> >>>>>>>>>> At a guess, try this:
>> >>>>>>>>>>
>> >>>>>>>>>> iqr<-function(x) {
>> >>>>>>>>>>  return(paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
>> >>>>>>>>>> }
>> >>>>>>>>>>
>> >>>>>>>>>> .col3_Range=iqr(datat$tenure)
>> >>>>>>>>>>
>> >>>>>>>>>> Jim
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, Apr 19, 2016 at 11:15 AM, Michael Artz
>> >>>>>>>>>> <michaeleartz at gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>> Hi,
>> >>>>>>>>>>>  I am trying to show an interquartile range while grouping values
>> >>>>>>>>>>> using the function ddply().  So my function call now is like
>> >>>>>>>>>>>
>> >>>>>>>>>>> groupedAll <- ddply(data
>> >>>>>>>>>>>                 ,~groupColumn
>> >>>>>>>>>>>                 ,summarise
>> >>>>>>>>>>>                 ,col1_mean=mean(col1)
>> >>>>>>>>>>>                 ,col2_mode=Mode(col2) # Function I wrote for getting the mode, shown below
>> >>>>>>>>>>>                 ,col3_Range=paste(as.character(round(quantile(datat$tenure,c(.25)))),
>> >>>>>>>>>>>                                   as.character(round(quantile(data$tenure,c(.75)))), sep = "-")
>> >>>>>>>>>>>                 )
>> >>>>>>>>>>>
>> >>>>>>>>>>> #custom Mode function
>> >>>>>>>>>>> Mode <- function(x) {
>> >>>>>>>>>>>  ux <- unique(x)
>> >>>>>>>>>>>  ux[which.max(tabulate(match(x, ux)))]
>> >>>>>>>>>>> }
>> >>>>>>>>>>>
>> >>>>>>>>>>> I am not sure what is going wrong with my interquartile range function; it
>> >>>>>>>>>>> works on its own outside of ddply()
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> ______________________________________________
>> >>>>>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>>
>


