[R] Interquartile Range

Michael Artz michaeleartz at gmail.com
Wed Apr 20 02:29:34 CEST 2016


Again, the IQR calculation returns both a .25 and a .75 value, and it failed,
which is why I didn't use it before. Also, the first function just returns the
same value repeated on every row.  Since the values are all the same before
the second call, using the Mode function is just a way to grab one of them. I
could have used mean, min, or max; they all would have returned the same thing.
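
For anyone following along, here is a minimal sketch of the distinction,
assuming a made-up vector x (not my real data). quantile() called with two
probabilities gives back two values, while stats::IQR() gives back a single
number, the difference between them. Pasting the two quartiles into one string
yields one value per group, which is what summarise expects.

```r
# Hypothetical sample data, for illustration only.
x <- c(1, 2, 4, 8, 16, 32, 64)

# quantile() with two probabilities returns a named vector of length 2.
q <- quantile(x, c(0.25, 0.75))

# stats::IQR() returns a single number: the upper quartile minus the lower.
iqr_value <- IQR(x)

# Collapse both quartiles into one string so each group gets exactly one value.
iqr_label <- paste(round(q[[1]], 0), round(q[[2]], 0), sep = "-")

print(q)
print(iqr_value)
print(iqr_label)
```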

Mike

On Tue, Apr 19, 2016 at 7:24 PM, Marc Schwartz <marc_schwartz at me.com> wrote:

> Hi,
>
> Jumping into this thread mainly on the point of the mode of the
> distribution, while also supporting Bert's comments below on theory.
>
> If the vector 'x' that is being passed to this function is an integer
> vector, then a tabulation of the integers can yield a 'mode', presuming of
> course that there is only one unique mode. You may have to decide how you
> want to handle a multi-modal discrete distribution.
>
> If the vector 'x' is continuous (e.g. contains floating point values),
> then a tabulation is going to be problematic for a variety of reasons.
>
> In that case, prior discussions on this point have yielded the following
> estimate of the mode of a continuous distribution:
>
> Mode <- function(x) {
>   D <- density(x)
>   D$x[which.max(D$y)]
> }
>
> where the second line of the function gets you the value of 'x' at the
> maximum of the density estimate. Of course, there is still the possibility
> of a multi-modal distribution and the nuances of which kernel is used,
> etc., etc.
>
> Food for thought.
>
> Regards,
>
> Marc Schwartz
>
>
> > On Apr 19, 2016, at 7:07 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >
> > Well, instead of your functions try:
> >
> > Mode <- function(x) {
> >     tabx <- table(x)
> >     tabx[which.max(tabx)]
> > }
> >
> > and use R's IQR function instead of yours.
> >
> > ... so I still don't get why you want to return a character string
> > instead of a value for the IQR;
> > and the mode of a sample defined as above is generally a bad estimator
> > of the mode of the distribution. To say more than that would take me
> > too far afield. Post on stats.stackexchange.com if you want to know
> > why (if it's even relevant).
> >
> > Cheers,
> > Bert
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> > On Tue, Apr 19, 2016 at 4:25 PM, Michael Artz <michaeleartz at gmail.com> wrote:
> >> Hi,
> >>  Here is what I am doing
> >>
> >> notGroupedAll <- ddply(data
> >>                 ,~groupColumn
> >>                 ,summarise
> >>                 ,col1_mean=mean(col1)
> >>                 ,col2_mode=Mode(col2) #Function I wrote for getting the mode shown below
> >>                 ,col3_Range=myIqr(col3)
> >>                 )
> >>
> >> groupedAll <- ddply(data
> >>                 ,~groupColumn
> >>                 ,summarise
> >>                 ,col1_mean=mean(col1)
> >>                 ,col2_mode=Mode(col2) #Function I wrote for getting the mode shown below
> >>                 ,col3_Range=Mode(col3)
> >>                 )
> >>
> >> #custom Mode function
> >> Mode <- function(x) {
> >>  ux <- unique(x)
> >>  ux[which.max(tabulate(match(x, ux)))]
> >> }
> >>
> >> #the range function
> >> myIqr <- function(x) {
> >>  paste(round(quantile(x,0.375),0),round(quantile(x,0.625),0),sep="-")
> >> }
> >>
> >>
> >> Here is what I am doing!! :)
> >>
> >>
> >>
> >> On Tue, Apr 19, 2016 at 2:57 PM, William Dunlap <wdunlap at tibco.com> wrote:
> >>>
> >>> If you show us, not just tell us about, a self-contained example
> >>> someone might show you a non-hacky way of getting the job done.
> >>> (I don't see an argument to plyr::ddply called 'transform'.)
> >>>
> >>> Bill Dunlap
> >>> TIBCO Software
> >>> wdunlap tibco.com
> >>>
> >>> On Tue, Apr 19, 2016 at 12:18 PM, Michael Artz <michaeleartz at gmail.com> wrote:
> >>>>
> >>>> Oh thanks for that clarification Bert!  Hope you enjoyed your coffee!  I
> >>>> ended up just using the transform argument in the ddply function.  It
> >>>> worked and it repeated, then I called a mode function in another call to
> >>>> ddply that summarised.  Kinda hacky but oh well!
> >>>>
> >>>> On Tue, Apr 19, 2016 at 12:31 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>>>>
> >>>>> ... and I'm getting another cup of coffee...
> >>>>>
> >>>>> -- Bert
> >>>>> Bert Gunter
> >>>>>
> >>>>>
> >>>>> On Tue, Apr 19, 2016 at 10:30 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>>>>> NO NO  -- I am wrong! The paste() expression is of course evaluated.
> >>>>>> It's just that a character string is returned of the form
> >>>>>> "something - something".
> >>>>>>
> >>>>>> I apologize for the confusion.
> >>>>>>
> >>>>>> -- Bert
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Bert Gunter
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Apr 19, 2016 at 10:25 AM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>>>>>> To be precise:
> >>>>>>>
> >>>>>>> paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> >>>>>>>
> >>>>>>> is an expression that evaluates to a character string:
> >>>>>>> "round(quantile(x,.25),0) - round(quantile(x,0.75),0)"
> >>>>>>>
> >>>>>>> no matter what the argument of your function, x. Hence
> >>>>>>>
> >>>>>>> return(paste(...)) will return this exact character string and never
> >>>>>>> evaluates x.
> >>>>>>>
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Bert
> >>>>>>>
> >>>>>>> Bert Gunter
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Apr 19, 2016 at 8:34 AM, William Dunlap via R-help
> >>>>>>> <r-help at r-project.org> wrote:
> >>>>>>>>> That didn't work Jim!
> >>>>>>>>
> >>>>>>>> It always helps to say how the suggestion did not work.  Jim's
> >>>>>>>> function had a typo in it - was that the problem?  Or did you not
> >>>>>>>> change the call to ddply to use that function.  Here is something
> >>>>>>>> that might "work" for you:
> >>>>>>>>
> >>>>>>>> library(plyr)
> >>>>>>>>
> >>>>>>>> data <- data.frame(groupColumn=rep(1:5,1:5), col1=2^(0:14))
> >>>>>>>> myIqr <- function(x) {
> >>>>>>>>   paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> >>>>>>>> }
> >>>>>>>> ddply(data, ~groupColumn, summarise, col1_myIqr=myIqr(col1),
> >>>>>>>> col1_IQR=stats::IQR(col1))
> >>>>>>>> #  groupColumn col1_myIqr col1_IQR
> >>>>>>>> #1           1        1-1        0
> >>>>>>>> #2           2        2-4        1
> >>>>>>>> #3           3      12-24       12
> >>>>>>>> #4           4    112-320      208
> >>>>>>>> #5           5  2048-8192     6144
> >>>>>>>>
> >>>>>>>> The important point is that
> >>>>>>>>
> >>>>>>>> paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> >>>>>>>> is not a function, it is an expression.   ddply wants functions.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Bill Dunlap
> >>>>>>>> TIBCO Software
> >>>>>>>> wdunlap tibco.com
> >>>>>>>>
> >>>>>>>> On Tue, Apr 19, 2016 at 7:56 AM, Michael Artz
> >>>>>>>> <michaeleartz at gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> That didn't work Jim!
> >>>>>>>>>
> >>>>>>>>> Thanks anyway
> >>>>>>>>>
> >>>>>>>>> On Mon, Apr 18, 2016 at 9:02 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Michael,
> >>>>>>>>>> At a guess, try this:
> >>>>>>>>>>
> >>>>>>>>>> iqr<-function(x) {
> >>>>>>>>>>  return(paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> .col3_Range=iqr(datat$tenure)
> >>>>>>>>>>
> >>>>>>>>>> Jim
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Apr 19, 2016 at 11:15 AM, Michael Artz
> >>>>>>>>>> <michaeleartz at gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>  I am trying to show an interquartile range while grouping values
> >>>>>>>>>>> using the function ddply().  So my function call now is like
> >>>>>>>>>>>
> >>>>>>>>>>> groupedAll <- ddply(data
> >>>>>>>>>>>                 ,~groupColumn
> >>>>>>>>>>>                 ,summarise
> >>>>>>>>>>>                 ,col1_mean=mean(col1)
> >>>>>>>>>>>                 ,col2_mode=Mode(col2) #Function I wrote for getting the mode shown below
> >>>>>>>>>>>                 ,col3_Range=paste(as.character(round(quantile(datat$tenure,c(.25)))),
> >>>>>>>>>>>                    as.character(round(quantile(data$tenure,c(.75)))), sep = "-")
> >>>>>>>>>>>                 )
> >>>>>>>>>>>
> >>>>>>>>>>> #custom Mode function
> >>>>>>>>>>> Mode <- function(x) {
> >>>>>>>>>>>  ux <- unique(x)
> >>>>>>>>>>>  ux[which.max(tabulate(match(x, ux)))]
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> I am not sure what is going wrong with my interquartile range
> >>>>>>>>>>> expression; it works on its own outside of ddply()
> >>>>>>>>>>>
> >>>>>>>>>>>        [[alternative HTML version deleted]]
> >>>>>>>>>>>
> >>>>>>>>>>> ______________________________________________
> >>>>>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
> >>>>>>>>>>> see
> >>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>>>> PLEASE do read the posting guide
> >>>>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
> >>>>>>>>>>> code.
> >>>>>>>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
>
