[R] User-defined functions in dplyr

William Dunlap wdunlap at tibco.com
Fri Oct 30 17:19:37 CET 2015


dplyr::mutate is probably what you want instead of dplyr::summarize:

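# Bin one group's predictions by that group's own quantiles.
# Returns a factor the same length as xpred, which is what mutate() needs.
create_bins3 <- function (xpred, nBins)
{
    Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
    bin <- cut(xpred, breaks = Breaks, include.lowest = TRUE)
    bin
}
# %>% must be on the search path, e.g. via library(dplyr)
dplyr::group_by(df, models) %>% dplyr::mutate(Bin=create_bins3(pred,nBins))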
#Source: local data frame [100 x 3]
#Groups: models [2]
#
#         pred models               Bin
#        (dbl) (fctr)            (fctr)
#1   0.2167549 model1     (0.167,0.577]
#2  -0.5424926 model1   (-0.869,-0.481]
...
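
For anyone who wants to run this from a clean session, here is a minimal
self-contained sketch. It assumes a reasonably recent dplyr is installed and
attached (so %>% is available), and reuses df and nBins as defined in the
original post. The name res is only for illustration.

```r
library(dplyr)

# Data and bin count as in the original post
set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))
nBins <- 10

# Takes the pred vector itself and returns one bin label per element
create_bins3 <- function(xpred, nBins) {
  Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
  cut(xpred, breaks = Breaks, include.lowest = TRUE)
}

# mutate() keeps every row and adds the per-group bins as a new column
res <- df %>%
  group_by(models) %>%
  mutate(Bin = create_bins3(pred, nBins)) %>%
  ungroup()

head(res)
```

The quantile breaks are computed separately within each level of models,
so the interval labels differ between model1 and model2.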


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 30, 2015 at 9:06 AM, William Dunlap <wdunlap at tibco.com> wrote:

> The error message is not very helpful and the stack trace is pretty
> inscrutable as well:
> > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
> Error: not a vector
> > traceback()
> 14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
> 13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
> 12: summarise_impl(.data, dots)
> 11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
> 10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
> 9: dplyr::summarize(., create_bins)
> 8: function_list[[k]](value)
> 7: withVisible(function_list[[k]](value))
> 6: freduce(value, `_function_list`)
> 5: `_fseq`(`_lhs`)
> 4: eval(expr, envir, enclos)
> 3: eval(quote(`_fseq`(`_lhs`)), env, env)
> 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
> 1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>
>
> It does not mean that your function, create_bins, fails to return a
> vector; passing the sum function gives the same error.
> help(summarize,package="dplyr") says:
>      ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
>           ‘max()’ etc.
> It apparently means calls to summary functions, not summary functions
> themselves.  The examples in the help file show the proper usage.
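
To make the function-versus-call distinction concrete, here is a small
sketch, assuming dplyr is attached and df is built as in the original post.
The names ok and bad are only for illustration, and the exact wording of the
error depends on the dplyr version, so it is caught rather than printed.

```r
library(dplyr)

set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))
by_group <- group_by(df, models)

# A call to a summary function: evaluated once per group, one row per group
ok <- summarise(by_group, mean_pred = mean(pred))

# The bare function object: summarise() cannot use it as a column value
bad <- tryCatch(summarise(by_group, mean),
                error = function(e) "error")

ok
bad
```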
>
> Use a call to your function and you will get further, to a different error:
>    > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins(pred,nBins))
>    Error: $ operator is invalid for atomic vectors
> The traceback again is not very useful, because the call information was
> stripped by dplyr (by the call=NULL in the call to stop()):
>   > traceback()
>   14: stop(list(message = "$ operator is invalid for atomic vectors",
>           call = NULL, cppstack = NULL))
>   13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
> However it is clear that the fault is in your function, which is expecting
> a data.frame x with a column called pred but gets pred itself.  Change x
> to xpred in the argument list and x$pred to xpred in the body of the
> function.
>
> You will run into more problems because your function returns a vector
> the length of its input but summarize expects a summary function - one
> that returns a scalar for any size vector input.
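
A quick way to see the difference is to compare output lengths in base R.
This is only a sketch: it uses 50 random values to stand in for one group's
pred column, and the variable names are illustrative.

```r
set.seed(4)
xpred <- rnorm(50)   # stands in for the pred values of a single group
nBins <- 10

Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
bins <- cut(xpred, breaks = Breaks, include.lowest = TRUE)

# A summary function collapses the group to a single value
length(mean(xpred))

# The binning function returns one value per input element
length(bins)
```

The first length is 1 and the second equals length(xpred), which is why the
binning function fits a row-preserving verb rather than summarize.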
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Oct 30, 2015 at 4:04 AM, Axel Urbiz <axel.urbiz at gmail.com> wrote:
>
>> So in this case, "create_bins" returns a vector and I still get the same
>> error.
>>
>>
>> create_bins <- function(x, nBins)
>> {
>>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>>   bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
>>   bin
>> }
>>
>>
>> ### Using dplyr (fails)
>> nBins = 10
>> by_group <- dplyr::group_by(df, models)
>> res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
>> Error: not a vector
>>
>> On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>>
>> > You are jumping the gun (your other email did get through) and you are
>> > posting using HTML (which does not come through on the list). Some time
>> > (re)reading the Posting Guide mentioned at the bottom of all emails on
>> > this list seems to be in order.
>> >
>> > The error is actually quite clear. You should return a vector from your
>> > function, not a data frame.
>> >
>> > ---------------------------------------------------------------------------
>> > Jeff Newmiller                        The     .....       .....  Go Live...
>> > DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>> >                                       Live:   OO#.. Dead: OO#..  Playing
>> > Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> > /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> > ---------------------------------------------------------------------------
>> > Sent from my phone. Please excuse my brevity.
>> >
>> > On October 29, 2015 4:55:19 PM MST, Axel Urbiz <axel.urbiz at gmail.com> wrote:
>> > >Hello,
>> > >
>> > >Sorry, resending this question as the prior was not sent properly.
>> > >
>> > >I’m using the plyr package below to add a variable named "bin" to my
>> > >original data frame "df" with the user-defined function "create_bins".
>> > >I'd like to get similar results using dplyr instead, but am failing to
>> > >do so.
>> > >
>> > >set.seed(4)
>> > >df <- data.frame(pred = rnorm(100),
>> > >                 models = gl(2, 50, 100, labels = c("model1", "model2")))
>> > >
>> > >
>> > >### Using plyr (works fine)
>> > >create_bins <- function(x, nBins)
>> > >{
>> > >  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>> > >  dfB <-  data.frame(pred = x$pred,
>> > >                    bin = cut(x$pred, breaks = Breaks,
>> > >                              include.lowest = TRUE))
>> > >  dfB
>> > >}
>> > >
>> > >nBins = 10
>> > >res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
>> > >head(res_plyr)
>> > >
>> > >### Using dplyr (fails)
>> > >
>> > >by_group <- dplyr::group_by(df, models)
>> > >res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
>> > >Error: not a vector
>> > >
>> > >
>> > >Any help would be much appreciated.
>> > >
>> > >Best,
>> > >Axel.
>> > >
>> > >       [[alternative HTML version deleted]]
>> > >
>> > >______________________________________________
>> > >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > >https://stat.ethz.ch/mailman/listinfo/r-help
>> > >PLEASE do read the posting guide
>> > >http://www.R-project.org/posting-guide.html
>> > >and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>>
>
>
>
