[R] User-defined functions in dplyr

William Dunlap wdunlap at tibco.com
Fri Oct 30 17:06:16 CET 2015


The error message is not very helpful, and the stack trace is pretty
inscrutable as well:
> dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
Error: not a vector
> traceback()
14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
12: summarise_impl(.data, dots)
11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
9: dplyr::summarize(., create_bins)
8: function_list[[k]](value)
7: withVisible(function_list[[k]](value))
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(expr, envir, enclos)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)


The message does not mean that your function, create_bins, fails to return
a vector; passing the built-in sum function the same way gives the same
error. help(summarize, package="dplyr") says:
     ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
          ‘max()’ etc.
It apparently means calls to summary functions, not summary functions
themselves.  The examples in the help file show the proper usage.
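
As a minimal sketch of that usage, assuming a reasonably recent dplyr is
installed and reusing the df from the original post (the column name
mean_pred is only illustrative), the argument is a call such as mean(pred),
optionally named:

```r
library(dplyr)

# Data from the original post
set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))

# Pass a *call* to a summary function, as a name-value pair.
# Passing the bare function (e.g. summarize(sum)) is what fails.
res <- df %>%
  group_by(models) %>%
  summarize(mean_pred = mean(pred))

res  # one row per level of 'models'
```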

Use a call to your function and you will get a bit further:
   > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins(pred,nBins))
   Error: $ operator is invalid for atomic vectors
The traceback again is not very useful, because the call information was
stripped by dplyr (by the call=NULL in the call to stop()):
  > traceback()
  14: stop(list(message = "$ operator is invalid for atomic vectors",
          call = NULL, cppstack = NULL))
  13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
However, it is clear that the fault is in your function, which expects a
data.frame x with a column called pred but is passed the pred column itself.
Change x to xpred in the argument list and x$pred to xpred in the body of
the function.
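
A sketch of that change, using only base R and the same quantile-based
breaks as the posted function (the test vector pred is made up here for
illustration):

```r
# create_bins rewritten to take the vector itself (xpred) rather than
# a data.frame with a 'pred' column
create_bins <- function(xpred, nBins)
{
  Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
  cut(xpred, breaks = Breaks, include.lowest = TRUE)
}

set.seed(4)
pred <- rnorm(100)
bins <- create_bins(pred, nBins = 10)
length(bins)   # same length as the input: one bin label per value
```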

You will run into more problems because your function returns a vector
the length of its input, but summarize expects a summary function: one
that returns a scalar for a vector input of any size.
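
One possible way around this, offered as an assumption rather than something
tried in this thread, is dplyr::mutate, which accepts a result as long as
each group and so adds bin as a new column. The sketch assumes a recent
dplyr that can combine factors whose levels differ between groups, and it
uses the vector-argument form of create_bins:

```r
library(dplyr)

# Data from the original post
set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))

# Vector-argument version of the posted function
create_bins <- function(xpred, nBins)
{
  Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
  cut(xpred, breaks = Breaks, include.lowest = TRUE)
}

nBins <- 10

# mutate() keeps every row, so a per-row result such as a bin label fits;
# the breaks are computed separately within each level of 'models'
res_dplyr <- df %>%
  group_by(models) %>%
  mutate(bin = create_bins(pred, nBins)) %>%
  ungroup()

head(res_dplyr)
```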

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 30, 2015 at 4:04 AM, Axel Urbiz <axel.urbiz at gmail.com> wrote:

> So in this case, "create_bins" returns a vector and I still get the same
> error.
>
>
> create_bins <- function(x, nBins)
> {
>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>   bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
>   bin
> }
>
>
> ### Using dplyr (fails)
> nBins = 10
> by_group <- dplyr::group_by(df, models)
> res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> Error: not a vector
>
> On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
> wrote:
>
> > You are jumping the gun (your other email did get through) and you are
> > posting using HTML (which does not come through on the list). Some time
> > (re)reading the Posting Guide mentioned at the bottom of all emails on
> > this list seems to be in order.
> >
> > The error is actually quite clear. You should return a vector from your
> > function, not a data frame.
> >
> > ---------------------------------------------------------------------------
> > Jeff Newmiller                        The     .....       .....  Go Live...
> > DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
> >                                       Live:   OO#.. Dead: OO#..  Playing
> > Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> > /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> > ---------------------------------------------------------------------------
> > Sent from my phone. Please excuse my brevity.
> >
> > On October 29, 2015 4:55:19 PM MST, Axel Urbiz <axel.urbiz at gmail.com>
> > wrote:
> > >Hello,
> > >
> > >Sorry, resending this question as the prior was not sent properly.
> > >
> > >I’m using the plyr package below to add a variable named "bin" to my
> > >original data frame "df" with the user-defined function "create_bins".
> > >I'd like to get similar results using dplyr instead, but failing to do so.
> > >
> > >set.seed(4)
> > >df <- data.frame(pred = rnorm(100),
> > >                 models = gl(2, 50, 100, labels = c("model1", "model2")))
> > >
> > >
> > >### Using plyr (works fine)
> > >create_bins <- function(x, nBins)
> > >{
> > >  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
> > >  dfB <- data.frame(pred = x$pred,
> > >                    bin = cut(x$pred, breaks = Breaks,
> > >                              include.lowest = TRUE))
> > >  dfB
> > >}
> > >
> > >nBins = 10
> > >res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> > >head(res_plyr)
> > >
> > >### Using dplyr (fails)
> > >
> > >by_group <- dplyr::group_by(df, models)
> > >res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> > >Error: not a vector
> > >
> > >
> > >Any help would be much appreciated.
> > >
> > >Best,
> > >Axel.
> > >
> > >       [[alternative HTML version deleted]]
> > >
> > >______________________________________________
> > >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >https://stat.ethz.ch/mailman/listinfo/r-help
> > >PLEASE do read the posting guide
> > >http://www.R-project.org/posting-guide.html
> > >and provide commented, minimal, self-contained, reproducible code.
> >
> >
