[Rd] Improved Data Aggregation and Summary Statistics in R

Sebastian Martin Krantz <sebastian.krantz using graduateinstitute.ch>
Thu Feb 28 10:06:20 CET 2019


Thanks to all who have given feedback so far. There is now a version of the
package on GitHub, which can be installed with

remotes::install_github("SebKrantz/collapse")

Further feedback is still very welcome!
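For context on what 'collap' is meant to extend, the sketch below shows the plain base-R baseline: stats::aggregate() applies a single FUN per call, so computing several statistics takes one call per function followed by a merge. The data frame and column names are hypothetical, and the code uses only base R; it does not use or reproduce the collapse package's API.

```r
# Base-R baseline only (NOT the collapse package): one aggregate() call per
# statistic, then a merge to get a single wide result table.
df <- data.frame(
  g = c("a", "a", "b", "b", "b"),  # hypothetical grouping variable
  x = c(1, 3, 2, 4, 9),
  y = c(10, 20, 30, 40, 80)
)

# aggregate() accepts a single FUN, so each statistic needs its own call
means   <- aggregate(cbind(x, y) ~ g, data = df, FUN = mean)
medians <- aggregate(cbind(x, y) ~ g, data = df, FUN = median)

# Combine the two results; suffixes disambiguate the repeated column names
res <- merge(means, medians, by = "g", suffixes = c("_mean", "_median"))
print(res)
```

The extra merge step, and the fact that every additional function or column type adds another call, is the bookkeeping a multi-function aggregation command would remove.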


On Wed, 27 Feb 2019 at 12:48, Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 26/02/2019 8:25 a.m., Sebastian Martin Krantz wrote:
> > Dear Developers,
> >
> > Having spent time developing and thinking about how data aggregation and
> > summary statistics can be enhanced in R, I would like to present my
> > ideas/efforts in the form of two commands:
> >
> > The first, which for now I have called 'collap', is an upgrade of
> > aggregate that accommodates and extends its functionality in various
> > respects, most importantly to work with multilevel and multi-type data,
> > multiple function calls, highly customized aggregation tasks, much
> > greater flexibility in passing inputs, and tidy output.
> >
> > The second function, 'qsu', is an advanced and flexible summary command
> > for cross-sectional and multilevel (panel) data (i.e. it can provide
> > overall, between-entity and within-entity statistics, and allows for
> > grouping, custom functions and transformations). It also provides a quick
> > method to compute and output within-transformed data.
> >
> > Both commands are efficiently built from core R, but offer optional
> > integration with data.table, which makes them extremely fast on large
> > datasets. An explanation of the syntax, a demonstration and benchmark
> > results are provided in the attached vignette.
> >
> > Since both commands accommodate existing functionality while adding
> > significant basic functionality, I thought that their addition to the
> > stats package would be worth considering. I would be happy to receive
> > your feedback.
>
> Generally the R Core group is reluctant to incorporate new functions
> into the base packages.  Each function that is added adds to their work,
> and they already have too much to do.  (I am no longer a member of R
> Core, but I don't think things have changed since I retired.)
>
> It is much easier for them if volunteers publish functions themselves,
> via contributed packages.
>
> Nowadays GitHub provides a very convenient platform on which you can
> develop a package containing your functions.  If other users find bugs
> or have suggested improvements, it's very easy for them to send those to
> you, and you can make the fixes available immediately.  Once you are
> satisfied that it is stable, you can submit it to CRAN, and anyone using
> R can easily install it.
>
> If you find the prospect of writing a package daunting, you shouldn't.
> It's actually quite easy, especially if you are using RStudio or ESS (or
> some other helpful front-end).  Hadley Wickham's book
> <http://r-pkgs.had.co.nz/> is a pretty accessible description of a
> development strategy.  (It's not the only strategy, but lots of people
> use it.)
>
> Duncan Murdoch
>
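To illustrate the overall, between and within statistics that 'qsu' is described as reporting, here is a base-R sketch of the standard panel decomposition. It assumes the common convention (used, for example, by Stata's xtsum) in which the within-transformed series is x minus the entity mean plus the overall mean, so it stays on the original scale; whether 'qsu' follows exactly this convention is an assumption, and the panel data are hypothetical.

```r
# Panel decomposition sketch in base R (NOT the 'qsu' implementation).
# Hypothetical panel: 3 entities ('id') observed over 3 periods each.
panel <- data.frame(
  id = rep(c("A", "B", "C"), each = 3),
  x  = c(1, 2, 3, 4, 6, 8, 10, 10, 10)
)

# Between: one mean per entity, and the same mean expanded to every row
ent_means <- tapply(panel$x, panel$id, mean)
x_between <- ave(panel$x, panel$id, FUN = mean)

# Within: deviation from the entity mean, with the overall mean added back
# (assumed convention) so the transformed series keeps the original level
overall_mean <- mean(panel$x)
x_within <- panel$x - x_between + overall_mean

# Overall / between / within summary table
summ <- data.frame(
  stat = c("overall", "between", "within"),
  n    = c(nrow(panel), length(ent_means), nrow(panel)),
  mean = c(overall_mean, mean(ent_means), mean(x_within)),
  sd   = c(sd(panel$x), sd(ent_means), sd(x_within))
)
print(summ)
```

Adding back the overall mean leaves the within standard deviation unchanged; it only shifts the level, which makes the within-transformed data directly comparable to the raw series.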




More information about the R-devel mailing list