[Rd] Improved Data Aggregation and Summary Statistics in R

Sebastian Martin Krantz
Tue Feb 26 14:25:10 CET 2019


Dear Developers,

Having spent some time thinking about how data aggregation and summary
statistics can be enhanced in R, I would like to present my efforts in
the form of two commands:

The first, which for now I have called 'collap', is an upgrade of aggregate
that accommodates and extends its functionality in various respects. Most
importantly, it works with multilevel and multi-type data, supports
multiple function calls and highly customized aggregation tasks, offers
much greater flexibility in how inputs are passed, and returns tidy output.
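
To illustrate the kind of task meant by multi-type aggregation, here is a
rough base R sketch. It does not use the collap syntax (that is described
in the vignette); the data frame, its column names and the helper
stat_mode are made up for the example.

```r
# Base R sketch of multi-type aggregation; data and names are hypothetical.
df <- data.frame(
  id     = c("a", "a", "b", "b", "b"),
  income = c(10, 20, 30, 40, 50),
  region = c("north", "north", "south", "east", "south"),
  stringsAsFactors = FALSE
)

# Statistical mode for categorical columns (first most frequent value).
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Numeric and character columns need different functions, so each type is
# aggregated separately and the pieces are merged by the grouping variable.
num <- aggregate(df["income"], by = df["id"], FUN = mean)
chr <- aggregate(df["region"], by = df["id"], FUN = stat_mode)
agg <- merge(num, chr, by = "id")

# Several statistics at once: aggregate() returns them as a matrix column.
stats <- aggregate(df["income"], by = df["id"],
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))
```

With plain aggregate this takes one call per column type plus a merge,
which is the sort of bookkeeping a single aggregation command could absorb.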

The second function, 'qsu', is an advanced and flexible summary command for
cross-sectional and multilevel (panel) data: it can provide overall,
between-entity and within-entity statistics, and allows for grouping, custom
functions and transformations. It also provides a quick method to compute
and output within-transformed data.
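
For readers less familiar with the overall/between/within decomposition,
a minimal base R sketch follows. It is not the qsu interface; the panel
data are made up, and adding the overall mean back to the within-transformed
values is one common convention that I assume here for illustration.

```r
# Hypothetical panel: 2 entities, 3 observations each.
panel <- data.frame(
  id = rep(c("a", "b"), each = 3),
  x  = c(1, 2, 3, 10, 20, 30)
)

# Entity mean repeated on every row (ave() uses mean by default).
entity_mean <- ave(panel$x, panel$id)

# Between: one mean per entity.
between <- tapply(panel$x, panel$id, mean)

# Within: deviations from the entity mean, recentered on the overall mean.
within_x <- panel$x - entity_mean + mean(panel$x)

# Standard deviation of each component.
sds <- c(overall = sd(panel$x),
         between = sd(between),
         within  = sd(within_x))
```

The vector within_x is the within-transformed series; sds collects the
three standard deviations that a panel summary would report side by side.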

Both commands are efficiently built from core R, but provide for optional
integration with data.table, which renders them extremely fast on large
datasets. An explanation of the syntax, a demonstration and benchmark
results are provided in the attached vignette.

Since both commands accommodate existing functionality while adding
significant basic functionality, I thought that their addition to the stats
package would be worth considering. I would be grateful for your feedback.

Best regards,

Sebastian Krantz

-------------- next part --------------
A non-text attachment was scrubbed...
Name: collap & qsu vignette.pdf
Type: application/pdf
Size: 569278 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20190226/eb4dd92d/attachment.pdf>

