[R] R package recommendation - recursively partition data frame, calculate summaries of node data frames, plot and print summaries

Bert Gunter bgunter.4567 at gmail.com
Tue May 30 17:08:38 CEST 2017


1. Generally this sort of thing (statistical issues) is OT here.

2. Have you tried googling "recursive partitioning R"?

3. Have you looked at the CRAN "Machine Learning" Task View?
https://cran.r-project.org/web/views/MachineLearning.html

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Tue, May 30, 2017 at 4:51 AM, Ross Gayler <r.gayler at gmail.com> wrote:
> I am after R package recommendations.
>
> I have a data frame with ~5 million rows and ~50 columns. (I could do what
> I want with a sample of the rows, but ideally I would use all the rows.)
>
> (1) I want to recursively partition the rows of the data frame in a way
> that I manually specify. That is, I want to generate a tree structure such
> that each node of the tree represents a subset of the rows of the data
> frame and the child nodes of any parent node represent a partition of the
> rows represented by the parent node. This is the sort of thing that tree
> induction algorithms like CART and ID3 do, but I want to manually specify
> the tree structure rather than have some algorithm decide it for me.
>
> (2) I want the means for specifying the tree structure to be as simple as
> possible, because the users will be trying out different tree structures.
>
> (3) Each node (internal or terminal) of the tree represents a row subset of
> the root data frame. I want to be able to specify a function to be applied
> to each node that takes the node data frame as input and calculates a set
> of summary statistics. I will probably write this node summary function as
> a dplyr pipeline. I will want to be able to associate the summaries with
> the nodes so that I keep track of the summaries in terms of the tree
> structure.
>
> (4) I want to be able to print and plot the tree of summaries in a way that
> shows the summaries in the context of the tree structure. Inevitably, there
> will be fiddling with the formatting of the prints and plots, so I expect I
> will need user definable print/plot formatting functions that are applied
> to each node of the tree.
>
> What I am looking for is an R package that provides the best starting point
> for me to implement this. I am not a particularly good programmer, so
> getting a package that minimises what I have to write is important to me.
>
> So far, the most likely packages appear to be:
>
>    - partykit <http://partykit.r-forge.r-project.org/partykit/>
>    - data.tree <https://github.com/gluc/data.tree>
>
> I would appreciate any recommendations for R packages that would serve as a
> good base; any comments on the relative merits of the packages for my
> purposes; and any pointers to example code of people doing similar things.
>
> Thanks
>
> Ross
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
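
Requirements (1) to (4) in the quoted question can be sketched in base R
without either of the packages named there. This is only a minimal
illustration under stated assumptions, not the partykit or data.tree API:
the helpers node(), grow() and print_tree(), the toy data frame, and the
summary function are all hypothetical names made up for the example. The
tree is specified by hand as nested node() calls, each carrying a row filter
that is evaluated against its parent's rows. Nothing here checks that the
children of a node actually form a partition (disjoint and exhaustive); that
is left to whoever writes the filters.

```r
# Hypothetical helper: a node is a name, an unevaluated row filter,
# and zero or more child nodes.
node <- function(name, filter, ...) {
  list(name = name, filter = substitute(filter), children = list(...))
}

# Hypothetical helper: apply the node's filter to the parent's rows,
# summarise the resulting subset, then recurse into the children.
grow <- function(df, spec, summary_fn) {
  keep <- rep_len(eval(spec$filter, df), nrow(df))
  rows <- df[!is.na(keep) & keep, , drop = FALSE]
  list(
    name     = spec$name,
    summary  = summary_fn(rows),
    children = lapply(spec$children, function(ch) grow(rows, ch, summary_fn))
  )
}

# Hypothetical helper: print one line per node, indented by depth,
# using a user-supplied formatting function for the summary.
print_tree <- function(tree, format_fn, depth = 0) {
  cat(strrep("  ", depth), tree$name, ": ", format_fn(tree$summary), "\n",
      sep = "")
  for (ch in tree$children) print_tree(ch, format_fn, depth + 1)
  invisible(tree)
}

# Toy data (made up for illustration only).
df <- data.frame(
  age    = c(25, 35, 45, 55, 65, 30),
  income = c(30, 50, 70, 90, 40, 60)
)

# Manually specified tree structure.
spec <- node("all", TRUE,
  node("under 40", age < 40,
    node("income < 55", income < 55),
    node("income >= 55", income >= 55)),
  node("40 and over", age >= 40))

# Node summary function; a dplyr pipeline could be substituted here.
summary_fn <- function(d) list(n = nrow(d), mean_income = mean(d$income))

tree <- grow(df, spec, summary_fn)
print_tree(tree, function(s) {
  sprintf("n=%d, mean income=%.1f", s$n, s$mean_income)
})
```

Because each child is filtered from its parent's rows rather than from the
root, trying a different tree only means editing the nested node() calls.
With ~5 million rows this copies each subset; storing row indices in the
nodes instead would be one way to reduce memory use.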
