[Rd] split.data.frame

Matthew Dowle mdowle at mdowle.plus.com
Thu Dec 17 13:39:27 CET 2009


This seems very similar to the data.table package.

The 'by' argument splits the data.table by that value then executes the j 
expression within each subset.  The package documentation talks about 
'subset' and 'with' in some detail. See ?"[.data.table".

library(data.table)
dt = data.table(x=1:20, y=rep(1:4,each=5))   # 4 groups of 5 rows
dt[,sum(x),by="y"]                           # sum of x within each y
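
For comparison, roughly the same per-group sum can be sketched in base R 
with split() and sapply(). This is only an illustration of the 
split-then-apply idea using base functions; it is not a claim about how 
data.table computes the result internally. The names df and sums are just 
for this example.

```r
# Same toy data as above, as a plain data.frame
df <- data.frame(x = 1:20, y = rep(1:4, each = 5))

# Evaluate the split "with" the data frame, so x and y refer to its columns,
# then sum each piece
sums <- sapply(with(df, split(x, y)), sum)
sums
```

The result is a named vector with one sum per value of y.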

> and x has a variable called grp, what do you get?
In data.table that choice is given to the user via the argument 'with', 
which is TRUE by default, meaning names are looked up among the columns of 
dt first, so you get the x inside dt rather than an x in the calling scope.
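
To make the capture problem from the quoted question concrete, here is a 
minimal base R sketch. split_with() is a hypothetical helper written for 
this message, not an existing function and not the proposed 
split.data.frame; it assumes the splitting expression is evaluated inside 
the data frame first, the way with() and subset() do.

```r
# Hypothetical helper: look up the splitting expression in the data frame
# first, then fall back to the caller's environment
split_with <- function(df, expr) {
  f <- eval(substitute(expr), df, parent.frame())
  split(df, f)
}

x <- data.frame(val = 1:4, grp = c("a", "a", "b", "b"))

f <- function(x) {
  grp <- c("u", "v", "u", "v")   # the grouping the caller intended
  names(split_with(x, grp))      # but the column x$grp is found first
}

f(x)                                         # groups come from the column
names(split(x, c("u", "v", "u", "v")))       # plain split uses the vector given
```

With the column lookup, the local grp is silently ignored, which is the 
inadvertent capture described in the quoted message.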


"Romain Francois" <romain.francois at dbmail.com> wrote in message 
news:4B288645.3010602 at dbmail.com...
> On 12/16/2009 12:14 AM, Peter Dalgaard wrote:
>> Romain Francois wrote:
>>> Hello,
>>>
>>> I very much enjoy "with" and "subset" semantics for data frames and
>>> was wondering if we could have something similar with split, basically
>>> by evaluating the second argument "with" the data frame :
>>
>> I seem to recall that this idea was considered and rejected when the
>> current split.data.frame was written (10 years ago!). The main reasons
>> were that
>>
>> - it's not really THAT hard to evaluate a single splitting expression
>> using with() or eval()
>
> Sure, this is just about convenience and laziness.
>
>> - not all applications will have the splitting factor inside the df to
>> split ( split(df[-1], df[[1]]) for a simple case)
>
> this still works
>
>> - if you need a computed splitting factor, there's a risk of inadvertent
>> variable capture. I.e., if you inside a function do
>>
>> ....
>> grp <- ...whatever...
>> spl <- split(x, grp)
>> ....
>>
>> and x has a variable called grp, what do you get?
>
> this is a problem indeed.
>
> thanks for the reply.
>
> Romain
>
> -- 
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://tr.im/HlX9 : new package : bibtex
> |- http://tr.im/Gq7i : ohloh
> `- http://tr.im/FtUu : new package : highlight
>


