[R] Sorting and subsetting

Matthew Dowle mdowle at mdowle.plus.com
Tue Sep 21 17:28:00 CEST 2010


Probably true, that's cunning, but look at base::match. The
first thing it does is coerce a factor to character (which
needs an allocation and a copy internally). data.table doesn't
do that either; see data.table:::sortedmatch.
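To make the coercion point concrete, here is a minimal sketch (the vector f and its size are hypothetical) that finds the first position of each factor level in two ways. The first uses base::match on the factor, which goes through character. The second matches the integer codes directly, so no character vectors are involved. This is not data.table:::sortedmatch; it only illustrates that the factor-to-character step can be avoided.

```r
# Hypothetical data: a long factor with three levels
set.seed(1)
f <- factor(sample(c("a", "b", "c"), 1e5, replace = TRUE))

# base::match() converts the factor 'f' to character before matching
first_chr <- match(levels(f), f)

# Same result from the integer codes: level i is stored as code i,
# so the match itself is integer against integer
first_int <- match(seq_along(levels(f)), as.integer(f))

identical(first_chr, first_int)   # TRUE
```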

I have taken the first basic steps towards a proper reproducible
test suite (timings.Rnw). Perhaps this example could be added
there; the PDF is on the homepage. Currently one test is 340
times faster and the other is 13 times faster. More examples
would be good.
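As a hedged sketch of how this thread's task (top five foo values per group) might be written up as a reproducible comparison: it is not taken from timings.Rnw, and the names split_way, order_way and df, as well as the data sizes, are hypothetical. It checks that a lapply(split(...)) version and a single-order() version return the same rows, then prints their elapsed times, which will vary by machine.

```r
# Hypothetical benchmark data: many small groups of 20 rows each
set.seed(42)
n_groups <- 500L
df <- data.frame(index = gl(n_groups, 20), foo = rnorm(n_groups * 20))

# (1) lapply(split(...)) paradigm: one data.frame subset per group, then rbind
split_way <- function(d) {
  pieces <- lapply(split(d, d$index),
                   function(g) head(g[order(-g$foo), ], 5))
  do.call(rbind, pieces)
}

# (2) One global order(), then jump to the first row of each group
#     (assumes every group has at least 5 rows, as in this data)
order_way <- function(d) {
  d <- d[order(d$index, -d$foo), ]
  first <- match(levels(d$index), d$index)
  d[as.vector(sapply(first, `+`, 0:4)), ]
}

t_split <- system.time(r1 <- split_way(df))
t_order <- system.time(r2 <- order_way(df))

# Both should return the same rows in the same order
stopifnot(identical(r1$foo, r2$foo),
          identical(as.character(r1$index), as.character(r2$index)))
cat(sprintf("split: %.3fs  order: %.3fs\n",
            t_split[["elapsed"]], t_order[["elapsed"]]))
```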

Matthew
http://datatable.r-forge.r-project.org/


"Joshua Wiley" <jwiley.psych at gmail.com> wrote in message 
news:AANLkTimyUvL9sUJ65KtZQvpNYn+eP8ubu3MXXHHrDM1k at mail.gmail.com...
> On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle <mdowle at mdowle.plus.com> 
> wrote:
>>
>>
>> All the solutions in this thread so far use the lapply(split(...))
>> paradigm either directly or indirectly. That paradigm doesn't scale.
>> That's the likely source of quite a few 'out of memory' errors and
>> performance issues in R.
>
> This is a good point.  It is not nearly as straightforward as the
> syntax for data.table (which seems to order and select in one
> step...very nice!), but this should be less memory intensive:
>
> tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
> # sort by group, then by foo (ascending) within each group
> tmp <- tmp[order(tmp$index, tmp$foo), ]
>
> # find location of first instance of each level and add 0:4 to it
> # (assumes every group has at least 5 rows)
> x <- sapply(match(levels(tmp$index), tmp$index), `+`, 0:4)
>
> tmp[x, ]
>
>>
>> data.table doesn't do that internally, and its syntax is pretty easy.
>>
>>> tmp <- data.table(index = gl(2,20), foo = rnorm(40))
>>
>>> tmp[, .SD[head(order(-foo),5)], by=index]
>>       index index.1       foo
>>  [1,]     1       1 1.9677303
>>  [2,]     1       1 1.2731872
>>  [3,]     1       1 1.1100931
>>  [4,]     1       1 0.8194719
>>  [5,]     1       1 0.6674880
>>  [6,]     2       2 1.2236383
>>  [7,]     2       2 0.9606766
>>  [8,]     2       2 0.8654497
>>  [9,]     2       2 0.5404112
>> [10,]     2       2 0.3373457
>>>
>>
>> As you can see, it currently repeats the group column, which is a
>> shame (it's on the to-do list to fix).
>>
>> Matthew
>>
>> http://datatable.r-forge.r-project.org/
>>
>>
>> --
>> View this message in context: 
>> http://r.789695.n4.nabble.com/Sorting-and-subsetting-tp2547360p2548319.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
>
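Joshua's order()/match() approach quoted above adds 0:4 to the first row of each group, so it assumes every group has at least five rows (a shorter group would spill into the next group's rows). Because it sorts foo in ascending order, it also returns the five smallest values per group, whereas the data.table call uses order(-foo) and returns the five largest. A minimal sketch of a more defensive variant, assuming you want the k largest values and that some groups may be short or empty (top_k_by_group is a hypothetical helper, not part of base R or data.table):

```r
# Hypothetical helper: k largest 'foo' values per level of 'index'.
# Short groups return all their rows; empty levels are skipped.
top_k_by_group <- function(d, k = 5L) {
  d <- d[order(d$index, -d$foo), ]                 # group, then foo descending
  first <- match(levels(d$index), d$index)         # NA for levels with no rows
  size  <- tabulate(as.integer(d$index), nlevels(d$index))
  keep  <- !is.na(first)
  rows  <- unlist(mapply(function(f, n) f + seq_len(min(k, n)) - 1L,
                         first[keep], size[keep], SIMPLIFY = FALSE))
  d[rows, ]
}

# Usage: group "c" has only 2 rows and level "d" has none
set.seed(1)
tmp <- data.frame(index = gl(3, 4, labels = c("a", "b", "c")),
                  foo = rnorm(12))
tmp <- tmp[-(9:10), ]
tmp$index <- factor(tmp$index, levels = c("a", "b", "c", "d"))
res <- top_k_by_group(tmp, k = 3)
table(res$index)   # a: 3, b: 3, c: 2, d: 0
```

Clipping each group's offsets to its own size keeps the single global sort (the memory-friendly part of Joshua's idea) while removing the fixed five-rows-per-group assumption.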


