[R] Sorting and subsetting

Joshua Wiley jwiley.psych at gmail.com
Tue Sep 21 16:27:24 CEST 2010


On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
>
> All the solutions in this thread so far use the lapply(split(...)) paradigm
> either directly or indirectly. That paradigm doesn't scale. That's the likely
> source of quite a few 'out of memory' errors and performance issues in R.

This is a good point. It is not nearly as straightforward as the
data.table syntax (which seems to order and select in one step; very
nice!), but this should be less memory-intensive, since it indexes rows
directly instead of splitting the data into a list of per-group data frames:

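tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
# sort by index, then by ascending foo
# (use -tmp$foo instead to keep the 5 largest, as in the data.table example)
tmp <- tmp[order(tmp$index, tmp$foo), ]

# find the row of the first instance of each level and add 0:4 to it;
# this assumes every level has at least 5 rows
x <- sapply(match(levels(tmp$index), tmp$index), `+`, 0:4)

# x is a 5 x nlevels matrix of row numbers, read column by column (group by group)
tmp[x, ]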
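If some level can have fewer than 5 rows, adding 0:4 to the first match
runs into the next group (or past the last row, giving NA rows). A
possible generalization of the same match() idea, assuming the data frame
is already sorted so that each group's rows are contiguous, is to compute
each row's position within its group and keep positions up to n. This is
only a sketch: `first_n_per_group` is a hypothetical helper name, not an
existing function.

```r
# Hypothetical helper (not from the thread): keep the first `n` rows of each
# level of `group` in a data frame that is ALREADY sorted by `group`.
first_n_per_group <- function(df, group, n) {
  g <- df[[group]]
  # row position minus the position of the group's first row, plus 1,
  # gives each row's rank within its (contiguous) group
  pos <- seq_len(nrow(df)) - match(g, g) + 1L
  df[pos <= n, , drop = FALSE]
}

set.seed(1)
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
# sort by index, then ascending foo; use -tmp$foo to keep the largest values
tmp <- tmp[order(tmp$index, tmp$foo), ]
res <- first_n_per_group(tmp, "index", 5)

# a group with only 3 rows is returned whole instead of borrowing
# rows from the next group
uneven <- data.frame(index = factor(rep(1:2, c(3, 20))), foo = 1:23)
res2 <- first_n_per_group(uneven, "index", 5)
```

With `uneven`, the first group contributes its 3 rows and the second its
first 5, whereas the `+ 0:4` version would select rows 1:5 and 4:8, so
rows 4 and 5 from the second group would show up twice.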

>
> data.table doesn't do that internally, and its syntax is pretty easy.
>
>> tmp <- data.table(index = gl(2,20), foo = rnorm(40))
>
>> tmp[, .SD[head(order(-foo),5)], by=index]
>      index index.1       foo
>  [1,]     1       1 1.9677303
>  [2,]     1       1 1.2731872
>  [3,]     1       1 1.1100931
>  [4,]     1       1 0.8194719
>  [5,]     1       1 0.6674880
>  [6,]     2       2 1.2236383
>  [7,]     2       2 0.9606766
>  [8,]     2       2 0.8654497
>  [9,]     2       2 0.5404112
> [10,]     2       2 0.3373457
>>
>
> As you can see, it currently repeats the group column, which is a
> shame (on the to-do list to fix).
>
> Matthew
>
> http://datatable.r-forge.r-project.org/
>
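
For comparison, here is roughly what the lapply(split(...)) paradigm
Matthew describes looks like for the same task. This is a sketch, not the
code posted earlier in the thread. It should return the same rows as the
match()-based indexing, but split() first copies each group into its own
data frame before rbind() glues the pieces back together, which is
presumably where the memory cost comes from on large data.

```r
set.seed(1)
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))

# split/apply/combine: one data frame copy per group, then rbind
pieces <- lapply(split(tmp, tmp$index),
                 function(d) head(d[order(d$foo), ], 5))
res_split <- do.call(rbind, pieces)

# same rows via direct indexing on the sorted data, without splitting
sorted <- tmp[order(tmp$index, tmp$foo), ]
x <- sapply(match(levels(sorted$index), sorted$index), `+`, 0:4)
res_index <- sorted[as.vector(x), ]
```

The two results differ only in row names (rbind() prefixes them with the
group label).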



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


