[R] Sorting and subsetting

David Winsemius dwinsemius at comcast.net
Mon Sep 20 20:01:40 CEST 2010


On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:

> On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
> <spector at stat.berkeley.edu> wrote:
>> Harold -
>>   Two ways that come to mind:
>>
>> 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
>> 2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
> 3) do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5, 1:2))

I found that rather interesting but somewhat puzzling. I had generally
thought that passing the quoted name "[" should "work", but by() complained:
Error in FUN(X[[1L]], ...) : could not find function "FUN"

So I tried back-quotes instead and got a sensible result.

 > do.call(rbind, by(tmp, tmp$index, FUN=`[`, 1:5, 1:2))
      index        foo
1.6      1 -3.0267759
1.7      1 -1.3725536
1.19     1 -1.1476048
1.16     1 -1.0963967
1.2      1 -1.0684793
2.29     2 -1.6601486
2.21     2 -1.2633632
2.22     2 -0.9875626
2.38     2 -0.9515301
2.30     2 -0.8638903
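
A minimal sketch of the likely mechanism, assuming by() ends up calling
FUN(...) directly rather than resolving a string with match.fun(): R then
looks for a function literally named FUN, skips the character value "[",
and gives up. The helper call_it() and the set.seed() value are
hypothetical, not from the thread.

```r
# Hypothetical stand-in for a function that calls FUN without match.fun()
call_it <- function(FUN, ...) FUN(...)

# A character string is not a function, so the lookup for FUN fails
err <- try(call_it("[", 1:3, 2), silent = TRUE)
inherits(err, "try-error")

# Back-quotes pass the function object itself, so the call succeeds
ok <- call_it(`[`, 1:3, 2)

# The same idea with by(): three ways to hand over the function object
set.seed(1)  # arbitrary seed, only for reproducibility
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

r1 <- do.call(rbind, by(tmp, tmp$index, `[`, 1:5, 1:2))
r2 <- do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5, 1:2))
r3 <- do.call(rbind, by(tmp, tmp$index, match.fun("["), 1:5, 1:2))
identical(r1, r2) && identical(r1, r3)
```

Of the three, match.fun("[") is probably the most general, since it also
accepts a function name held in a character variable.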

Unlike Dalgaard, who arrived at a similar result via a different route
and called the row names "silly", I thought they were informative. But
maybe the sobriquet was directed at his second solution. I couldn't
tell.
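
A short sketch of what those row names carry, and how to drop them if
they are unwanted. The names raw, res, orig_row and the set.seed() value
are assumptions made for illustration, not part of the thread.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
raw <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- raw[order(raw$index, raw$foo), ]

res <- do.call(rbind, by(tmp, tmp$index, `[`, 1:5, 1:2))

# Each label is "<index level>.<row name in tmp>"; the part after the dot
# is the row's position in the unsorted data frame.
orig_row <- as.integer(sub("^[^.]*\\.", "", rownames(res)))
identical(raw$foo[orig_row], res$foo)

# Resetting the row names gives plain sequential labels instead
plain <- res
rownames(plain) <- NULL
rownames(plain)
```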

-- 
David.

>
> Josh
>
>>
>>                                        - Phil Spector
>>                                         Statistical Computing Facility
>>                                         Department of Statistics
>>                                         UC Berkeley
>>                                         spector at stat.berkeley.edu
>>
>>
>>
>> On Mon, 20 Sep 2010, Doran, Harold wrote:
>>
>>> Suppose I have a data frame, such as the one below:
>>>
>>> tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
>>>
>>> And further assume it is sorted by index and then by the variable foo.
>>>
>>> tmp <- tmp[order(tmp$index, tmp$foo) , ]
>>>
>>> Now, I want to grab the first N rows of tmp for each index. In the end,
>>> what I want is the data frame 'result'
>>>
>>> tmp1 <- subset(tmp, index == 1)
>>> tmp2 <- subset(tmp, index == 2)
>>>
>>> tmp1 <- tmp1[1:5,]
>>> tmp2 <- tmp2[1:5,]
>>> result <- rbind(tmp1, tmp2)
>>>
>>> Does anyone see a way to subset and subsequently bind without a loop?
>>>
>>> Harold
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
>
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/

David Winsemius, MD
West Hartford, CT


