[R] Sorting and subsetting

Joshua Wiley jwiley.psych at gmail.com
Mon Sep 20 20:29:46 CEST 2010


On Mon, Sep 20, 2010 at 11:15 AM, David Winsemius
<dwinsemius at comcast.net> wrote:
>
> On Sep 20, 2010, at 2:01 PM, David Winsemius wrote:
>
>>
>> On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
>>
>>> On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
>>> <spector at stat.berkeley.edu> wrote:
>>>>
>>>> Harold -
>>>>  Two ways that come to mind:
>>>>
>>>> 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
>>>> 2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
>>>
>>> 3) do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5, 1:2))
>>
>> I found that rather interesting but somewhat puzzling. I generally thought
>> that using "[" should "work", but by() was complaining:
>> Error in FUN(X[[1L]], ...) : could not find function "FUN"
>>
>> So I tried using back-quotes and got a sensible result.

I wondered about this too.  I had tried single and double quotes
before giving up; back quotes never occurred to me.  I also finally
figured out how to use `[` with by() to select all columns (by leaving
the column argument empty), which leaves its shortest form as:

do.call(rbind, by(tmp, tmp$index, `[`, 1:5, ))
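
For anyone who wants to check this, here is a minimal reproducible sketch that runs Phil's two approaches and the back-quoted by() form side by side. It assumes Harold's example data; the set.seed(1) call is my own addition, purely so the random numbers repeat. Row names differ between the methods, so only the column values are compared.

```r
# Harold's example data; set.seed() is added only for reproducibility
set.seed(1)
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# 1) split into one data frame per index, take rows 1:5 of each, stack them
r1 <- do.call(rbind, lapply(split(tmp, tmp$index), function(x) x[1:5, ]))

# 2) number the rows within each index and keep numbers <= 5
#    (the counters come back group by group, so tmp must be sorted by index)
r2 <- subset(tmp, unlist(tapply(foo, index, seq)) <= 5)

# 3) by() with the back-quoted extraction function; the empty trailing
#    argument is passed on to `[` and selects all columns
r3 <- do.call(rbind, by(tmp, tmp$index, `[`, 1:5, ))

# Row names differ between the methods, so compare column values only
stopifnot(identical(r1$foo, r2$foo), identical(r1$foo, r3$foo))
stopifnot(identical(r1$index, r3$index))
```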

>
> The need for back-quoting disappears if we add a match.fun call to
> by.data.frame():
>
> by.data.frame <-
> function (data, INDICES, FUN, ..., simplify = TRUE)
> { FUN <- match.fun(FUN)
>    if (!is.list(INDICES)) {
>        IND <- vector("list", 1L)
>        IND[[1L]] <- INDICES
>        names(IND) <- deparse(substitute(INDICES))[1L]
>    }
>    else IND <- INDICES
>    FUNx <- function(x) FUN(data[x, , drop = FALSE], ...)
>    nd <- nrow(data)
>    ans <- eval(substitute(tapply(1L:nd, IND, FUNx, simplify = simplify)),
>        data)
>    attr(ans, "call") <- match.call()
>    class(ans) <- "by"
>    ans
> }
>
> I would have thought such a call would be in the by.data.frame and
> by.default code, but it seems to be "missing in action". Would there be any
> downside to modifying those functions in that manner?
>
> --
> David.
>
>
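Short of patching base R, the same effect can be had with a small wrapper. This is only a sketch: by_mf is a hypothetical name of my own, not an existing function. It resolves FUN with match.fun() before delegating to by(), so that "[" can be passed as an ordinary quoted string instead of needing back-quotes.

```r
# Hypothetical helper, not part of base R: resolve FUN with match.fun()
# so that a character name such as "[" is accepted, then call by() as usual
by_mf <- function(data, INDICES, FUN, ..., simplify = TRUE) {
  FUN <- match.fun(FUN)  # the string "[" becomes the primitive `[`
  by(data, INDICES, FUN, ..., simplify = simplify)
}

# Harold's example data (set.seed() added for reproducibility)
set.seed(1)
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# The function name can now be given as a plain quoted string
res <- do.call(rbind, by_mf(tmp, tmp$index, "[", 1:5, 1:2))

# match.fun("[") returns the same primitive that back-quoting gives
stopifnot(identical(match.fun("["), `[`))
```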
>>
>> > do.call(rbind, by(tmp, tmp$index, FUN=`[`, 1:5, 1:2))
>>    index        foo
>> 1.6      1 -3.0267759
>> 1.7      1 -1.3725536
>> 1.19     1 -1.1476048
>> 1.16     1 -1.0963967
>> 1.2      1 -1.0684793
>> 2.29     2 -1.6601486
>> 2.21     2 -1.2633632
>> 2.22     2 -0.9875626
>> 2.38     2 -0.9515301
>> 2.30     2 -0.8638903
>>
>> Unlike Dalgaard, who arrived at a similar result via a different route and
>> called the row names "silly", I thought they were informative. But maybe the
>> sobriquet was directed at his second solution. I couldn't tell.
>>
>> --
>> David.
>>
>>>
>>> Josh
>>>
>>>>
>>>>                                      - Phil Spector
>>>>                                       Statistical Computing Facility
>>>>                                       Department of Statistics
>>>>                                       UC Berkeley
>>>>                                       spector at stat.berkeley.edu
>>>>
>>>>
>>>>
>>>> On Mon, 20 Sep 2010, Doran, Harold wrote:
>>>>
>>>>> Suppose I have a data frame, such as the one below:
>>>>>
>>>>> tmp <- data.frame(index = gl(2,20), foo = rnorm(40))
>>>>>
>>>>> And further assume it is sorted by index and then by the variable foo.
>>>>>
>>>>> tmp <- tmp[order(tmp$index, tmp$foo) , ]
>>>>>
>>>>> Now, I want to grab the first N rows of tmp for each index. In the end,
>>>>> what I want is the data frame 'result':
>>>>>
>>>>> tmp1 <- subset(tmp, index == 1)
>>>>> tmp2 <- subset(tmp, index == 2)
>>>>>
>>>>> tmp1 <- tmp1[1:5,]
>>>>> tmp2 <- tmp2[1:5,]
>>>>> result <- rbind(tmp1, tmp2)
>>>>>
>>>>> Does anyone see a way to subset and subsequently bind without a loop?
>>>>>
>>>>> Harold
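Since the question asks for the first N rows in general, here is a sketch that makes N an explicit argument. first_n_per_group is a hypothetical helper name, and it uses head() in place of x[1:5, ]: head() simply returns fewer rows for a group smaller than N, whereas x[1:5, ] pads such a group with rows of NA.

```r
# Harold's example data (set.seed() added for reproducibility)
set.seed(1)
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# Hypothetical helper: keep the first N rows of every group, no explicit loop
first_n_per_group <- function(df, group, N) {
  pieces <- split(df, group)             # one data frame per group level
  pieces <- lapply(pieces, head, n = N)  # at most N rows from each piece
  do.call(rbind, pieces)                 # stack the pieces back together
}

result <- first_n_per_group(tmp, tmp$index, 5)
```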
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Joshua Wiley
>>> Ph.D. Student, Health Psychology
>>> University of California, Los Angeles
>>> http://www.joshuawiley.com/
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>
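One caveat on Phil's second approach: unlist(tapply(foo, index, seq)) returns its counters group by group, so it only lines up with the rows of tmp because tmp is already sorted by index. The sketch below swaps in ave() (a different technique from the ones above) to compute each row's position within its own group in place, so it does not depend on the groups being contiguous. It again assumes Harold's example data with an added set.seed(1).

```r
# Harold's example data (set.seed() added for reproducibility)
set.seed(1)
tmp <- data.frame(index = gl(2, 20), foo = rnorm(40))
tmp <- tmp[order(tmp$index, tmp$foo), ]

# Position of each row within its own index group, following the current
# row order; ave() writes each group's counters back into the original slots
pos <- ave(seq_len(nrow(tmp)), tmp$index, FUN = seq_along)
result <- tmp[pos <= 5, ]

# The same counter still works when the groups are interleaved rather than
# contiguous; here the two index groups alternate row by row
mixed <- tmp[c(rbind(1:20, 21:40)), ]
pos_mixed <- ave(seq_len(nrow(mixed)), mixed$index, FUN = seq_along)
result_mixed <- mixed[pos_mixed <= 5, ]
```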


