[R] function to filter identical data.frames using less than (<) and greater than (>)

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Thu Dec 6 18:00:30 CET 2012


You ask me to provide code when you have only described your solution rather than your problem. That limits my options more than I care to allow for investing my time.

When I think of problems that require repetitive subsetting I tend to look for solutions involving aggregation (?aggregate, ?plyr::ddply). Aggregation requires creating one or more grouping columns, which can be built with the cut function or with logical indexed assignment (e.g. a sequence of statements something like eg$grpcol[with(eg,grpcol!="Default" & A<1 & B<1)] <- "ABTooLow").
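
A minimal sketch of the grouping-column approach described above, not code from the thread. It assumes the column starts out as "Default" and that each statement should only relabel rows still carrying that label, so it tests grpcol == "Default" where the message writes !=; that reading is a guess. The seed, the "CPositive" label, the Abin column and the break points are all made up for illustration.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10))

# Logical indexed assignment: give every row a default label first,
# then relabel only the rows that still carry the default.
eg$grpcol <- "Default"
eg$grpcol[with(eg, grpcol == "Default" & A < 1 & B < 1)] <- "ABTooLow"
eg$grpcol[with(eg, grpcol == "Default" & C > 0)] <- "CPositive"  # hypothetical label

# cut(): bin a numeric column into a factor (hypothetical break points).
eg$Abin <- cut(eg$A, breaks = c(-Inf, 0, 1, Inf),
               labels = c("A<=0", "0<A<=1", "A>1"))

# One aggregate() call summarises D per group instead of one subset per group.
res <- aggregate(D ~ grpcol, data = eg, FUN = mean)
res
```

The same grouping column could be handed to plyr::ddply if more than one summary per group is needed.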

So... what is your problem?
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Karl Brand <k.brand at erasmusmc.nl> wrote:

>Hi Jeff,
>
>Subset is indeed what's required here. But using it every time it's
>needed was generating excessive amounts of obtuse code. So for the sake
>of clarity and convenience I wanted a wrapper function to replace these
>repetitious subsets.
>
>Although Rui's example works just fine, I'd love to see any idiomatic ways
>you might attempt this (also for the sake of improving my grasp of R).
>
>Cheers,
>
>Karl
>
>
>
>
>On 06/12/12 15:57, Jeff Newmiller wrote:
>> You have not indicated why the subset function is insufficient for your needs...
>>
>> Karl Brand <k.brand at erasmusmc.nl> wrote:
>>
>>> Esteemed UseRs,
>>>
>>> I've got many biggish data frames which need a lot of subsetting, like in
>>> this example:
>>>
>>> # example
>>> eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D =
>>> rnorm(10))
>>> egsub <- eg[eg$A < 0 & eg$B < 1 & eg$C > 0, ]
>>> egsub
>>> egsub2 <- eg[eg$A > 1 & eg$B > 0, ]
>>> egsub2
>>>
>>> # To make this clearer than 1000s of lines of extractions with []
>>> # I tried to make a function like this:
>>>
>>> # func(data="eg", A="< 0", B="< 1", C="> 0")
>>>
>>> # Which would also need to be run as
>>>
>>> # func(data="eg", A="> 1", B="> 0", C=NA)
>>> #end
>>>
>>> Notably:
>>> -the signs* "<" and ">" need to be flexible _and_ optional
>>> -the quantities also need to be flexible
>>> -column header names i.e, A, B and C don't need flexibility,
>>> i.e., can remain fixed
>>> * "less than" and "greater than" so google picks up this thread
>>>
>>> Once again I find just how limited my grasp of R is... Is do.call() the
>>> best way to call binary operators like < & > in a function? Is an ifelse
>>> statement needed for each column to make filtering on it optional?
>>> etc....
>>>
>>> Anyone with the patience to show their working version of such a
>>> function would receive my undying Rdulation. With thanks in advance,
>>>
>>> Karl
>>
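
For reference, the wrapper asked for in the original post could be sketched roughly as below. This is neither Rui's solution nor anything posted in the thread; filter_df is a hypothetical name, and passing each condition as list(operator, value) rather than a string such as "< 0" is an assumption made to avoid parsing text. A column given as NA or NULL is simply not filtered on.

```r
# Hypothetical helper, not from the thread: filter a data frame on
# optional per-column conditions given as list(operator, value).
filter_df <- function(data, ...) {
  conds <- list(...)
  keep <- rep(TRUE, nrow(data))
  for (col in names(conds)) {
    cond <- conds[[col]]
    # NA or NULL means "no filter on this column"
    if (is.null(cond) || all(is.na(cond))) next
    op <- match.fun(cond[[1]])          # look up "<", ">", "<=", ... by name
    keep <- keep & op(data[[col]], cond[[2]])
  }
  # which() drops rows where the comparison is NA
  data[which(keep), , drop = FALSE]
}

set.seed(1)  # arbitrary seed so the sketch is reproducible
eg <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10), D = rnorm(10))

egsub  <- filter_df(eg, A = list("<", 0), B = list("<", 1), C = list(">", 0))
egsub2 <- filter_df(eg, A = list(">", 1), B = list(">", 0), C = NA)
```

match.fun() answers the do.call() question in the simplest way: a binary operator is an ordinary function once it is looked up by name, so no per-column ifelse is needed.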



