[R] row selection based on median in data frame

Thu Apr 1 06:25:23 CEST 2004

Ed L Cashin <ecashin at uga.edu> writes:

> Ed L Cashin <ecashin at uga.edu> writes:
>
>> Hi.  I am having trouble thinking of an easy way to grab rows out of a
>> data frame.  I want to select the rows with a median value when the
>> rows are similar.
>
> I'm still catching up on my R list reading, and I notice there is a
> similar post to mine:
>
>    Federico Calboli    
>    data manipulation: getting mean value every 5 rows
>
> I think the responses there answer my question, but I'll have to look
> into it.  The responses say to use aggregate and an auxiliary row.

After consulting the docs and Venables and Ripley, I am not sure
aggregate can do what I'm looking for.  Given rows where certain
specified columns have the same values, I'd like to select the row
with the median value in another specified column ("runtime").

That is, after grouping the rows of the data frame based on the
columns in the by parameter, I want to select one whole row "as is",
the row with the median "runtime" value, without doing median on more
than one column.
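
One possible way to do this without aggregate is sketched below, as an
illustration rather than a definitive answer.  It splits the data frame
on the grouping columns and keeps one whole row per group.  The data and
the column names "host", "threads" and "note" are made up for the
example, as is the helper median_row; only "runtime" comes from the
description above.  For a group with an even number of rows, median()
averages the two middle values and may match no row, so the sketch
assumes the lower of the two middle rows is wanted.

```r
# Hypothetical example data; every column name except "runtime" is made up.
d <- data.frame(
  host    = c("a", "a", "a", "b", "b", "b"),
  threads = c(1, 1, 1, 2, 2, 2),
  runtime = c(5.0, 3.0, 4.0, 9.0, 7.0, 8.0),
  note    = c("r1", "r2", "r3", "r4", "r5", "r6"),
  stringsAsFactors = FALSE
)

# Return the row of 'g' that holds the median runtime.  With an even
# number of rows the lower of the two middle rows is taken (an assumption).
median_row <- function(g) {
  o <- order(g$runtime)
  g[o[ceiling(length(o) / 2)], , drop = FALSE]
}

# Group on host and threads, then keep one whole row per group.
groups <- split(d, list(d$host, d$threads), drop = TRUE)
picked <- do.call(rbind, lapply(groups, median_row))
rownames(picked) <- NULL
print(picked)
```

Every column of the selected rows is carried through unchanged, because
the rows are indexed out of the original data frame instead of being
rebuilt from summary values.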

     'aggregate.data.frame' is the data frame method.  If 'x' is not a
     data frame, it is coerced to one.  Then, each of the variables
     (columns) in 'x' is split into subsets of cases (rows) of
     identical combinations of the components of 'by', and 'FUN' is
     applied to each such subset with further arguments in '...' passed
     to it. (I.e., 'tapply(VAR, by, FUN, ..., simplify = FALSE)' is
     done for each variable 'VAR' in 'x', conveniently wrapped into one
     call to 'lapply()'.) 

Is there a way to tell aggregate to take the median of column runtime
only, and to use that to select the whole row?

     Empty subsets are removed, and the result is
     reformatted into a data frame containing the variables in 'by' and
     'x'.  The ones arising from 'by' contain the unique combinations
     of grouping values used for determining the subsets, and the ones
     arising from 'x' the corresponding summary statistics for the
     subset of the respective variables in 'x'.
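
The behaviour described in that passage can be seen in a small sketch.
The data and the column names "host", "threads" and "note" are
hypothetical; only "runtime" comes from the question.  Passing just the
runtime column as 'x' gives the per-group median, but the result holds
only the 'by' columns and runtime.

```r
# Hypothetical example data; every column name except "runtime" is made up.
d <- data.frame(
  host    = c("a", "a", "a", "b", "b", "b"),
  threads = c(1, 1, 1, 2, 2, 2),
  runtime = c(5.0, 3.0, 4.0, 9.0, 7.0, 8.0),
  note    = c("r1", "r2", "r3", "r4", "r5", "r6"),
  stringsAsFactors = FALSE
)

# Median of runtime only, grouped on host and threads.
agg <- aggregate(d["runtime"],
                 by = list(host = d$host, threads = d$threads),
                 FUN = median)

# The result has the grouping columns and runtime; "note" is not carried over.
print(names(agg))
print(agg)
```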

I'd like to select all the columns, not just the ones I'm using to
group or the one from which I want to find the median value.  I think
I can't use aggregate after all.  But I must admit I'm very tired and
should go to bed.

-- 
--Ed L Cashin            |   PGP public key:
  ecashin at uga.edu        |   http://noserose.net/e/pgp/