[R] row selection based on median in data frame

Thu Apr 1 17:33:50 CEST 2004

Ed L Cashin <ecashin at uga.edu> writes:

...
> Is there a way to tell aggregate just to perform median on column runtime to
> select the whole row?  

Some helpful folks have emailed me requesting more info about what I'm
trying to do.  Here's a simple R function to produce a data frame like
the one I am working on.

demo.frame <- function() {
  n.runs <- 3
  types <- c("red","black","blue")
  foo <- 1:5
  bar <- seq(50,90,by=10)
  d <- data.frame()

  for (i in 1:n.runs) {
    for (t in types) {
      for (f in foo) {
        for (b in bar) {
          row <- data.frame(type=t,
                            foo=f,
                            bar=b,
                            a=rnorm(1),
                            b=rnorm(1),
                            c=rnorm(1))
          d <- rbind(d,row)
        }
      }
    }
  }
  d
}

Each combination of type, foo, and bar values recurs in the resulting
rows, once per run.  I need to treat the rows that share such a
combination as a group, select the one row with the median "c" value,
and preserve all of that row's other values.  So median should not be
done on the "a" or "b" columns, just the "c" column.

There are two ways I see to approach this problem.  One would be:

  for each subset of rows with matching type, foo, and bar values, 
    find the row with the median c value and output it
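
The pseudocode above could be sketched in R roughly as follows.  This
is only one possible reading of it, not code from the original post:
the names median.row and median.by.group are made up for illustration,
the example data frame is generated here rather than taken from real
runs, and the choice of the lower middle row when a group has an even
number of rows is an assumption.

```r
# Hypothetical helper: return the row of group g whose "c" value is the
# median, keeping all of that row's other columns.
median.row <- function(g) {
  # Sort the group's rows by c and take the middle one.  With an odd
  # number of rows this is exactly the median; with an even number it
  # is the lower of the two middle rows (an assumption made here).
  o <- order(g$c)
  g[o[(nrow(g) + 1) %/% 2], , drop = FALSE]
}

# Hypothetical wrapper: split d into groups with matching type, foo,
# and bar values, then collect the median row of each group.
median.by.group <- function(d) {
  # drop = TRUE discards combinations that never occur in the data
  groups <- split(d, list(d$type, d$foo, d$bar), drop = TRUE)
  res <- do.call(rbind, lapply(groups, median.row))
  rownames(res) <- NULL
  res
}

# Made-up data with the same layout as the data frame described above:
# three runs of every (type, foo, bar) combination.
d <- expand.grid(run = 1:3, type = c("red", "black", "blue"),
                 foo = 1:5, bar = seq(50, 90, by = 10))
d <- d[, c("type", "foo", "bar")]
d$a <- rnorm(nrow(d))
d$b <- rnorm(nrow(d))
d$c <- rnorm(nrow(d))

m <- median.by.group(d)   # one row per (type, foo, bar) combination
```

Unlike an approach based on row positions, this does not depend on the
order of the rows in the data frame, since the grouping is done on the
column values themselves.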

The other, which I've been able to do, takes advantage of knowledge
about the sequence of rows in the data frame:

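median.runs <- function(d, n.runs) {
  if (missing(n.runs))
    stop("the n.runs parameter is required")

  # rows per run; assumes every run contributes the same rows in the
  # same order, so row k of each run has the same type, foo, and bar
  len <- nrow(d) / n.runs

  # build an index that will select similar rows: the first row of
  # each run to start with
  i <- (0:(n.runs - 1)) * len + 1

  a <- data.frame()
  for (j in seq_len(len)) {
    cat("i:",i,"\n")
    rows <- d[i,]
    md <- median(rows$c)
    cat("md:",md,"\n")
    # take the row nearest the median: with an even n.runs, median()
    # averages the two middle values, so no row would match it exactly
    a <- rbind(a, rows[which.min(abs(rows$c - md)),])
    # move on to the next row of every run
    i <- i + 1
  }
  a
}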

-- 
--Ed L Cashin            |   PGP public key:
  ecashin at uga.edu        |   http://noserose.net/e/pgp/