# [R] row selection based on median in data frame

Ed L Cashin ecashin at uga.edu
Thu Apr 1 17:33:50 CEST 2004

```
Ed L Cashin <ecashin at uga.edu> writes:

...
> Is there a way to tell aggregate just to perform median on column runtime to
> select the whole row?

trying to do.  Here's a simple R function to produce a data frame like
the one I am working on.

demo.frame <- function() {
    n.runs <- 3
    types <- c("red","black","blue")
    foo <- 1:5
    bar <- seq(50,90,by=10)
    d <- data.frame()

    # one row per (run, type, foo, bar) combination
    for (i in 1:n.runs) {
        for (t in types) {
            for (f in foo) {
                for (b in bar) {
                    row <- data.frame(type=t,
                                      foo=f,
                                      bar=b,
                                      a=rnorm(1),
                                      b=rnorm(1),
                                      c=rnorm(1))
                    d <- rbind(d,row)
                }
            }
        }
    }
    d
}

Every so often, in the resulting rows, you get a row where the type,
the foo, and the bar values are all the same.  I need to look at the
rows with such a matching set of values as a group, selecting the one
row with the median "c" value, and preserving all of that row's other
values.  So median should not be done on the "a" or "b" columns, just
the "c" column.

There are two ways I see to approach this problem.  One would be:

for each subset of rows with matching type, foo, and bar values,
find the row with the median c value and output it

The other, which I've been able to do, takes advantage of knowledge
about the sequence of rows in the data frame:

median.runs <- function(d, n.runs=0) {
    if (missing(n.runs))
        stop("missing n.runs parameter is required")

    # rows per run; assumes every run lists the same
    # (type, foo, bar) combinations in the same order
    len <- nrow(d) / n.runs
    i <- c()

    # build an index that will select similar rows
    for (n in 0:(n.runs - 1)) {
        i[n + 1] <- n * len + 1
    }
    a <- data.frame()
    for (j in 1:len) {
        cat("i:",i,"\n")
        rows <- d[i,]
        # with an odd n.runs the median is one of the c values
        md <- median(rows$c)
        cat("md:",md,"\n")
        matches <- rows[rows$c == md,]
        a <- rbind(a, matches[1,])
        i <- i + 1
    }
    a
}

--
--Ed L Cashin            |   PGP public key:
ecashin at uga.edu        |   http://noserose.net/e/pgp/

```
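
The first approach described in the message (take each subset of rows with matching type, foo, and bar values and output the row with the median c value) could be sketched roughly as follows. This is only an illustration, not code from the original post: the names `median.row` and `median.rows` are made up here, and the sketch assumes the data frame has columns named `type`, `foo`, `bar`, and `c`, as `demo.frame()` produces. It picks the middle row by sorting rather than comparing against `median()`, so with an even number of rows in a group it returns the lower of the two middle rows.

```r
# Hypothetical helper: return the row of `rows` whose c value is in the
# middle once the rows are sorted by c (lower-middle for an even count).
median.row <- function(rows) {
    ord <- order(rows$c)                  # row positions sorted by c
    rows[ord[(nrow(rows) + 1) %/% 2], ]   # keep the whole middle row
}

# Hypothetical wrapper: group by (type, foo, bar), keep one row per group.
median.rows <- function(d) {
    # drop = TRUE discards combinations that never occur in d
    groups <- split(d, list(d$type, d$foo, d$bar), drop = TRUE)
    out <- do.call(rbind, lapply(groups, median.row))
    rownames(out) <- NULL
    out
}

# Small made-up example: two groups of three rows each.
d.small <- data.frame(type = c("red", "red", "red", "blue", "blue", "blue"),
                      foo  = 1,
                      bar  = 50,
                      a    = c(10, 20, 30, 40, 50, 60),
                      c    = c(0.3, 0.1, 0.2, 5, 7, 6))
res <- median.rows(d.small)
print(res)
```

Unlike `median.runs`, this does not depend on the order of the rows or on knowing `n.runs`, so something like `median.rows(demo.frame())` should work on the full data as well.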