[R] inclusion criteria help

Aaron J Mackey ajm6q at virginia.edu
Tue Nov 27 16:10:16 CET 2001


I have a dataset that looks like this (many other variables not
shown, including a unique row identifier "id"):

> summary(hits)
    query               lib               coverage         percid
 Length:80664       Length:80664       Min.   :0.080   Min.   :0.2250
 Mode  :character   Mode  :character   1st Qu.:0.980   1st Qu.:0.8160
                                       Median :1.000   Median :0.9230
                                       Mean   :0.946   Mean   :0.8536
                                       3rd Qu.:1.000   3rd Qu.:0.9900
                                       Max.   :1.000   Max.   :1.0000

For any query/lib combination there may be one or more rows of data. For
each query/lib combination, I'd like to select only the row that has the
maximum (or minimum, or whatever) coverage, percid, or some other data
element, and carry along the other data elements from that same row.

I know I can do this procedurally in a loop:

query <- c('')
lib <- c('')
coverage <- c(0)
percid <- c(0)

for(q in unique(hits$query)) {
  for(l in unique(hits$lib[hits$query == q])) {
    query <- c(query, q)
    lib <- c(lib, l)
    max.coverage <- 0
    for(id in hits$id[hits$query == q & hits$lib == l]) {
      if(hits$coverage[hits$id == id] > max.coverage) {
        max.coverage.id <- id
        max.coverage <- hits$coverage[hits$id == id]
      }
    }
    coverage <- c(coverage, hits$coverage[hits$id == max.coverage.id])
    percid <- c(percid, hits$percid[hits$id == max.coverage.id])
  }
}

filtered.hits <- data.frame(query=query[2:length(query)],
                            lib=lib[2:length(lib)],
                            coverage=coverage[2:length(coverage)],
                            percid=percid[2:length(percid)]
                           )

# finally get to do something with it now:
plot(filtered.hits$coverage[filtered.hits$query == 'ABC'],
     filtered.hits$percid[filtered.hits$query == 'ABC']
    )


So, how could I produce the same plot as above without looping and
without creating a new data frame?
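
One loop-free direction that might work is sketched below on made-up toy
data. It uses ave() to compute each row's group maximum and then a logical
index to pick the rows that attain it. The toy values and the names
"group", "group.max", "keep" and "sel" are assumptions for illustration
only, and unlike the loop above this keeps every tied row rather than just
the first one:

```r
# Toy stand-in for the real `hits` data frame (all values are made up).
hits <- data.frame(
  id       = 1:6,
  query    = c("ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ"),
  lib      = c("lib1", "lib1", "lib2", "lib2", "lib1", "lib1"),
  coverage = c(0.50, 0.98, 1.00, 0.80, 0.90, 0.95),
  percid   = c(0.90, 0.82, 0.99, 0.75, 0.60, 0.88),
  stringsAsFactors = FALSE
)

# One grouping factor per query/lib combination that actually occurs;
# drop = TRUE avoids empty combinations (max() of nothing would warn).
group <- interaction(hits$query, hits$lib, drop = TRUE)

# For each row, the maximum coverage within its query/lib group.
group.max <- ave(hits$coverage, group, FUN = max)

# TRUE for rows that attain their group's maximum (ties all stay TRUE).
keep <- hits$coverage == group.max

# Plot straight from `hits`; no intermediate data frame is built.
sel <- keep & hits$query == "ABC"
plot(hits$coverage[sel], hits$percid[sel])
```

Swapping FUN = max for FUN = min, or hits$coverage for hits$percid, would
change the inclusion criterion without touching the rest.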

Thanks,

-Aaron
