[R] vectorization with subset?
dvera at bio.fsu.edu
Mon Jul 2 18:15:32 CEST 2012
I have a data frame, scores (68,000 rows), that holds a score (V4) for each
of a series of genomic coordinate ranges (V2 to V3).
I also have a data frame, coord (1.2 million rows), of single genomic coordinates.
For each genomic coordinate in coord, I would like to determine the
average of all scores whose genomic ranges (in scores) encompass that
coordinate. To accomplish this, I tried:
The function works, but is extremely slow.
It would take about 4 days for this to finish for a single data set, and I
have 64 data sets.
Why does the rate at which coordinate averages are calculated increase when
coord is smaller, but not when scores is smaller?
How can I accomplish the same thing more efficiently?
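One possible direction, offered only as a sketch and not as the function from the original message: instead of subsetting scores once per coordinate, sort the range starts and ends once and use findInterval() with running totals. For a position x, the ranges covering x are those with start <= x minus those with end < x, so both the sum and the count of covering scores come from two lookups. The function name mean_score_at and the toy data are hypothetical. The sketch assumes inclusive range ends, start <= end in every row, a single chromosome (otherwise apply it per chromosome), and R >= 3.3.0 for the left.open argument.

```r
# Hypothetical helper: mean score of all ranges [start, end] that cover each pos.
mean_score_at <- function(pos, start, end, score) {
  o_start <- order(start)
  o_end   <- order(end)

  # Running totals of the scores, ordered by range start and by range end.
  cum_by_start <- c(0, cumsum(score[o_start]))
  cum_by_end   <- c(0, cumsum(score[o_end]))

  # Number of ranges with start <= pos, and number with end < pos.
  n_started <- findInterval(pos, start[o_start])
  n_ended   <- findInterval(pos, end[o_end], left.open = TRUE)

  # Ranges covering pos are those already started but not yet ended.
  n_cover <- n_started - n_ended
  total   <- cum_by_start[n_started + 1] - cum_by_end[n_ended + 1]

  # NA where no range covers the position.
  ifelse(n_cover > 0, total / n_cover, NA_real_)
}

# Toy data using the column names from the post (V2 = start, V3 = end, V4 = score).
scores <- data.frame(V2 = c(1, 5, 8), V3 = c(10, 6, 12), V4 = c(2, 4, 6))
pos    <- c(0, 1, 5, 7, 9, 10, 11, 13)

mean_score_at(pos, scores$V2, scores$V3, scores$V4)
# NA 2 3 2 4 4 6 NA
```

The work is dominated by two sorts and two findInterval() calls, so it does not loop over the 1.2 million coordinates in R code. Because sums are obtained by subtracting running totals, results can differ from a direct mean() in the last few floating-point digits. The Bioconductor package IRanges, with findOverlaps(), is another commonly used option for this kind of overlap query.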