[R] vectorization with subset?

dlv04c dvera at bio.fsu.edu
Mon Jul 2 18:15:32 CEST 2012


Hello,

I have a data frame (68,000 rows) of scores (V4) for a series of [genomic]
coordinate ranges (V2 to V3).



I also have a data frame (1.2 million rows) of single [genomic] coordinates.  



For each genomic coordinate (in coord), I would like to determine the
average of all scores whose genomic ranges (in scores) encompass the
coordinate (in coord). To accomplish this, I tried:



The function works, but is extremely slow.

It would take about 4 days for this to finish for a single data set, and I
have 64 data sets.

Why does the rate at which coordinate averages are calculated increase when
coord has fewer rows, but not when scores has fewer rows?

How can I accomplish the same thing more efficiently?
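
One possible direction, given as a minimal base-R sketch rather than a
definitive answer: instead of subsetting scores once per coordinate, sort the
range starts and ends once and use findInterval() with cumulative sums. The
ranges covering a position are those that have started (start <= pos) minus
those that have already ended (end < pos). The function name mean_score_at
and the argument name pos are hypothetical; the sketch assumes inclusive
ranges with start <= end, R >= 3.3.0 (for the left.open argument), and it
ignores chromosome, so it would have to be applied per chromosome if the data
span several.

```r
# Hypothetical helper: mean score of all ranges [start, end] that contain
# each position in `pos`. In the post's terms, start/end/score would be
# scores$V2, scores$V3 and scores$V4; the column holding `pos` is not
# named in the post.
mean_score_at <- function(pos, start, end, score) {
  # Sort ranges once by start and once by end.
  by_start <- order(start)
  by_end   <- order(end)

  # Running score totals in each ordering, with a leading 0 so that
  # "no ranges yet" maps to index 1.
  cum_start <- c(0, cumsum(score[by_start]))
  cum_end   <- c(0, cumsum(score[by_end]))

  # Number of ranges with start <= pos.
  n_started <- findInterval(pos, start[by_start])
  # Number of ranges with end < pos (left.open = TRUE makes this strict).
  n_ended   <- findInterval(pos, end[by_end], left.open = TRUE)

  # Ranges covering pos = started minus already ended.
  n_cover   <- n_started - n_ended
  sum_cover <- cum_start[n_started + 1] - cum_end[n_ended + 1]

  # NA where no range covers the position.
  ifelse(n_cover > 0, sum_cover / n_cover, NA_real_)
}

# Small made-up example (not data from the post).
start <- c(1, 5, 10)
end   <- c(10, 8, 20)
score <- c(2, 4, 6)
pos   <- c(0, 1, 6, 9, 10, 15, 21)
res   <- mean_score_at(pos, start, end, score)
```

The work here is two sorts plus one binary search per coordinate, rather than
a full scan of scores for every row of coord. Because the sums are taken as
differences of cumulative totals, results can differ from a direct mean() by
floating-point rounding.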

Thanks,

Dan

--
View this message in context: http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156.html
Sent from the R help mailing list archive at Nabble.com.


