[BioC] IRanges:::coverage() speedup/enhancement

Charles C. Berry cberry at tajo.ucsd.edu
Mon Nov 30 20:10:47 CET 2009



The semantics of the IRanges package and especially the RangedData class 
are very appropriate for some of the applications I deal with.

Unfortunately, coverage() is too slow to be useful to me.

I wonder if the Biocore Team would consider retooling it to make it
faster? Below I provide a link to a revised coverage.c that might suffice.

The kind of case I need to handle has width values in the 10 kbase to
10 Mbase range. As a toy example, I need something like

       tmp <- coverage( IRanges( start=seq(1, by=1000, length.out=10000),
                                 width=1e7 ) )

to run quickly.

A revised version of coverage.c is available at

http://cabig2.ucsd.edu:8080/Plone/Members/ccberry/software/coverage.c/view

It handles the case above almost instantly (the existing version needs
about 8 minutes on my machine) and seems about as fast as the existing
version for cases with width=30.  In the cases I've looked at, gc()
reports the same memory usage for both versions.
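
As an illustration only of why wide ranges need not be slow (this is not
taken from the revised coverage.c, whose method is not described here),
here is a minimal base-R sketch of the common "difference array" approach.
The function name coverage_sketch and its arguments are hypothetical and
not part of IRanges. It assumes integer starts >= 1 and widths >= 1.

```r
# Hypothetical helper, not part of IRanges: coverage via a difference array.
# Assumes start >= 1 and width >= 1, both whole numbers.
coverage_sketch <- function(start, width, n = max(start + width - 1)) {
  end <- start + width - 1
  # Count how many ranges begin at each position ...
  up <- tabulate(start, nbins = n + 1)
  # ... and how many have just ended (one past the last covered position).
  down <- tabulate(end + 1, nbins = n + 1)
  # The running sum of (begins - ends) is the coverage at each position.
  cumsum(up - down)[seq_len(n)]
}

# Two ranges, 1..3 and 3..4: position 3 is covered twice.
cov_small <- coverage_sketch(start = c(1, 3), width = c(3, 2))
```

The work here grows with the number of ranges plus the length of the
result, whereas incrementing every position of every range would touch
about 10000 * 1e7 = 1e11 cells in the toy case above.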

---

Also, I wonder if the Biocore Team would entertain allowing the 'weight'
argument of coverage to be of type double? This would help in cases in 
which downweighting of counts of some genomic features is desired.
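
To make the request concrete, a minimal base-R sketch of what coverage
with double weights could compute. The function weighted_coverage_sketch
is hypothetical, not an IRanges function, and it assumes every range lies
within 1..n.

```r
# Hypothetical helper, not part of IRanges: coverage with double weights.
# Assumes start >= 1, width >= 1, and all ranges end at or before n.
weighted_coverage_sketch <- function(start, width, weight,
                                     n = max(start + width - 1)) {
  end <- start + width - 1
  delta <- numeric(n + 1)          # double storage, so fractional weights work
  for (i in seq_along(start)) {
    # Add the weight where the range begins, remove it just past its end.
    delta[start[i]] <- delta[start[i]] + weight[i]
    delta[end[i] + 1] <- delta[end[i] + 1] - weight[i]
  }
  cumsum(delta)[seq_len(n)]
}

# Range 1..3 counts fully, range 3..4 is downweighted to 0.5.
cov_w <- weighted_coverage_sketch(start = c(1, 3), width = c(3, 2),
                                  weight = c(1, 0.5))
```

Because the result is a running sum of doubles, values may differ from an
exact answer by floating-point rounding when many weights are combined.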

Thanks,

Chuck

--
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


