[BioC] [somehow-OT] Storing/quickly accessing "genome length" data.

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Feb 9 22:08:02 CET 2011


Hi,

I guess a lot of us have this problem: I'm storing "genome-long"
integer/double vectors, with one value for each position along each
chromosome.

I want to quickly access parts of these vectors, as conveniently and
efficiently as we can access the reads in a given region of a BAM
file. I'm curious what you folks are using to store this type of info.

Currently I just have RData objects of Rle's or XIntegers, etc. for
each strand of each chromosome. I'll load these data files, query the
info over the ranges I want, then junk the (usually large) vector I
just loaded. It's not the best, but it works.
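
To make concrete the kind of access I mean, here is a minimal sketch. It
is Python rather than R, and it assumes (hypothetically) that each strand
of each chromosome is dumped to a flat binary file of fixed-width
integers. The file name and both helper functions are made up for
illustration; this is not an existing Bioconductor API.

```python
import os
import tempfile
from array import array

ITEM = "i"                        # signed int (4 bytes on common platforms)
ITEM_SIZE = array(ITEM).itemsize  # bytes per stored position


def write_track(path, values):
    """Write one integer per genomic position as raw fixed-width values."""
    with open(path, "wb") as fh:
        array(ITEM, values).tofile(fh)


def read_range(path, start, end):
    """Return values for 0-based, half-open positions [start, end).

    Only the requested slice is read from disk; the rest of the
    chromosome-length vector is never loaded.
    """
    out = array(ITEM)
    with open(path, "rb") as fh:
        fh.seek(start * ITEM_SIZE)      # jump straight to the first position
        out.fromfile(fh, end - start)   # read just the requested values
    return out.tolist()


# Toy "chromosome" of 1000 positions (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "chr1_plus.bin")
write_track(path, range(1000))
print(read_range(path, 100, 105))  # [100, 101, 102, 103, 104]
```

Unlike an Rle, a flat file like this does no compression of long constant
runs, so it trades disk space for cheap random access.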

In the bioinformatics world, I guess these data are best stored as
bigWig files, yes? And AFAIK, there's no (convenient or otherwise) way
to query bigWigs from within R/Bioc, right?

Then I wonder if storing these in HDF/NetCDF files isn't actually the
way to go ... and if so, why not go whole-hog and work on a Bioc
interface to the somehow-defined BioHDF format?

Any thoughts?

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
