[BioC] matrix like object with Rle columns

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Tue Jun 26 05:56:11 CEST 2012


On Mon, Jun 25, 2012 at 11:36 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Patrick and I had talked about this a long time ago (essentially putting a
> "dim" attribute on an Rle), but the closest thing today is a DataFrame with
> Rle columns.
>
> Use case?

Say I have whole-genome data (for example, coverage) on multiple
samples.  Usually, this is far easier to think of as a matrix (in my
opinion) with ~3B rows, and I often want to do rowSums(), colSums(),
etc. (in fact, probably the whole API from matrixStats).  This is
especially nice when you have multiple coverage-like tracks on each
sample, so you could have
  trackA : genome by samples
  trackB : genome by samples
  ...

You could think of this as a SummarizedExperiment, but with
_extremely_ big matrices in the assay slot.

I want to take advantage of the Rle structure to store the data more
efficiently and also to do potentially faster computations.

This is actually closer to my use case where I currently use matrices
with ~30M rows (which works fine), but I would like to expand to ~800M
rows (which would suck a bit).

You could also think of a matrix-like object with Rle columns as an
alternative sparse matrix structure.  In a typical sparse matrix you
only store the non-zero entries; here we only store the
change-points.  Depending on the structure of the matrix, this could be
an efficient way to store an otherwise dense matrix.
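
To make the change-point idea concrete, here is a minimal sketch in
plain Python (not R, and not the IRanges Rle implementation).  The
names rle_encode and rle_decode are made up for illustration; each
column is stored as a pair (values, lengths) with one entry per run.

```python
from itertools import groupby


def rle_encode(column):
    """Return (values, lengths): one entry per run of equal consecutive items."""
    values, lengths = [], []
    for value, run in groupby(column):
        values.append(value)
        lengths.append(sum(1 for _ in run))
    return values, lengths


def rle_decode(values, lengths):
    """Expand (values, lengths) back into the full dense column."""
    column = []
    for value, length in zip(values, lengths):
        column.extend([value] * length)
    return column


# A toy coverage vector: dense (few zeros in the middle), but with few change-points.
coverage = [0, 0, 0, 5, 5, 5, 5, 2, 2, 0, 0, 0]
values, lengths = rle_encode(coverage)
print(values, lengths)  # 4 runs stored instead of 12 entries
```

Storage here grows with the number of runs rather than the number of
rows, so the benefit depends entirely on how long the runs are.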

So essentially, what I want is mathematical operations on this
object, exploiting the fact that all entries are numbers, so the
typical matrix operations make sense.
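
As a rough illustration of how such operations could work on runs
directly, the Python sketch below computes a column sum and the row
sums without expanding any column.  It is an assumption-laden toy, not
an existing Bioconductor API: col_sum and row_sums are hypothetical
names, columns are (values, lengths) pairs, and every run length is
assumed to be positive.

```python
def col_sum(values, lengths):
    """Sum of a run-length encoded column: each value counts once per row in its run."""
    return sum(v * n for v, n in zip(values, lengths))


def row_sums(columns):
    """Row sums across run-length encoded columns, returned run-length encoded.

    columns is a non-empty list of (values, lengths) pairs of equal total length.
    The walk advances from change-point to change-point, never row by row.
    """
    n_rows = sum(columns[0][1])
    if any(sum(lengths) != n_rows for _, lengths in columns):
        raise ValueError("all columns must have the same number of rows")

    idx = [0] * len(columns)                            # current run in each column
    remaining = [lengths[0] for _, lengths in columns]  # rows left in that run
    out_values, out_lengths = [], []
    done = 0
    while done < n_rows:
        step = min(remaining)  # distance to the next change-point in any column
        total = sum(vals[i] for (vals, _), i in zip(columns, idx))
        if out_values and out_values[-1] == total:
            out_lengths[-1] += step  # extend the current output run
        else:
            out_values.append(total)
            out_lengths.append(step)
        done += step
        for c, (_, lens) in enumerate(columns):
            remaining[c] -= step
            if remaining[c] == 0 and idx[c] + 1 < len(lens):
                idx[c] += 1
                remaining[c] = lens[idx[c]]
    return out_values, out_lengths


sample_a = ([0, 5, 2, 0], [3, 4, 2, 3])  # 12 rows
sample_b = ([1, 0], [6, 6])              # 12 rows
print(col_sum(*sample_a))
print(row_sums([sample_a, sample_b]))
```

The work is proportional to the total number of runs across columns,
which is where a speed-up over a dense matrix could come from when runs
are long.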

[ side question which could be relevant in this discussion: for a
numeric Rle is there some notion of precision - say I have truly
numeric values with tons of digits, and I want to consider two numbers
part of the same run if |x1 - x2| < epsilon? ]
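
One possible reading of the epsilon question, sketched in plain Python:
a value joins the current run if it is within epsilon of the first
value of that run.  This is only an illustration of the idea, with a
hypothetical name (rle_encode_tol); it says nothing about whether Rle
supports such a tolerance.

```python
def rle_encode_tol(column, epsilon):
    """Run-length encode, treating x as part of the current run when
    |x - first value of the run| < epsilon.  The run keeps its first value."""
    values, lengths = [], []
    for x in column:
        if values and abs(x - values[-1]) < epsilon:
            lengths[-1] += 1
        else:
            values.append(x)
            lengths.append(1)
    return values, lengths


noisy = [1.0, 1.0000001, 0.9999999, 2.5, 2.5000001]
print(rle_encode_tol(noisy, 1e-6))
```

Comparing against the first value of the run (rather than the previous
value) keeps every member of a run within epsilon of the stored value,
so slow drift cannot stretch a run indefinitely.  The encoding is lossy:
decoding returns the stored value, not the original entries.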

Kasper

>
> Michael
>
> On Mon, Jun 25, 2012 at 8:27 PM, Kasper Daniel Hansen
> <kasperdanielhansen at gmail.com> wrote:
>>
>> Do we have a matrix-like object, but where the columns are Rle's?
>>
>> Kasper
>
>
