[BioC] matrix like object with Rle columns

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Tue Jun 26 06:11:53 CEST 2012


On Mon, Jun 25, 2012 at 11:56 PM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:
> On Mon, Jun 25, 2012 at 11:36 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> Patrick and I had talked about this a long time ago (essentially putting a
>> "dim" attribute on an Rle), but the closest thing today is a DataFrame with
>> Rle columns.
>>
>> Use case?
>
> Say I have whole-genome data (for example, coverage) on multiple
> samples.  Usually, this is far easier to think of as a matrix (in my
> opinion) with ~3B rows, and I often want to do rowSums(), colSums(), etc.
> (in fact, probably the whole API from matrixStats).  This is
> especially nice when you have multiple coverage-like tracks on each
> sample, so you could have
>  trackA : genome by samples
>  trackB : genome by samples
>  ...
>
> You could think of this as a SummarizedExperiment, but with
> _extremely_ big matrices in the assay slot.
>
> I want to take advantage of the Rle structure to store the data more
> efficiently and also to do potentially faster computations.
>
> This is actually closer to my use case where I currently use matrices
> with ~30M rows (which works fine), but I would like to expand to ~800M
> rows (which would suck a bit).
>
> You could also think of a matrix-like object with Rle columns as an
> alternative sparse matrix structure.  In a typical sparse matrix you
> only store the non-zero entries; here we only store the
> change points.  Depending on the structure of the matrix, this could be
> an efficient way to store an otherwise dense matrix.
>
> So essentially, what I want is to have mathematical operations on
> this object, where I would exploit the fact that all entries are
> numbers, so the typical matrix operations make sense.
>
> [ side question which could be relevant in this discussion: for a
> numeric Rle, is there some notion of precision?  Say I have truly
> numeric values with tons of digits, and I want to consider two numbers
> part of the same run if |x1 - x2| < epsilon. ]
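
To make the side question concrete, here is a minimal sketch of run-length
encoding with a tolerance.  It is written in Python only to keep it
self-contained; rle_with_tolerance is a hypothetical helper, not an
existing Rle method, and the choice to compare each value against the
first value of the current run is my assumption.

```python
def rle_with_tolerance(values, eps):
    """Run-length encode `values` as a list of (value, length) pairs.

    A value x joins the current run if |x - run_value| < eps, where
    run_value is the FIRST value of that run (an assumption of this sketch).
    """
    runs = []  # each entry is [run_value, run_length]
    for x in values:
        if runs and abs(x - runs[-1][0]) < eps:
            runs[-1][1] += 1      # x is close enough: extend the current run
        else:
            runs.append([x, 1])   # otherwise start a new run at x
    return [(v, n) for v, n in runs]


encoded = rle_with_tolerance([1.0, 1.0000001, 1.0000002, 2.0, 2.0], eps=1e-3)
```

Comparing against the first value of the run, rather than the previous
value, keeps a slowly drifting sequence from being collapsed into one long
run.  The price is that the stored value is only accurate to within
epsilon.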

You can see that Pete has had similar thoughts in
genoset/R/DataFrame-methods.R, although he only has colMeans (which is
the easy one).
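
Roughly what I have in mind for the column and row operations, as a sketch
in Python rather than R.  The names col_sums and row_sums and the
representation of a column as a list of (value, length) pairs are
assumptions for illustration, not the genoset or IRanges implementation.

```python
def col_sums(columns):
    """Sum of each column; costs one multiply-add per run, not per row."""
    return [sum(v * n for v, n in col) for col in columns]


def row_sums(columns):
    """Row sums returned as runs, walking the columns change point by change point.

    Assumes every run length is a positive integer and all columns
    describe the same number of rows.
    """
    if not columns:
        return []
    total_rows = sum(n for _, n in columns[0])
    if any(sum(n for _, n in col) != total_rows for col in columns):
        raise ValueError("all columns must have the same number of rows")

    pos = [0] * len(columns)                             # current run per column
    left = [col[0][1] if col else 0 for col in columns]  # rows left in that run
    out = []
    done = 0
    while done < total_rows:
        step = min(left)  # rows until the next change point in any column
        value = sum(col[i][0] for col, i in zip(columns, pos))
        if out and out[-1][0] == value:
            out[-1] = (value, out[-1][1] + step)  # merge equal adjacent runs
        else:
            out.append((value, step))
        done += step
        for j in range(len(columns)):
            left[j] -= step
            if left[j] == 0 and done < total_rows:
                pos[j] += 1
                left[j] = columns[j][pos[j]][1]
    return out


sample_a = [(0, 3), (5, 2)]  # dense: 0 0 0 5 5
sample_b = [(1, 1), (0, 4)]  # dense: 1 0 0 0 0
sums_by_col = col_sums([sample_a, sample_b])
sums_by_row = row_sums([sample_a, sample_b])
```

The column version never expands the runs, which is why it is the easy
one.  The row version does work proportional to the number of change
points across all columns rather than the number of rows, and its result
is itself a run-length encoding.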

Kasper

> Kasper
>
>>
>> Michael
>>
>> On Mon, Jun 25, 2012 at 8:27 PM, Kasper Daniel Hansen
>> <kasperdanielhansen at gmail.com> wrote:
>>>
>>> Do we have a matrix-like object, but where the columns are Rle's?
>>>
>>> Kasper
>>>
>>
>>


