[BioC] Summing Views on coverage by base

Sean Davis sdavis2 at mail.nih.gov
Tue Mar 20 23:10:06 CET 2012


2012/3/20 Martin Morgan <mtmorgan at fhcrc.org>:
> On 03/20/2012 01:40 PM, Hervé Pagès wrote:
>> Hi Sean,
>>
>> On 03/20/2012 01:14 PM, Sean Davis wrote:
>>> I have a set of Views of equal width (think upstream of tss) and want
>>> to sum each base across those views. I can extract each view as an
>>> integer vector and create a matrix, but this matrix can get pretty
>>> large. I'm missing the skills with SimpleRleViewsList, though, to
>>> work directly on at object. Any suggestions?
>>
>>  > subject <- Rle(rep(c(0L, 1L, 3L, 2L, 18L, 0L), c(3,2,1,5,2,4)))
>>  > myViews <- Views(subject, start=4:11, width=5)
>>  > myViews
>> Views on a 17-length Rle subject
>>
>> views:
>> start end width
>> [1] 4 8 5 [1 1 3 2 2]
>> [2] 5 9 5 [1 3 2 2 2]
>> [3] 6 10 5 [3 2 2 2 2]
>> [4] 7 11 5 [2 2 2 2 2]
>> [5] 8 12 5 [ 2 2 2 2 18]
>> [6] 9 13 5 [ 2 2 2 18 18]
>> [7] 10 14 5 [ 2 2 18 18 0]
>> [8] 11 15 5 [ 2 18 18 0 0]
>>
>> This maybe would be fast enough if you don't have too many columns:
>>
>> viewColSums <- function(x)
>> {
>> sapply(seq_len(width(x)[1L]),
>> function(i)
>> sum(subject[start(x)+i-1L]))
>> }
>>
>>  > viewColSums(myViews)
>> [1] 15 32 49 46 44
>
>
> Reduce("+", myViews)

Nice.

Sean


>>
>> Then if your SimpleRleViewsList object is not too long (1 elt per
>> chromosome?), you can sapply( , viewColSums) on it.
>>
>> Maybe we should make viewColSums the "colSums" method for RleViews
>> objects? (and eventually implement it in C?)
>>
>> Cheers,
>> H.
>>
>>>
>>> Thanks,
>>> Sean
>>>
>>>> sessionInfo()
>>> R Under development (unstable) (2012-01-19 r58141)
>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] GenomicRanges_1.7.30 IRanges_1.13.28 BiocGenerics_0.1.12
>>>
>>> loaded via a namespace (and not attached):
>>> [1] stats4_2.15.0 tools_2.15.0
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793



More information about the Bioconductor mailing list