[BioC] Summing Views on coverage by base

Martin Morgan mtmorgan at fhcrc.org
Tue Mar 20 23:32:26 CET 2012


On 03/20/2012 03:10 PM, Sean Davis wrote:
> 2012/3/20 Martin Morgan<mtmorgan at fhcrc.org>:
>> On 03/20/2012 01:40 PM, Hervé Pagès wrote:
>>> Hi Sean,
>>>
>>> On 03/20/2012 01:14 PM, Sean Davis wrote:
>>>> I have a set of Views of equal width (think upstream of tss) and want
>>>> to sum each base across those views. I can extract each view as an
>>>> integer vector and create a matrix, but this matrix can get pretty
>>>> large. I'm missing the skills with SimpleRleViewsList, though, to
>>>> work directly on at object. Any suggestions?
>>>
>>>   >  subject<- Rle(rep(c(0L, 1L, 3L, 2L, 18L, 0L), c(3,2,1,5,2,4)))
>>>   >  myViews<- Views(subject, start=4:11, width=5)
>>>   >  myViews
>>> Views on a 17-length Rle subject
>>>
>>> views:
>>> start end width
>>> [1] 4 8 5 [1 1 3 2 2]
>>> [2] 5 9 5 [1 3 2 2 2]
>>> [3] 6 10 5 [3 2 2 2 2]
>>> [4] 7 11 5 [2 2 2 2 2]
>>> [5] 8 12 5 [ 2 2 2 2 18]
>>> [6] 9 13 5 [ 2 2 2 18 18]
>>> [7] 10 14 5 [ 2 2 18 18 0]
>>> [8] 11 15 5 [ 2 18 18 0 0]
>>>
>>> This maybe would be fast enough if you don't have too many columns:
>>>
>>> viewColSums<- function(x)
>>> {
>>> sapply(seq_len(width(x)[1L]),
>>> function(i)
>>> sum(subject[start(x)+i-1L]))
>>> }
>>>
>>>   >  viewColSums(myViews)
>>> [1] 15 32 49 46 44
>>
>>
>> Reduce("+", myViews)
>
> Nice.

maybe too many class coercions in there, though

   Reduce(function(x, y) x + as.integer(y), myViews, 0L)

which  seems like a step or so back...

Martin

>
> Sean
>
>
>>>
>>> Then if your SimpleRleViewsList object is not too long (1 elt per
>>> chromosome?), you can sapply( , viewColSums) on it.
>>>
>>> Maybe we should make viewColSums the "colSums" method for RleViews
>>> objects? (and eventually implement it in C?)
>>>
>>> Cheers,
>>> H.
>>>
>>>>
>>>> Thanks,
>>>> Sean
>>>>
>>>>> sessionInfo()
>>>> R Under development (unstable) (2012-01-19 r58141)
>>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>>
>>>> locale:
>>>> [1] C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] GenomicRanges_1.7.30 IRanges_1.13.28 BiocGenerics_0.1.12
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] stats4_2.15.0 tools_2.15.0
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>
>>
>> --
>> Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>
>> Location: M1-B861
>> Telephone: 206 667-2793


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list