[BioC] Summing Views on coverage by base

Sean Davis sdavis2 at mail.nih.gov
Tue Mar 20 23:56:06 CET 2012


On Tue, Mar 20, 2012 at 6:32 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 03/20/2012 03:10 PM, Sean Davis wrote:
>>
>> 2012/3/20 Martin Morgan<mtmorgan at fhcrc.org>:
>>>
>>> On 03/20/2012 01:40 PM, Hervé Pagès wrote:
>>>>
>>>> Hi Sean,
>>>>
>>>> On 03/20/2012 01:14 PM, Sean Davis wrote:
>>>>>
>>>>> I have a set of Views of equal width (think upstream of tss) and want
>>>>> to sum each base across those views. I can extract each view as an
>>>>> integer vector and create a matrix, but this matrix can get pretty
>>>>> large. I'm missing the skills with SimpleRleViewsList, though, to
>>>>> work directly on at object. Any suggestions?
>>>>
>>>>
>>>>  >  subject<- Rle(rep(c(0L, 1L, 3L, 2L, 18L, 0L), c(3,2,1,5,2,4)))
>>>>  >  myViews<- Views(subject, start=4:11, width=5)
>>>>  >  myViews
>>>> Views on a 17-length Rle subject
>>>>
>>>> views:
>>>> start end width
>>>> [1] 4 8 5 [1 1 3 2 2]
>>>> [2] 5 9 5 [1 3 2 2 2]
>>>> [3] 6 10 5 [3 2 2 2 2]
>>>> [4] 7 11 5 [2 2 2 2 2]
>>>> [5] 8 12 5 [ 2 2 2 2 18]
>>>> [6] 9 13 5 [ 2 2 2 18 18]
>>>> [7] 10 14 5 [ 2 2 18 18 0]
>>>> [8] 11 15 5 [ 2 18 18 0 0]
>>>>
>>>> This maybe would be fast enough if you don't have too many columns:
>>>>
>>>> viewColSums<- function(x)
>>>> {
>>>> sapply(seq_len(width(x)[1L]),
>>>> function(i)
>>>> sum(subject[start(x)+i-1L]))
>>>> }
>>>>
>>>>  >  viewColSums(myViews)
>>>> [1] 15 32 49 46 44
>>>
>>>
>>>
>>> Reduce("+", myViews)
>>
>>
>> Nice.
>
>
> maybe too many class coercions in there, though
>
>  Reduce(function(x, y) x + as.integer(y), myViews, 0L)
>
> which  seems like a step or so back...

But 30% faster.

Thanks again.

Sean

>>
>> Sean
>>
>>
>>>>
>>>> Then if your SimpleRleViewsList object is not too long (1 elt per
>>>> chromosome?), you can sapply( , viewColSums) on it.
>>>>
>>>> Maybe we should make viewColSums the "colSums" method for RleViews
>>>> objects? (and eventually implement it in C?)
>>>>
>>>> Cheers,
>>>> H.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Sean
>>>>>
>>>>>> sessionInfo()
>>>>>
>>>>> R Under development (unstable) (2012-01-19 r58141)
>>>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>>>>>
>>>>> locale:
>>>>> [1] C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] GenomicRanges_1.7.30 IRanges_1.13.28 BiocGenerics_0.1.12
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] stats4_2.15.0 tools_2.15.0
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Computational Biology
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>>
>>> Location: M1-B861
>>> Telephone: 206 667-2793
>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list