[BioC] Summing Views on coverage by base

Hervé Pagès hpages at fhcrc.org
Tue Mar 20 21:40:35 CET 2012


Hi Sean,

On 03/20/2012 01:14 PM, Sean Davis wrote:
> I have a set of Views of equal width (think upstream of tss) and want
> to sum each base across those views.  I can extract each view as an
> integer vector and create a matrix, but this matrix can get pretty
> large.  I'm missing the skills with SimpleRleViewsList, though, to
> work directly on that object.  Any suggestions?

 > subject <- Rle(rep(c(0L, 1L, 3L, 2L, 18L, 0L), c(3,2,1,5,2,4)))
 > myViews <- Views(subject, start=4:11, width=5)
 > myViews
Views on a 17-length Rle subject

views:
     start end width
[1]     4   8     5 [1 1 3 2 2]
[2]     5   9     5 [1 3 2 2 2]
[3]     6  10     5 [3 2 2 2 2]
[4]     7  11     5 [2 2 2 2 2]
[5]     8  12     5 [ 2  2  2  2 18]
[6]     9  13     5 [ 2  2  2 18 18]
[7]    10  14     5 [ 2  2 18 18  0]
[8]    11  15     5 [ 2 18 18  0  0]

This might be fast enough if you don't have too many columns:

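## Sum position i across all views (assumes all views have the same width).
viewColSums <- function(x)
{
     ## Use the subject of 'x' rather than a global 'subject' variable.
     subj <- subject(x)
     sapply(seq_len(width(x)[1L]),
            function(i)
                sum(subj[start(x)+i-1L]))
}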

 > viewColSums(myViews)
[1] 15 32 49 46 44
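
As a sanity check, here is a minimal sketch that compares this result with the matrix approach Sean describes, on the same toy data. The name 'm' is only for illustration, and this copy of viewColSums takes the subject from the views object via subject(x) so that the example is self-contained:

```r
library(IRanges)

subject <- Rle(rep(c(0L, 1L, 3L, 2L, 18L, 0L), c(3,2,1,5,2,4)))
myViews <- Views(subject, start=4:11, width=5)

## Sum position i across all views without expanding them into a matrix.
## Assumes all views have the same width.
viewColSums <- function(x)
{
     subj <- subject(x)
     sapply(seq_len(width(x)[1L]),
            function(i)
                sum(subj[start(x)+i-1L]))
}

## Matrix approach: one row per view, one column per position.
m <- t(sapply(seq_along(myViews),
              function(k) as.integer(myViews[[k]])))

res <- viewColSums(myViews)

## Both approaches should agree.
stopifnot(all(colSums(m) == res))
```

The matrix holds (number of views) x (view width) integers, whereas viewColSums only does one Rle subsetting per column, which is why it should scale better when there are many views and few columns.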

Then if your SimpleRleViewsList object is not too long (1 elt per
chromosome?), you can sapply( , viewColSums) on it.
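
A rough sketch of what that could look like follows. The chromosome names, the second toy subject, and the variables 'myViewsList', 'perChrom' and 'total' are made up for illustration, and I'm assuming the list is built with RleViewsList(rleList=, rangesList=) and that all views have the same width on every chromosome:

```r
library(IRanges)

## Same function as above, reading the subject from the views object.
viewColSums <- function(x)
{
     subj <- subject(x)
     sapply(seq_len(width(x)[1L]),
            function(i)
                sum(subj[start(x)+i-1L]))
}

## Hypothetical two-chromosome example, all views of width 5.
cvg <- RleList(
    chr1=Rle(rep(c(0L, 1L, 3L, 2L, 18L, 0L), c(3,2,1,5,2,4))),
    chr2=Rle(c(1L, 2L), c(5, 5))
)
rgs <- IRangesList(
    chr1=IRanges(start=4:11, width=5),
    chr2=IRanges(start=1:3, width=5)
)
myViewsList <- RleViewsList(rleList=cvg, rangesList=rgs)

## One column of per-position sums per chromosome.
perChrom <- sapply(myViewsList, viewColSums)

## Add the chromosomes together to get one sum per position.
total <- rowSums(perChrom)
```

If sapply() does not simplify the result the way you expect, looping with lapply(seq_along(myViewsList), ...) over myViewsList[[k]] and combining with Reduce("+", ...) should give the same totals.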

Maybe we should make viewColSums the "colSums" method for RleViews
objects? (and eventually implement it in C?)

Cheers,
H.

>
> Thanks,
> Sean
>
>> sessionInfo()
> R Under development (unstable) (2012-01-19 r58141)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GenomicRanges_1.7.30 IRanges_1.13.28      BiocGenerics_0.1.12
>
> loaded via a namespace (and not attached):
> [1] stats4_2.15.0 tools_2.15.0


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


