[BioC] countOverlaps within Mode Counting

Valerie Obenchain vobencha at fhcrc.org
Thu Mar 1 22:49:35 CET 2012


Hi Dario,

I'm not sure I understand your question.  Are you after a summary by 
gene of the number of reads that fall 'within' each gene?

 > library(GenomicRanges)
 > genes <- GRanges("chr1", IRanges(c(1, 19), c(10, 30)))
 > reads <- GRanges("chr1", IRanges(c(2, 20, 22), width=5))
 > fo <- findOverlaps(reads, genes, type="within")
 > as.matrix(fo)
      queryHits subjectHits
[1,]         1           1
[2,]         2           2
[3,]         3           2
 > split(queryHits(fo), subjectHits(fo))
$`1`
[1] 1

$`2`
[1] 2 3
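If what you want is a number per gene rather than the read indices, the hits can be tabulated by subject. With GenomicRanges loaded, tabulate(subjectHits(fo), nbins = length(genes)) should give this directly. The sketch below is a stand-alone base-R version that copies the subjectHits column printed above, so it runs without Bioconductor; n_genes is just a local stand-in for length(genes). Setting nbins keeps genes with no reads in the result as zeros.

```r
# subjectHits column copied from as.matrix(fo) above:
# the gene index that each hit (read) falls within.
subject_hits <- c(1L, 2L, 2L)
n_genes <- 2L  # stand-in for length(genes) in the session above

# Number of reads lying 'within' each gene; genes with no hits get 0.
reads_per_gene <- tabulate(subject_hits, nbins = n_genes)
print(reads_per_gene)  # [1] 1 2
```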


Do you really want to know if a gene falls 'within' a short read or are 
you asking if that is a reasonable use case?
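To make the direction concrete: with type = "within", a query range is counted against a subject only when the subject wholly contains it (subject start <= query start and query end <= subject end). The base-R sketch below re-implements that rule on the coordinates from the session above, treating intervals as closed and 1-based as IRanges does. count_within() is a hypothetical helper for illustration, not a Bioconductor function. Swapping query and subject shows why putting the genes in the query gives zeros here.

```r
# Hypothetical helper mimicking countOverlaps(query, subject, type = "within"):
# for each query interval, count the subject intervals that wholly contain it.
# Intervals are closed and 1-based, as in IRanges.
count_within <- function(q_start, q_end, s_start, s_end) {
  vapply(seq_along(q_start), function(i) {
    sum(s_start <= q_start[i] & q_end[i] <= s_end)
  }, integer(1))
}

# Coordinates from the example above; width = 5 means end = start + 4.
gene_start <- c(1, 19)
gene_end   <- c(10, 30)
read_start <- c(2, 20, 22)
read_end   <- read_start + 4

# Reads as query: each read lies within exactly one gene.
reads_in_genes <- count_within(read_start, read_end, gene_start, gene_end)
print(reads_in_genes)  # [1] 1 1 1

# Genes as query: neither gene (10 and 12 bp) fits inside a 5-bp read.
genes_in_reads <- count_within(gene_start, gene_end, read_start, read_end)
print(genes_in_reads)  # [1] 0 0
```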

Valerie



On 02/29/2012 04:00 AM, Dario Strbenac wrote:
> countOverlaps returns one count for each query range, counted against the subject ranges. I find the definition of the type = "within" setting unintuitive: a subject range is only counted if a query lies within it. But since countOverlaps gives one count per query, it seems natural to use the gene coordinates as the query and the short-read coordinates as the subject. Then, using "within" is equivalent to asking how many times each gene lies wholly within a short read. Is this a meaningful calculation for use cases other than mine? Could it be extended to work for the genes-and-reads scenario?
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


