[BioC] multiple hits with countOverlaps function

Wei Shi shi at wehi.EDU.AU
Thu Apr 14 01:43:41 CEST 2011


Hi,

	I am using countOverlaps function in GenomicFeatures package to count how many sequencing reads are mapped to each genomic feature (e.g. promoter region, gene region etc.). However I found that if a read was mapped to more than one features, then the count of each feature will be increased by 1, i.e. this read was counted more than one times although it can only originate from one genomic location. Below is a simple example to show this:

-------------------------
features <- GRanges(seqnames="chr1", ranges=IRanges(c(100,500),c(1000,1500)) ,strand=c("+","+"))
features
GRanges with 2 ranges and 0 elementMetadata values
    seqnames      ranges strand |
       <Rle>   <IRanges>  <Rle> |
[1]     chr1 [100, 1000]      + |
[2]     chr1 [500, 1500]      + |

seqlengths
 chr1
   NA
reads <- GRanges(seqnames="chr1",ranges=IRanges(c(150,600),c(250,700)),strand=c("+","+"))
GRanges with 2 ranges and 0 elementMetadata values
    seqnames     ranges strand |
       <Rle>  <IRanges>  <Rle> |
[1]     chr1 [150, 250]      + |
[2]     chr1 [600, 700]      + |

seqlengths
 chr1
   NA
countOverlaps(features,reads)
[1] 2 1
-------------------- 

I think It will be a better to have an argument in countOverlaps function which specifies whether a single hit (eg. a random one) or multiple hits should be returned for each query sequence. 

Below is my session info:

sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicFeatures_1.2.3 GenomicRanges_1.2.3   IRanges_1.8.9        

loaded via a namespace (and not attached):
 [1] Biobase_2.10.0     biomaRt_2.5.1      Biostrings_2.18.2  BSgenome_1.18.3    DBI_0.2-5          RCurl_1.4-3       
 [7] RSQLite_0.9-2      rtracklayer_1.10.6 tools_2.12.0       XML_3.2-0       


Cheers,
Wei

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioconductor mailing list