[BioC] countOverlaps() only count positive strand (from GenomicRanges example)

Martin Morgan mtmorgan at fhcrc.org
Mon Dec 13 18:39:53 CET 2010


On 12/13/2010 09:30 AM, Vincent Carey wrote:
> On Mon, Dec 13, 2010 at 12:06 PM, Song Li <songli116 at gmail.com> wrote:
>> Hi All,
>>
>> I was following the example in the "GenomicRanges Use Cases", section
>> 3, "Simple RNA-seq Analysis", by "copy-paste" command from the
>> document from page 7 to page 8:
>>
>> What I found out is that countOverlap() only count reads aligned to
>> positive strand.  There are 28 reads map to region GRList[6645], 13 on
>> positive strand, 15 on negative strand.  The "count[6645]" is 13.
>>
>> Is there a way to count both strand?
> 
> If you want to disregard strand of exon range, set it to "*".  This
> does not appear to be a one-liner.

it's tricky to set strand() on a GappedAlignment (because the CIGAR has
to be adjusted appropriately) but the I believe that the converse

  strand(exonRanges) <- "*"

accomplishes the same goal -- reads are counted without regard to strand
of alignment.

Martin

> 
>>
>> Thanks,
>> Song Li
>>
>> Here are all the code:
>> #-----> Start with code from the package description<-------#
>>> library(Rsamtools)
>>> testFile <- system.file("bam", "isowt5_13e.bam", package = "leeBamViews")
>>> aligns <- readBamGappedAlignments(testFile)
>>> library(GenomicFeatures)
>>> txdb <- makeTranscriptDbFromUCSC(genome = "sacCer2", tablename = "sgdGene")
>>> exonRanges <- exonsBy(txdb, "tx")
>>> length(exonRanges)
>>> levels(rname(aligns)) <- c(paste("chr", as.roman(1:16),
>> +     sep = ""), "chrM")
>>> counts <- countOverlaps(exonRanges, aligns)
>>
>> #-----> find alignments that map to GRList$6645<-------#
>>> aligns[705:732]
>> GappedAlignments of length 28
>>       rname strand cigar qwidth  start    end width ngap
>> [1]  chrXIII      +   36M     36 804443 804478    36    0
>> [2]  chrXIII      +   36M     36 804466 804501    36    0
>> [3]  chrXIII      -   36M     36 804473 804508    36    0
>> [4]  chrXIII      +   36M     36 804476 804511    36    0
>> [5]  chrXIII      -   36M     36 804493 804528    36    0
>> [6]  chrXIII      +   36M     36 804516 804551    36    0
>> [7]  chrXIII      +   36M     36 804562 804597    36    0
>> [8]  chrXIII      +   36M     36 804562 804597    36    0
>> [9]  chrXIII      -   36M     36 804574 804609    36    0
>> ...      ...    ...   ...    ...    ...    ...   ...  ...
>> [20] chrXIII      +   36M     36 804764 804799    36    0
>> [21] chrXIII      -   36M     36 804798 804833    36    0
>> [22] chrXIII      +   36M     36 804799 804834    36    0
>> [23] chrXIII      +   36M     36 804799 804834    36    0
>> [24] chrXIII      -   36M     36 804816 804851    36    0
>> [25] chrXIII      -   36M     36 804947 804982    36    0
>> [26] chrXIII      -   36M     36 804953 804988    36    0
>> [27] chrXIII      -   36M     36 804955 804990    36    0
>> [28] chrXIII      -   36M     36 804974 805009    36    0
>>> exonRanges[6646]
>> GRangesList of length 1
>> $6646
>> GRanges with 1 range and 3 elementMetadata values
>>    seqnames           ranges strand |   exon_id   exon_name exon_rank
>>       <Rle>        <IRanges>  <Rle> | <integer> <character> <integer>
>> [1]  chrXIII [804455, 805090]      + |      7011          NA         1
>>
>>
>> seqlengths
>>   chrIV   chrXV  chrVII  chrXII  chrXVI ...   chrVI    chrI    chrM 2micron
>>  1531919 1091289 1090947 1078175  948062 ...  270148  230208   85779    6318
>>> counts[6646]
>> [1] 13
>>> table(strand(aligns[705:732]))
>>
>>  +  -
>> 13 15
>>
>>
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] Rsamtools_1.2.1       Biostrings_2.18.0     GenomicFeatures_1.2.3
>> [4] GenomicRanges_1.2.1   IRanges_1.8.2
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.10.0     biomaRt_2.6.0      BSgenome_1.18.1    DBI_0.2-5
>> [5] RCurl_1.4-3        RSQLite_0.9-3      rtracklayer_1.10.5 tools_2.12.0
>> [9] XML_3.2-0
>>
>> Song Li
>> --
>> Postdoctoral Associate
>> Institute for Genome Sciences and Policy
>> Duke University
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list