[BioC] countOverlaps() only count positive strand (from GenomicRanges example)

Song Li songli116 at gmail.com
Mon Dec 13 21:13:34 CET 2010


Hi Martin and Vincent,

Thank you for the replies and both methods work,

Cheers,
Song

On Mon, Dec 13, 2010 at 2:19 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 12/13/2010 10:09 AM, Song Li wrote:
>> Hi Martin and Vincent,
>>
>> Thank you for your replies, here is my code:
>>
>>> strand(exonRanges)<-"*"
>> Error in `strand<-`(`*tmp*`, value = "*") :
>>   replacement 'value' is not an AtomicList with the same elementLengths as 'x'
>
> create a list of character vectors, each of the appropriate length
>
>  value <- lapply(elementLengths(exonRanges), rep, x="*")
>
> convert this list to a class derived from 'AtomicList', and do the
> replacement
>
>  strand(exonRanges) <- CharacterList(value)
>
> Sometimes it's easier / faster to work with exons(txdb, column="tx_id")
> or similar.
>
> Martin
>
>
>>
>>> levels(strand(aligns))<-c('*','*','*')
>> Error in function (classes, fdef, mtable)  :
>>   unable to find an inherited method for function "strand<-", for
>> signature "GappedAlignments"
>>
>> It does not seem to be that straightforward.
>>
>> I have been searching for ways to make modification to the
>> GappedAlignments and GRangesList  objects.
>>
>> Song
>>
>> On Mon, Dec 13, 2010 at 12:39 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>> On 12/13/2010 09:30 AM, Vincent Carey wrote:
>>>> On Mon, Dec 13, 2010 at 12:06 PM, Song Li <songli116 at gmail.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> I was following the example in the "GenomicRanges Use Cases", section
>>>>> 3, "Simple RNA-seq Analysis", by "copy-paste" command from the
>>>>> document from page 7 to page 8:
>>>>>
>>>>> What I found out is that countOverlap() only count reads aligned to
>>>>> positive strand.  There are 28 reads map to region GRList[6645], 13 on
>>>>> positive strand, 15 on negative strand.  The "count[6645]" is 13.
>>>>>
>>>>> Is there a way to count both strand?
>>>>
>>>> If you want to disregard strand of exon range, set it to "*".  This
>>>> does not appear to be a one-liner.
>>>
>>> it's tricky to set strand() on a GappedAlignment (because the CIGAR has
>>> to be adjusted appropriately) but the I believe that the converse
>>>
>>>  strand(exonRanges) <- "*"
>>>
>>> accomplishes the same goal -- reads are counted without regard to strand
>>> of alignment.
>>>
>>> Martin
>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Song Li
>>>>>
>>>>> Here are all the code:
>>>>> #-----> Start with code from the package description<-------#
>>>>>> library(Rsamtools)
>>>>>> testFile <- system.file("bam", "isowt5_13e.bam", package = "leeBamViews")
>>>>>> aligns <- readBamGappedAlignments(testFile)
>>>>>> library(GenomicFeatures)
>>>>>> txdb <- makeTranscriptDbFromUCSC(genome = "sacCer2", tablename = "sgdGene")
>>>>>> exonRanges <- exonsBy(txdb, "tx")
>>>>>> length(exonRanges)
>>>>>> levels(rname(aligns)) <- c(paste("chr", as.roman(1:16),
>>>>> +     sep = ""), "chrM")
>>>>>> counts <- countOverlaps(exonRanges, aligns)
>>>>>
>>>>> #-----> find alignments that map to GRList$6645<-------#
>>>>>> aligns[705:732]
>>>>> GappedAlignments of length 28
>>>>>       rname strand cigar qwidth  start    end width ngap
>>>>> [1]  chrXIII      +   36M     36 804443 804478    36    0
>>>>> [2]  chrXIII      +   36M     36 804466 804501    36    0
>>>>> [3]  chrXIII      -   36M     36 804473 804508    36    0
>>>>> [4]  chrXIII      +   36M     36 804476 804511    36    0
>>>>> [5]  chrXIII      -   36M     36 804493 804528    36    0
>>>>> [6]  chrXIII      +   36M     36 804516 804551    36    0
>>>>> [7]  chrXIII      +   36M     36 804562 804597    36    0
>>>>> [8]  chrXIII      +   36M     36 804562 804597    36    0
>>>>> [9]  chrXIII      -   36M     36 804574 804609    36    0
>>>>> ...      ...    ...   ...    ...    ...    ...   ...  ...
>>>>> [20] chrXIII      +   36M     36 804764 804799    36    0
>>>>> [21] chrXIII      -   36M     36 804798 804833    36    0
>>>>> [22] chrXIII      +   36M     36 804799 804834    36    0
>>>>> [23] chrXIII      +   36M     36 804799 804834    36    0
>>>>> [24] chrXIII      -   36M     36 804816 804851    36    0
>>>>> [25] chrXIII      -   36M     36 804947 804982    36    0
>>>>> [26] chrXIII      -   36M     36 804953 804988    36    0
>>>>> [27] chrXIII      -   36M     36 804955 804990    36    0
>>>>> [28] chrXIII      -   36M     36 804974 805009    36    0
>>>>>> exonRanges[6646]
>>>>> GRangesList of length 1
>>>>> $6646
>>>>> GRanges with 1 range and 3 elementMetadata values
>>>>>    seqnames           ranges strand |   exon_id   exon_name exon_rank
>>>>>       <Rle>        <IRanges>  <Rle> | <integer> <character> <integer>
>>>>> [1]  chrXIII [804455, 805090]      + |      7011          NA         1
>>>>>
>>>>>
>>>>> seqlengths
>>>>>   chrIV   chrXV  chrVII  chrXII  chrXVI ...   chrVI    chrI    chrM 2micron
>>>>>  1531919 1091289 1090947 1078175  948062 ...  270148  230208   85779    6318
>>>>>> counts[6646]
>>>>> [1] 13
>>>>>> table(strand(aligns[705:732]))
>>>>>
>>>>>  +  -
>>>>> 13 15
>>>>>
>>>>>
>>>>>> sessionInfo()
>>>>> R version 2.12.0 (2010-10-15)
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>> [1] Rsamtools_1.2.1       Biostrings_2.18.0     GenomicFeatures_1.2.3
>>>>> [4] GenomicRanges_1.2.1   IRanges_1.8.2
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] Biobase_2.10.0     biomaRt_2.6.0      BSgenome_1.18.1    DBI_0.2-5
>>>>> [5] RCurl_1.4-3        RSQLite_0.9-3      rtracklayer_1.10.5 tools_2.12.0
>>>>> [9] XML_3.2-0
>>>>>
>>>>> Song Li
>>>>> --
>>>>> Postdoctoral Associate
>>>>> Institute for Genome Sciences and Policy
>>>>> Duke University
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>> --
>>> Computational Biology
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>>>
>>> Location: M1-B861
>>> Telephone: 206 667-2793
>>>
>>
>>
>>
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>



-- 
Postdoctoral Associate
Institute for Genome Sciences and Policy
Duke University



More information about the Bioconductor mailing list