[BioC] how to merge GRanges objects

Hervé Pagès hpages at fhcrc.org
Wed Oct 16 20:00:28 CEST 2013



On 10/16/2013 10:54 AM, Hervé Pagès wrote:
> Hi John,
>
> On 10/16/2013 07:15 AM, John linux-user wrote:
>> Hello everyone,
>>
>> I am wondering how to merge two GRanges objects by their ranges while
>> keeping the values from each object in separate metadata columns. For
>> example, I have the two objects below
>>
>> obj1
>>
>>           seqnames           ranges strand |       Val
>>              <Rle>        <IRanges>  <Rle> | <integer>
>>    [1] chr1_random [272531, 272571]      + |        88
>>    [2] chr1_random [272871, 272911]      + |        45
>>
>> obj2
>>           seqnames           ranges strand |       Val
>>              <Rle>        <IRanges>  <Rle> | <integer>
>>    [1] chr1_random [272531, 272581]      + |       800
>>    [2] chr1_random [272850, 272911]      + |       450
>>
>> After merging, the result should look like the following mergedObject,
>> which takes the differences in the ranges into account (e.g. 581 and 850
>> in obj2 above differ from the corresponding values in obj1, which are
>> 571 and 871 respectively).
>>
>> mergedObject
>>
>>           seqnames           ranges strand | object2Val object1Val
>>              <Rle>        <IRanges>  <Rle> |  <integer>  <integer>
>>    [1] chr1_random [272531, 272581]      + |        800         88
>>    [2] chr1_random [272850, 272911]      + |        450         45
>>
>
> I fail to see how this result makes sense. If you think of the "Val"
> metadata column as a numerical variable defined along the genome, then,
> in the merged object, it takes the value 88 over the [272531, 272581]
> interval, but in the original object 'obj1' it took this value only
> over the [272531, 272571] interval.
>
> The following merged object would make much more sense to me:
>
>           seqnames           ranges strand | object2Val object1Val
>              <Rle>        <IRanges>  <Rle> |  <integer>  <integer>
>    [1] chr1_random [272531, 272571]      + |        800         88
>    [1] chr1_random [272572, 272581]      + |        800         NA
>    [2] chr1_random [272850, 272870]      + |        450         NA
>    [2] chr1_random [272871, 272911]      + |        450         45
>
> Sounds like you need all the ranges in disjoin(union(obj1, obj2))

I meant disjoin(c(obj1, obj2)) here, sorry.
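
To make that concrete, here is a minimal sketch of one way to build
such a merged object. It recreates obj1 and obj2 from the values shown
in the original post. The helper name lookup_val() is made up for this
illustration, and the sketch assumes that the ranges within each input
object do not overlap each other, so every disjoint piece maps to at
most one value per object.

```r
library(GenomicRanges)

# Example objects rebuilt from the values quoted in this thread.
obj1 <- GRanges("chr1_random",
                IRanges(start = c(272531, 272871), end = c(272571, 272911)),
                strand = "+", Val = c(88L, 45L))
obj2 <- GRanges("chr1_random",
                IRanges(start = c(272531, 272850), end = c(272581, 272911)),
                strand = "+", Val = c(800L, 450L))

# All non-overlapping pieces covered by either object.
# disjoin() drops the metadata columns, so the values are added back below.
merged <- disjoin(c(obj1, obj2))

# Hypothetical helper: for each piece, fetch the Val of the range in 'obj'
# that overlaps it, or NA if no range in 'obj' does.
# Assumes the ranges within 'obj' are non-overlapping.
lookup_val <- function(pieces, obj) {
  hits <- findOverlaps(pieces, obj)
  val <- rep(NA_integer_, length(pieces))
  val[queryHits(hits)] <- mcols(obj)$Val[subjectHits(hits)]
  val
}

mcols(merged)$object2Val <- lookup_val(merged, obj2)
mcols(merged)$object1Val <- lookup_val(merged, obj1)
merged
```

If the ranges within one input can overlap each other, a piece may hit
several ranges and the values would need to be summarized (e.g. summed
or averaged) instead of assigned directly.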

H.

> if you want to be able to represent the 2 numerical variables
> accurately. But maybe I'm missing completely what you're trying
> to achieve.
>
> H.
>
>>
>>
>>
>> On Tuesday, October 15, 2013 12:36 PM, Lukasz [guest]
>> <guest at bioconductor.org> wrote:
>>
>>
>> Hi!
>>
>> Problem summary: How to retrieve part of the sequence of mRNA around
>> given location.
>>
>> I have the locations of the binding to mRNA events as GRanges
>> (GRevents) and need to retrieve sequence for motif finding. The
>> problem is that if I use getSeq(flank(GRevents, width=n)) then I get
>> the genomic sequence not transcript sequence, i.e. not accounting for
>> introns or mRNA border. I have tried solving it with
>> exonsBy("transcriptDb object", "tx") function but without success.
>>
>> Question: Is there a Bioconductor-supported way of resolving this
>> problem? With CLIPseq becoming more and more popular, this will be
>> a function in high demand.
>>
>> Thanks,
>> Lukasz
>>
>> -- output of sessionInfo():
>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C                 LC_NAME=C
>> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] rtracklayer_1.16.0                 GenomicFeatures_1.8.1
>> [3] AnnotationDbi_1.18.4               Biobase_2.16.0
>> [5] BSgenome.Mmusculus.UCSC.mm9_1.3.17 BSgenome_1.24.0
>> [7] Biostrings_2.24.1                  GenomicRanges_1.8.3
>> [9] IRanges_1.14.2                     BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.12.0  bitops_1.0-4.1  DBI_0.2-5       RCurl_1.91-1
>> [5] Rsamtools_1.8.0 RSQLite_0.11.1  stats4_2.15.0   tools_2.15.0
>> [9] XML_3.9-4       zlibbioc_1.2.0
>>
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


