[BioC] merging GRange objects

Nair, Murlidharan T mnair at iusb.edu
Tue Jun 25 03:00:50 CEST 2013


Hi Hervé,
I am annotating paired-end reads; the code shown is for mate1, and the code for mate2 is the same. As you can see, I am using the genomic coordinates and trying to annotate them with the UCSC known genes table. Ultimately I need to associate the annotation with the coordinates, which is why I am trying to merge the outputs. Because select() returns a data frame, I convert the GRanges object to a data frame as well and then merge the two. I want to make sure that I am not corrupting my data when I merge them. Would the following lines combine them correctly?

> mrg.data1=merge(as.data.frame(trans.names), as.data.frame(trans.info), by.x="ENTREZID", by.y="GENEID")

> mrg.data2=merge(mrg.data1, as.data.frame(codingRegions), by.x="TXID",   by.y="TXID")

I reviewed the first few lines and they seemed OK, but there could always be exceptions. If there is a better way, please let me know. I am very new to Bioconductor.
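
One way to look for such exceptions is to compare row counts and keys before and after each merge. The following is only a minimal sketch in base R: the toy data frames, their values, and the names naive, checked, and lost are invented for illustration and merely stand in for the real select() output. It shows two things merge() does by default: it drops rows whose key has no match, and it multiplies rows when a key is repeated on both sides.

```r
# Toy stand-ins for the real tables (all values are invented).
trans.info <- data.frame(
  TXID   = c(1, 2, 3),
  GENEID = c("10", "10", NA),            # transcript 3 has no gene ID
  TXNAME = c("txA", "txB", "txC"),
  stringsAsFactors = FALSE
)
trans.names <- data.frame(
  ENTREZID = c("10", "10"),              # the same key appears twice
  SYMBOL   = c("GENE_A", "GENE_A"),
  stringsAsFactors = FALSE
)

# Plain merge: repeated keys on both sides are cross-multiplied,
# and transcript 3 (no matching key) is silently dropped.
naive <- merge(trans.names, trans.info, by.x = "ENTREZID", by.y = "GENEID")
nrow(naive)                               # 4 rows from only 3 transcripts
lost <- setdiff(trans.info$TXID, naive$TXID)
lost                                      # transcripts missing after the merge

# More defensive merge: de-duplicate the lookup table first and keep
# every transcript, with NA where no annotation was found.
checked <- merge(unique(trans.names), trans.info,
                 by.x = "ENTREZID", by.y = "GENEID", all.y = TRUE)
nrow(checked) == nrow(trans.info)         # one row per transcript again
```

If the row count after a merge differs from what you expect, anyDuplicated() on the key columns and setdiff() on the keys, as above, usually show which rows were multiplied or lost.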

Thanks for your help. 

Cheers../Murli
 

-----Original Message-----
From: Hervé Pagès [mailto:hpages at fhcrc.org] 
Sent: Monday, June 24, 2013 8:29 PM
To: Murli [guest]
Cc: bioconductor at r-project.org; Nair, Murlidharan T
Subject: Re: [BioC] merging GRange objects

Hi Murli,

On 06/24/2013 04:55 PM, Murli [guest] wrote:
>
> Hi,
>
> I would appreciate it if you could tell me whether the way I am merging the GappedAlignments and GRanges objects is correct. mate1 and mate2 are GappedAlignments objects. I am merging them in order to associate my reads with the annotation.
>
> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>
> mate.range = GRanges(seqnames(mate1[isSameCzome]),
>                      IRanges(start(mate1[isSameCzome]) - offset,
>                              start(mate1[isSameCzome]) + offset))
>
> codingRegions = refLocsToLocalLocs(mate.range, txdb)
>
> trans.info=select(txdb, key=values(codingRegions)$TXID, 
> cols=c("GENEID","TXNAME"), keytype="TXID")
>
> trans.names=select(org.Hs.eg.db, trans.info$GENEID, c("GENENAME", 
> "SYMBOL"))
>
> mrg.data1=merge(as.data.frame(trans.names), as.data.frame(trans.info), 
> by.x="ENTREZID", by.y="GENEID")
>
> mrg.data2=merge(mrg.data1, as.data.frame(codingRegions), by.x="TXID", 
> by.y="TXID")

It looks like you are merging data.frames, not GappedAlignments or GRanges objects. Also you say that 'mate1' and 'mate2' are GappedAlignments objects but I only see 'mate1' in the above code.

The exact meaning of "merging" depends on the objects involved.
Sometimes people use the term "merging" when they actually want to combine or bind objects together with c(), rbind() or cbind().
Note that since GRanges and GappedAlignments objects are conceptually vector-like objects of dimension 1, only c() works on them. That is, rbind(), cbind(), and merge() (which typically operate on 2-D objects) are not supported on those objects.
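
As a rough sketch of that distinction (not code from the original thread), the example below assumes the GenomicRanges package is installed. The toy ranges, the TXID values, the symbols, and the names gr1, gr2, both, annot, and joined are all made up. It combines two GRanges objects with c(), then converts the result to a data frame so that merge() can join it to an annotation table by key.

```r
library(GenomicRanges)  # assumes the Bioconductor package is installed

# Two small, made-up sets of ranges on the same chromosome.
gr1 <- GRanges("chr1", IRanges(start = c(100, 200), width = 50))
gr2 <- GRanges("chr1", IRanges(start = 300, width = 50))

# c() concatenates vector-like objects end to end.
both <- c(gr1, gr2)
length(both)

# For a key-based join, store the key as a metadata column,
# then convert the ranges to a 2-D data frame.
mcols(both)$TXID <- c(11L, 12L, 13L)
both.df <- as.data.frame(both)

# Hypothetical annotation table keyed on the same TXID values.
annot <- data.frame(TXID = c(11L, 12L, 13L),
                    SYMBOL = c("GENE_A", "GENE_B", "GENE_C"),
                    stringsAsFactors = FALSE)

# merge() is applied to the data frames, not to the GRanges object.
joined <- merge(both.df, annot, by = "TXID")
```

Keeping the coordinates and the key in the same data frame before merging means each annotated row still carries its seqnames, start, and end.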

Cheers,
H.

>
> Thanks ../Murli
>
>
>
>   -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>   [1] biomaRt_2.16.0
>   [2] org.Hs.eg.db_2.9.0
>   [3] RSQLite_0.11.4
>   [4] DBI_0.2-7
>   [5] VariantAnnotation_1.6.6
>   [6] Rsamtools_1.12.3
>   [7] BSgenome.Hsapiens.UCSC.hg19_1.3.19
>   [8] BSgenome_1.28.0
>   [9] Biostrings_2.28.0
> [10] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2
> [11] GenomicFeatures_1.12.2
> [12] AnnotationDbi_1.22.6
> [13] Biobase_2.20.0
> [14] GenomicRanges_1.12.4
> [15] IRanges_1.18.1
> [16] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5       RCurl_1.95-4.1     rtracklayer_1.20.2 stats4_3.0.1
> [5] tcltk_3.0.1        tools_3.0.1        XML_3.98-1.1       zlibbioc_1.6.0
>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


