[BioC] revisiting genomic coordinates to gene

Wed Sep 1 09:53:33 CEST 2010

There are many possible approaches, and as many possible pitfalls.  The
following is surely relevant:

> get("CTNNB1", revmap(org.Hs.egSYMBOL))
[1] "1499"

> get("1499", org.Hs.egCHRLOC)
       3
41240941
> get("1499", org.Hs.egCHRLOCEND)
       3
41281939

Your location lies within these limits.  You could do this more
systematically by defining a collection of Entrez Gene IDs and building
an IRanges or GRanges instance that stores all the "gene boundary"
information for these IDs.  You will have to attend to signs (CHRLOC
reports positions on the antisense strand as negative numbers), to
multiplicities (one ID can map to more than one location), and to
genome build versions.
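
To make the bookkeeping concrete, here is a minimal base R sketch of such a
"gene boundary" table.  A plain data frame stands in for the IRanges/GRanges
instance so that the example runs without any Bioconductor packages.  Only
the 1499 row comes from the output above; the IDs "G2" and "G3", their
coordinates, and the helper genes_at() are made up for illustration.

```r
# CHRLOC-style records, one row per (gene, location) pair.
# Row 1 uses the CTNNB1 values shown above.  "G2" and "G3" are hypothetical:
# G2 illustrates a minus-strand gene (negative coordinates), G3 a gene ID
# that maps to two locations.
loc <- data.frame(
  gene_id = c("1499", "G2", "G3", "G3"),
  chrom   = c("3", "3", "X", "Y"),
  start   = c(41240941, -41300000, 1000, 1200),
  end     = c(41281939, -41350000, 5000, 5200),
  stringsAsFactors = FALSE
)

# Turn the sign convention into an explicit strand plus positive
# coordinates, with start <= end whatever the original order was.
bounds <- data.frame(
  gene_id = loc$gene_id,
  chrom   = loc$chrom,
  strand  = ifelse(loc$start < 0, "-", "+"),
  start   = pmin(abs(loc$start), abs(loc$end)),
  end     = pmax(abs(loc$start), abs(loc$end)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: IDs of all genes whose boundaries contain 'pos'
# on chromosome 'chrom' (strand is ignored here).
genes_at <- function(bounds, chrom, pos) {
  hit <- bounds$chrom == chrom & bounds$start <= pos & bounds$end >= pos
  unique(bounds$gene_id[hit])
}

genes_at(bounds, "3", 41266083)
```

The table keeps one row per location rather than one row per gene, so a
multiply mapped ID is not silently collapsed.  Nothing in the sketch checks
that the query coordinate and the table refer to the same build; that still
has to be verified separately.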

The GenomicFeatures makeTranscriptDb* facilities are potentially useful
when one is interested in transcribed or exonic regions specifically.
In the following, tx.3 is an extract from the result of
makeTranscriptDbFromUCSC("hg18"):

> get("1499", org.Hs.egUCSCKG)
[1] "uc003ckp.2" "uc003ckq.2" "uc003ckr.2" "uc003cks.2" "uc003ckt.1"
[6] "uc010hia.1" "uc011azf.1" "uc011azg.1"
> tx.3[ elementMetadata(tx.3)$tx_name %in% .Last.value, ]
GRanges with 6 ranges and 2 elementMetadata values
    seqnames               ranges strand |     tx_id     tx_name
       <Rle>            <IRanges>  <Rle> | <integer> <character>
[1]     chr3 [41211405, 41255849]      + |     11545  uc010hia.1
[2]     chr3 [41215946, 41256943]      + |     11546  uc003ckp.2
[3]     chr3 [41215946, 41256943]      + |     11547  uc003ckq.2
[4]     chr3 [41215946, 41256943]      + |     11548  uc003ckr.2
[5]     chr3 [41249904, 41253941]      + |     11550  uc003cks.2
[6]     chr3 [41252167, 41253962]      + |     11551  uc003ckt.1

seqlengths
          chr1   chr1_random         chr10 ...   chrX_random          chrY
     247249719       1663265     135374737 ...       1719168      57772954

There are undoubtedly also ways to use biomaRt to address your question.

Perhaps the following is also of interest:

> findOverlaps(IRanges(start=41266083,width=1), ranges(tx.3))
An object of class "RangesMatching"
Slot "matchMatrix":
     query subject
[1,]     1    2080
[2,]     1    2081

Slot "DIM":
[1]    1 3528

> tx.3[2080:2081,]
GRanges with 2 ranges and 2 elementMetadata values
    seqnames               ranges strand |     tx_id     tx_name
       <Rle>            <IRanges>  <Rle> | <integer> <character>
[1]     chr3 [41263094, 41294629]      - |     11552  uc003cku.2
[2]     chr3 [41263094, 41978664]      - |     11553  uc003ckv.2

seqlengths
          chr1   chr1_random         chr10 ...   chrX_random          chrY
     247249719       1663265     135374737 ...       1719168      57772954

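So it seems your location is in a region that is said to be transcribed.
I could not find an Entrez Gene ID associated with the "known gene"
tx_name values just above.  Note also that the hg18 transcript
coordinates shown for CTNNB1 are shifted by about 25 kb relative to the
CHRLOC values, so the two resources appear to be on different builds,
and the answer depends on which build your coordinate refers to.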
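
For what it is worth, the whole lookup (position to overlapping
transcripts to Entrez Gene IDs) can be sketched in base R using only the
rows printed above.  The data frame tx and the list gene2tx are stand-ins
that I typed in by hand; in practice gene2tx would come from something like
as.list(org.Hs.egUCSCKG).  Because gene2tx holds only the 1499 entry here,
an NA below means "not in this toy map", not "no gene exists".

```r
# Transcript rows copied from the two hg18 tx.3 printouts above.
tx <- data.frame(
  tx_name = c("uc010hia.1", "uc003ckp.2", "uc003ckq.2", "uc003ckr.2",
              "uc003cks.2", "uc003ckt.1", "uc003cku.2", "uc003ckv.2"),
  chrom   = "chr3",
  start   = c(41211405, 41215946, 41215946, 41215946,
              41249904, 41252167, 41263094, 41263094),
  end     = c(41255849, 41256943, 41256943, 41256943,
              41253941, 41253962, 41294629, 41978664),
  strand  = c("+", "+", "+", "+", "+", "+", "-", "-"),
  stringsAsFactors = FALSE
)

# Gene-to-transcript map; only the entry for 1499 shown above is included.
gene2tx <- list(
  "1499" = c("uc003ckp.2", "uc003ckq.2", "uc003ckr.2", "uc003cks.2",
             "uc003ckt.1", "uc010hia.1", "uc011azf.1", "uc011azg.1")
)

# Invert the map: names are transcript names, values are gene IDs.
tx2gene <- setNames(rep(names(gene2tx), lengths(gene2tx)),
                    unlist(gene2tx, use.names = FALSE))

# Transcripts on chr3 whose range contains the query position.
pos  <- 41266083
hits <- tx[tx$chrom == "chr3" & tx$start <= pos & tx$end >= pos, ]

# Attach gene IDs; transcripts absent from the map get NA.
hits$gene_id <- unname(tx2gene[hits$tx_name])
hits[, c("tx_name", "strand", "gene_id")]
```

Unlike the findOverlaps call above, which compares bare ranges, this
filters on the chromosome explicitly.  That matters as soon as the
transcript table covers more than one chromosome.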

> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-30 r52417)
Platform: x86_64-apple-darwin10.3.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
 [1] org.Hs.eg.db_2.4.1     RSQLite_0.9-1          DBI_0.2-5
 [4] AnnotationDbi_1.11.1   Biobase_2.9.0          GenomicFeatures_1.1.11
 [7] GenomicRanges_1.1.15   IRanges_1.7.32         weaver_1.15.0
[10] codetools_0.2-2        digest_0.4.2

loaded via a namespace (and not attached):
[1] BSgenome_1.17.5    Biostrings_2.17.26 RCurl_1.4-2        XML_3.1-0
[5] biomaRt_2.5.1      rtracklayer_1.9.3

On Tue, Aug 31, 2010 at 11:43 PM, Andrew Yee <yee at post.harvard.edu> wrote:
> I'm interested in converting genomic coordinates to gene names, with
> potential use of the org.Hs.eg.db library, e.g. converting chr3:41,266,083
> to CTNNB1.
>
> I know that this topic has been addressed before, see e.g.:
>
> https://stat.ethz.ch/pipermail/bioconductor/2009-January/025906.html (discusses
> use of overlap in IRanges)
> https://stat.ethz.ch/pipermail/bioconductor/2009-October/030140.html
>
> I was wondering if there have been any new solutions or new packages that
> address this problem since these threads.
>
> Thanks,
> Andrew