[BioC] revisiting genomic coordinates to gene
Vincent Carey
stvjc at channing.harvard.edu
Wed Sep 1 09:53:33 CEST 2010
There are many possible approaches, and many possible pitfalls. The
following is surely relevant:
> get("CTNNB1", revmap(org.Hs.egSYMBOL))
[1] "1499"
> get("1499", org.Hs.egCHRLOC)
3
41240941
> get("1499", org.Hs.egCHRLOCEND)
3
41281939
Your location lies within these limits. You could do this more
systematically by defining a collection of Entrez Gene IDs and building
an IRanges or GRanges instance that stores all the "gene boundary"
information for these IDs. You will have to attend to strand signs
(org.Hs.egCHRLOC reports minus-strand locations with a leading "-"),
to multiple locations per gene, and to genome build versions.
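To make the bookkeeping concrete, here is a minimal base-R sketch of the lookup. It is a plain interval test swapped in for the GRanges/findOverlaps route, not the Bioconductor API itself. It assumes a hypothetical data frame `gene_table` with one row per (gene, location) pair and columns `gene_id`, `chrom`, `start`, `end`, laid out like the CHRLOC/CHRLOCEND values; the helper name `genes_at` is also hypothetical. The only data used are the CTNNB1 values shown above.

```r
# Hypothetical helper (not part of any package): return the Entrez Gene
# IDs whose [start, end] interval on `chrom` contains `pos`.
genes_at <- function(gene_table, chrom, pos) {
  # CHRLOC-style locations carry a leading "-" on the minus strand,
  # so drop the sign; pmin/pmax guard against start > end after abs().
  s <- pmin(abs(gene_table$start), abs(gene_table$end))
  e <- pmax(abs(gene_table$start), abs(gene_table$end))
  hit <- gene_table$chrom == chrom & s <= pos & pos <= e
  # A gene may have several locations (multiplicities); report it once.
  unique(gene_table$gene_id[hit])
}

# Values from the CTNNB1 (Entrez Gene 1499) lookups above.
gene_table <- data.frame(
  gene_id = "1499",
  chrom   = "3",
  start   = 41240941,
  end     = 41281939,
  stringsAsFactors = FALSE
)

genes_at(gene_table, "3", 41266083)
# [1] "1499"
```

Note that this does nothing about build versions: the query coordinate and the gene table must come from the same genome build for the answer to mean anything.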
The GenomicFeatures makeTranscriptDb* facilities are potentially useful
when one is interested specifically in transcribed or exonic regions.
In the following, tx.3 is an extract from the result of
makeTranscriptDbFromUCSC("hg18"):
> get("1499", org.Hs.egUCSCKG)
[1] "uc003ckp.2" "uc003ckq.2" "uc003ckr.2" "uc003cks.2" "uc003ckt.1"
[6] "uc010hia.1" "uc011azf.1" "uc011azg.1"
> tx.3[ elementMetadata(tx.3)$tx_name %in% .Last.value, ]
GRanges with 6 ranges and 2 elementMetadata values
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr3 [41211405, 41255849] + | 11545 uc010hia.1
[2] chr3 [41215946, 41256943] + | 11546 uc003ckp.2
[3] chr3 [41215946, 41256943] + | 11547 uc003ckq.2
[4] chr3 [41215946, 41256943] + | 11548 uc003ckr.2
[5] chr3 [41249904, 41253941] + | 11550 uc003cks.2
[6] chr3 [41252167, 41253962] + | 11551 uc003ckt.1
seqlengths
chr1 chr1_random chr10 ... chrX_random chrY
247249719 1663265 135374737 ... 1719168 57772954
There are undoubtedly also ways to use biomaRt to address your question.
Perhaps the following is also of interest:
> findOverlaps(IRanges(start=41266083,width=1), ranges(tx.3))
An object of class "RangesMatching"
Slot "matchMatrix":
query subject
[1,] 1 2080
[2,] 1 2081
Slot "DIM":
[1] 1 3528
> tx.3[2080:2081,]
GRanges with 2 ranges and 2 elementMetadata values
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr3 [41263094, 41294629] - | 11552 uc003cku.2
[2] chr3 [41263094, 41978664] - | 11553 uc003ckv.2
seqlengths
chr1 chr1_random chr10 ... chrX_random chrY
247249719 1663265 135374737 ... 1719168 57772954
So it seems your location is in a region that is said to be
transcribed. I could not find an Entrez Gene ID associated with the
"known gene" tx_name values just above.
> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-30 r52417)
Platform: x86_64-apple-darwin10.3.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base
other attached packages:
[1] org.Hs.eg.db_2.4.1 RSQLite_0.9-1 DBI_0.2-5
[4] AnnotationDbi_1.11.1 Biobase_2.9.0 GenomicFeatures_1.1.11
[7] GenomicRanges_1.1.15 IRanges_1.7.32 weaver_1.15.0
[10] codetools_0.2-2 digest_0.4.2
loaded via a namespace (and not attached):
[1] BSgenome_1.17.5 Biostrings_2.17.26 RCurl_1.4-2 XML_3.1-0
[5] biomaRt_2.5.1 rtracklayer_1.9.3
On Tue, Aug 31, 2010 at 11:43 PM, Andrew Yee <yee at post.harvard.edu> wrote:
> I'm interested in converting genomic coordinates to gene names, with
> potential use of the org.Hs.eg.db library, e.g. converting chr3:41,266,083
> to CTNNB1.
>
> I know that this topic has been addressed before, see e.g.:
>
> https://stat.ethz.ch/pipermail/bioconductor/2009-January/025906.html (discusses
> use of overlap in IRanges)
> https://stat.ethz.ch/pipermail/bioconductor/2009-October/030140.html
>
> I was wondering if there have been any new solutions or new packages that
> address this problem since these threads.
>
> Thanks,
> Andrew
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>