[BioC] How to annotate genomic coordinates

James W. MacDonald jmacdon at uw.edu
Thu Nov 8 15:46:20 CET 2012


Hi Jose,

On 11/8/2012 8:19 AM, José Luis Lavín wrote:
> Dear Bioconductor list,
>
> I write you this email asking for a Bioconductor module that allows me to
> annotate genomic coordinates and get different GeneIds.
> I'll show you an example of what I'm referring to:
>
> I have this data:
> Chromosome  coordinate
> chr17              31246506

It depends on what that coordinate is. Is it the start of a transcript? 
A SNP? Do you really just have a single coordinate, or do you have a 
range? What species are we talking about here?

Depending on what your data are, you might want to use either one of the 
TxDb packages, or a SNPlocs package. There really isn't much to go on 
here. If I assume this is a coordinate that one might think is within an 
exon, and if I further assume you are working with H. sapiens, I could 
suggest something like

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
ex <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")

x <- GRanges(seqnames = "chr17", ranges = IRanges(start = 31246506, width = 1))

# genes with at least one exon overlapping x
ex[ex %over% x]

or maybe more appropriately

names(ex)[ex %over% x]

which will give you the Entrez Gene ID, and you can map that to gene 
symbols or Ensembl IDs using the org.Hs.eg.db package.
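
As a rough sketch of the whole path for a list of inputs in the 
"chr17.31246506" form from the original question: parse_coords() is a 
made-up helper name, H. sapiens/hg19 is still only my assumption, and 
%over% and select() need a reasonably recent Bioconductor. The 
annotation step is skipped if the packages are not installed.

```r
# Hypothetical helper (not part of any package): split "chr17.31246506"-style
# strings into a chromosome name and an integer position, using base R only.
parse_coords <- function(x) {
  parts <- strsplit(x, ".", fixed = TRUE)
  data.frame(
    chrom = vapply(parts, function(p) p[1], character(1)),
    pos   = as.integer(vapply(parts, function(p) p[2], character(1))),
    stringsAsFactors = FALSE
  )
}

coords <- parse_coords(c("chr17.31246506"))

# Assumption: human, hg19. Swap in the TxDb/org packages for your organism.
if (requireNamespace("TxDb.Hsapiens.UCSC.hg19.knownGene", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(GenomicRanges)
  library(TxDb.Hsapiens.UCSC.hg19.knownGene)
  library(org.Hs.eg.db)

  ex <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene")
  x  <- GRanges(seqnames = coords$chrom,
                ranges = IRanges(start = coords$pos, width = 1))

  # Entrez Gene IDs of genes with an exon overlapping any input position
  ids <- names(ex)[ex %over% x]

  # Map Entrez Gene IDs to gene symbols and Ensembl gene IDs
  if (length(ids) > 0) {
    print(select(org.Hs.eg.db, keys = ids,
                 columns = c("SYMBOL", "ENSEMBL"), keytype = "ENTREZID"))
  }
}
```

Note that this only tells you which genes are hit overall; to keep track 
of which input position hit which gene, findOverlaps(x, ex) would be the 
place to start.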

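If, however, your coordinate isn't in an exon, but might be in a UTR or 
an intron, you can look at ?exonsBy, which also documents related 
extractors such as fiveUTRsByTranscript, threeUTRsByTranscript and 
intronsByTranscript that you can compare against instead.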
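
Something along these lines might work for checking those other region 
types. This is only a sketch under the same H. sapiens/hg19 assumption; 
the variable names are mine, and the whole thing is skipped if the TxDb 
package isn't installed.

```r
query_chrom <- "chr17"       # example coordinate from the question
query_pos   <- 31246506L
region_hits <- list()        # stays empty if the package is not installed

if (requireNamespace("TxDb.Hsapiens.UCSC.hg19.knownGene", quietly = TRUE)) {
  library(GenomicRanges)
  library(TxDb.Hsapiens.UCSC.hg19.knownGene)
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

  x <- GRanges(seqnames = query_chrom,
               ranges = IRanges(start = query_pos, width = 1))

  # These extractors group ranges by transcript, not by gene
  regions <- list(
    utr5   = fiveUTRsByTranscript(txdb),
    utr3   = threeUTRsByTranscript(txdb),
    intron = intronsByTranscript(txdb)
  )

  # Keep only the transcripts whose regions overlap the query position
  region_hits <- lapply(regions, subsetByOverlaps, x)

  # Number of overlapping transcripts per region type
  print(sapply(region_hits, length))
}
```

Because these are grouped by transcript, the names of the hits are 
transcript identifiers rather than Gene IDs, so an extra mapping step is 
needed to get back to genes.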

If these are SNPs, then you can look at the help pages for the relevant 
SNPlocs package.

Best,

Jim


>
> which can also be written this way by the program that yielded the result:
> chr17.31246506
>
> And I need to convert this data into a gene name and known gene Ids, such
> as:
>
> Gene name  Entrez_ID  Ensembl_ID
>
> Tff3 NM_011575 20050
> Can you please advise me about a module able to perform this ID conversion
> using a list of  "chr17.31246506" type coordinates as input?
>
> Thanks in advance
>
> With best wishes
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
