[BioC] is there an identifier that uniquely identifies a gene all over the many databases ?

Simon Anders anders at ebi.ac.uk
Sun Jul 12 23:14:02 CEST 2009

Hi Maura

mauede at alice.it wrote:
> By trial-and-error it seems the attribute "hgnc_symbol" yields a unique gene identifier ... but I am not quite sure.
> Instead a variable numbers of " refseq_dna" values are listed for the same "hgnc_symbol".

HGNC is the Human Genome Organisation's Gene Nomenclature Committee.
Their gene symbols are in fact unique (that is the whole point of HGNC),
but not every gene has an HGNC symbol yet. See http://www.genenames.org/
for more information.

> In short, given the "ensembl_gene_id" (ENSGxxxxxxxxxxx), is it possible to get the gene identifier for which this is a transcript ?

First of all, ENSGxxxxx IDs already identify human genes, not
transcripts. Human transcripts get ENSTxxxx identifiers (with a "T"
instead of a "G"). Each Ensembl gene can have several Ensembl
transcripts, one for each known splice variant. Play a bit with the
Ensembl web site to see examples.
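
To make the prefix convention concrete, here is a minimal R sketch that
tells gene IDs from transcript IDs by their form alone. The function name
classify_ensembl_id is made up for this illustration, and the pattern
assumes unversioned human IDs (prefix plus 11 digits, no ".N" version
suffix).

```r
# Hypothetical helper (not part of any package): classify human Ensembl
# IDs by prefix. Assumes unversioned IDs, i.e. ENSG/ENST plus 11 digits.
classify_ensembl_id <- function(ids) {
  ifelse(grepl("^ENSG[0-9]{11}$", ids), "gene",
         ifelse(grepl("^ENST[0-9]{11}$", ids), "transcript", "other"))
}

# One ID of gene form, one of transcript form, and one HGNC-style symbol
kinds <- classify_ensembl_id(c("ENSG00000139618", "ENST00000380152", "BRCA2"))
print(kinds)
```

This only looks at the shape of the string; it does not check that the ID
actually exists in Ensembl.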

To get the HGNC symbol for an Ensembl gene ID, an easy way is to use
the biomaRt package. Ask again if you are not familiar with it.
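
As a rough sketch of what such a biomaRt query could look like: the two
gene IDs below are only example inputs, the mart and dataset names are
the ones commonly used for human Ensembl genes (an assumption about your
setup), and the code needs a network connection to the Ensembl BioMart
server.

```r
library(biomaRt)

# Connect to the Ensembl BioMart, human gene dataset (assumed names)
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Example Ensembl gene IDs; replace with your own
ids <- c("ENSG00000139618", "ENSG00000141510")

# Ensembl gene ID -> HGNC symbol (the symbol may be empty for some genes)
symbols <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
                 filters    = "ensembl_gene_id",
                 values     = ids,
                 mart       = mart)
print(symbols)

# Ensembl gene ID -> all of its Ensembl transcripts (one row per transcript)
transcripts <- getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id"),
                     filters    = "ensembl_gene_id",
                     values     = ids,
                     mart       = mart)
print(transcripts)
```

The second query shows the one-to-many relation mentioned above: a gene
ID typically comes back with several transcript IDs. listAttributes(mart)
and listFilters(mart) show which other identifiers can be requested.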


| Dr. Simon Anders, Dipl. Phys.
| European Bioinformatics Institute (EMBL-EBI)
| Hinxton, Cambridgeshire, UK
| office phone +44-1223-492680, mobile phone +44-7505-841692
| preferred (permanent) e-mail: sanders at fs.tum.de
