[BioC] annotation in Ensembl using biomart

Wolfgang Huber whuber at embl.de
Thu Mar 11 20:01:19 CET 2010


Dear Jason

> Thanks for the reply.
> 
> You are right that CDC2L1 is the previous name for CDK11B, and CDC2L2
> for CDK11A. I guess I was confused by the output given by BioMart, in
> which the matching between old names and new names looks completely
> random (see the previous post). Could this be an error in BioMart's
> table join?

No, this is not an error in BioMart (nor in biomaRt); it is what the
database says. Please read my previous message.
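
To illustrate why such a table can look random, here is a minimal sketch
in Python (not biomaRt code). It assumes that the gene record carries two
HGNC symbols and two curated names as independent attributes, so that a
flat export lists every combination of them for each transcript. The
variable names and the use of itertools.product are illustrative
assumptions, not a description of BioMart's internals.

```python
from itertools import product

# Values taken from the table in the original question.
gene_id = "ENSG00000008128"
transcripts = ["ENST00000401097", "ENST00000341832"]
hgnc_symbols = ["CDK11B", "CDK11A"]    # both symbols attached to the one gene
curated_names = ["CDC2L2", "CDC2L1"]   # both curated names attached to the one gene

# Assumption: each attribute is linked to the gene independently of the
# others, so flattening the record yields the Cartesian product.
rows = [
    (gene_id, tx, symbol, name)
    for tx, name, symbol in product(transcripts, curated_names, hgnc_symbols)
]

for row in rows:
    print("\t".join(row))
```

Under that assumption, no row pairs an old name with "its" new name;
every symbol simply appears next to every curated name.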

	Wolfgang

> Thanks again,
> Jason
> 
> 
> On Thu, Mar 11, 2010 at 12:42 PM, Wolfgang Huber <whuber at embl.de> wrote:
>> Dear Jason
>>
>> A quick look at the HGNC website (http://www.genenames.org) will tell you
>> that CDC2L1 is the previous name for CDK11B (the currently approved gene
>> symbol), and similarly CDC2L2 for CDK11A. Furthermore, Ensembl and the
>> UCSC genome browser now map them to the same place in the reference
>> genome and consider them isoforms of the same gene:
>> http://www.genenames.org/data/hgnc_data.php?hgnc_id=1729
>> http://www.genenames.org/data/hgnc_data.php?hgnc_id=1730
>>
>> OTOH, Entrez and UniProt consider them as separate genes ("Duplicated gene.
>> CDK11A and CDK11B encode almost identical protein kinases of 110 kDa that
>> ..."): http://www.uniprot.org/uniprot/Q9UQ88
>>
>> Biology, and the history of biological discovery, can be messy...
>> Other people might have more insight, but I bet it is a long story :)
>>
>>        Wolfgang
>>
>>
>> Jason Lu scripsit 11/03/10 17:06:
>>> Hi all,
>>>
>>> I wonder whether I could get help from this list. Sorry if this is a
>>> duplicate question.
>>>
>>> I am confused by the following mapping (obtained from the BioMart
>>> website). All rows share the same ENSG. My purpose is to match each
>>> ENSG to a gene symbol.
>>> Do you have any suggestions as to which one I should use?
>>> Thanks,
>>>
>>>
>>> Ensembl Gene ID  Ensembl Transcript ID  HGNC symbol  HGNC curated gene name
>>> ENSG00000008128  ENST00000401097        CDK11B       CDC2L2
>>> ENSG00000008128  ENST00000401097        CDK11A       CDC2L2
>>> ENSG00000008128  ENST00000401097        CDK11B       CDC2L1
>>> ENSG00000008128  ENST00000401097        CDK11A       CDC2L1
>>> ENSG00000008128  ENST00000341832        CDK11B       CDC2L2
>>> ENSG00000008128  ENST00000341832        CDK11A       CDC2L2
>>> ENSG00000008128  ENST00000341832        CDK11B       CDC2L1
>>> ENSG00000008128  ENST00000341832        CDK11A       CDC2L1
>>> ENSG00000008128  ENST00000407249        CDK11B       CDC2L2
>>> ENSG00000008128  ENST00000407249        CDK11A       CDC2L2
>>>
>>> Jason
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>>
>> Best wishes
>>     Wolfgang
>>
>>
>> --
>> Wolfgang Huber
>> EMBL
>> http://www.embl.de/research/units/genome_biology/huber/contact
>>
>>
>>


-- 

Best wishes
      Wolfgang


--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact


