[BioC] org.Mm.eg.db gives wrong symbol for MT genes

Gordon K Smyth smyth at wehi.EDU.AU
Sun Aug 11 04:20:46 CEST 2013


Hi Vincent,

Thanks, that explains it.  After reading your reply, I went to the NCBI 
Gene FAQ and found the following explanation:

"NOTE: To the greatest extent possible, each protein-coding gene in 
mitochondria has been assigned the same name (symbol) and full description 
across species. In some instances, this is at variance with the symbol 
assigned by species-specific nomenclature committees."

This would be fine except that (i) the NCBI Gene web interface disagrees 
with the NCBI gene_info file and (ii) the nomenclature committee symbol 
from MGI has not been included as a synonym in the gene_info file.

Anyway, the bottom line for my lab is that we will treat the 
gene_info/org.Mm.eg.db symbols as official, and we will have to give the 
MT genes special treatment when mapping aliases.
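
As a rough illustration of what that special treatment might look like, here
is a minimal base-R sketch. It assumes the relevant gene_info columns have
already been read into a data frame; the two rows are copied from the records
quoted below, and the helper names build_alias_table and to_official are
hypothetical, not part of org.Mm.eg.db or AnnotationDbi.

```r
# Toy stand-in for a few gene_info columns (values taken from the two
# records quoted in this thread; a real table would be read from
# Mus_musculus.gene_info.gz).
gene_info <- data.frame(
  GeneID   = c("11287", "17710"),
  Symbol   = c("Pzp", "COX3"),
  Synonyms = c("A1m|A2m|AI893533|MAM", "-"),
  Symbol_from_nomenclature_authority = c("Pzp", "mt-Co3"),
  stringsAsFactors = FALSE
)

# Symbols where gene_info and the nomenclature authority disagree.
mismatch <- gene_info$Symbol != gene_info$Symbol_from_nomenclature_authority
gene_info$Symbol_from_nomenclature_authority[mismatch]

# Hypothetical helper: one row per (alias, GeneID) pair. The nomenclature
# authority symbol is added as an extra alias, since gene_info does not
# list it among the Synonyms for the MT genes.
build_alias_table <- function(gi) {
  syn <- strsplit(gi$Synonyms, "|", fixed = TRUE)
  aliases <- Map(function(sym, s, auth) {
    a <- c(sym, s, auth)
    unique(a[a != "-"])          # "-" marks an empty field in gene_info
  }, gi$Symbol, syn, gi$Symbol_from_nomenclature_authority)
  data.frame(
    alias  = unlist(aliases, use.names = FALSE),
    GeneID = rep(gi$GeneID, lengths(aliases)),
    stringsAsFactors = FALSE
  )
}

alias_tab <- build_alias_table(gene_info)

# Hypothetical helper: map any alias to the gene_info Symbol, which is
# the symbol treated as official here.
to_official <- function(x) {
  id <- alias_tab$GeneID[match(x, alias_tab$alias)]
  gene_info$Symbol[match(id, gene_info$GeneID)]
}

to_official(c("mt-Co3", "A2m"))
```

With a table built this way, the MGI symbol "mt-Co3" resolves to the same
gene as "COX3", even though gene_info does not list it as a synonym.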

Regards
Gordon

On Sat, 10 Aug 2013, Vincent Carey wrote:

> Gordon, more definitive answers will likely come from the annotation core
> members, but here is what I understand
> about this.  The mappings are completely dependent on NCBI content.
>
> Working with
>
> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz
>
> the header is
>
> #Format: tax_id GeneID Symbol LocusTag Synonyms dbXrefs chromosome
> map_location description type_of_gene Symbol_from_nomenclature_authority
> Full_name_from_nomenclature_authority Nomenclature_status
> Other_designations Modification_date (tab is used as a separator, pound
> sign - start of a comment)
>
> and, with some context, the record for 17710 is
>
>> x[c(1,3516),]
>     tax_id GeneID Symbol LocusTag             Synonyms
> 1     10090  11287    Pzp        - A1m|A2m|AI893533|MAM
> 3516  10090  17710   COX3        -                    -
>                                                          dbXrefs chromosome
> 1    MGI:87854|Ensembl:ENSMUSG00000030359|Vega:OTTMUSG00000022212          6
> 3516                                                   MGI:102502         MT
>           map_location                      description   type_of_gene
> 1    6 F1-G3|6 63.02 cM           pregnancy zone protein protein-coding
> 3516                  - cytochrome c oxidase subunit III protein-coding
>     Symbol_from_nomenclature_authority
> 1                                   Pzp
> 3516                             mt-Co3
>     Full_name_from_nomenclature_authority
> 1                  pregnancy zone protein
> 3516 cytochrome c oxidase III, mitochondrial
>     Nomenclature_status
> 1                      O
> 3516                   O
>     Other_designations
> 1    alpha 1 macroglobulin|alpha-2-M|alpha-2-macroglobulin
> 3516 -
>     Modification_date  X
> 1             20130804 NA
> 3516          20130804 NA
>
> I would conjecture that the solution needs to come from NCBI -- they may
> have neglected to deal properly with the MT genes in this case, as the
> following computation suggests.  The symbols for which field "Symbol" does
> not agree
> with field "Symbol_from_nomenclature_authority" are
>
>> xsn[xs!=xsn]
>   [1] "mt-Atp6" "mt-Atp8" "mt-Co1"  "mt-Co2"  "mt-Co3"  "mt-Cytb" "mt-Nd1"
>   [8] "mt-Nd2"  "mt-Nd3"  "mt-Nd4"  "mt-Nd4l" "mt-Nd5"  "mt-Nd6"  "mt-Rnr1"
>  [15] "mt-Rnr2" "mt-Ta"   "mt-Tc"   "mt-Td"   "mt-Te"   "mt-Tf"   "mt-Tg"
>  [22] "mt-Th"   "mt-Ti"   "mt-Tk"   "mt-Tl1"  "mt-Tl2"  "mt-Tm"   "mt-Tn"
>  [29] "mt-Tp"   "mt-Tq"   "mt-Tr"   "mt-Ts1"  "mt-Ts2"  "mt-Tt"   "mt-Tv"
>  [36] "mt-Tw"   "mt-Ty"
>
>
> On Fri, Aug 9, 2013 at 11:17 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Biocore,
>>
>> We make a strong effort to use current NCBI official gene symbols and
>> names in all our work, and we make much use of the excellent Bioconductor
>> packages org.Mm.eg.db and org.Hs.eg.db for this purpose.
>>
>> I have recently noticed that org.Mm.eg.db is giving incorrect official
>> names for mitochondrial genes.  It is giving human symbols for these genes
>> instead of mouse symbols.  For example
>>
>>  > mappedRkeys(org.Mm.egSYMBOL["17710"])
>>   [1] "COX3"
>>
>> According to both Entrez Gene
>>
>>   http://www.ncbi.nlm.nih.gov/gene/?term=17710
>>
>> and MGI
>>
>>   http://www.informatics.jax.org/marker/MGI:102502
>>
>> the official symbol is "mt-Co3".  This has been the official symbol for at
>> least 4 years and probably longer.
>>
>> The correct name is not even included as an Alias:
>>
>>  > mappedRkeys(revmap(org.Mm.egALIAS2EG)["17710"])
>>   [1] "COX3"
>>
>> COX3 is actually the symbol for the human ortholog.  It should only be
>> an alias for the mouse gene.
>>
>> Same for all the mitochondrial genes.  In all cases, org.Mm.egSYMBOL is
>> giving the human symbol instead of the mouse symbol.
>>
>> Is this deliberate?  If not, can you please fix?
>>
>> Thanks a lot
>> Gordon
>>
>> ---------------------------------------------
>> Professor Gordon K Smyth,
>> Bioinformatics Division,
>> Walter and Eliza Hall Institute of Medical Research,
>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>> http://www.statsci.org/smyth
>>
>>
>>  > sessionInfo()
>>
>> R version 3.0.1 Patched (2013-07-04 r63183)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_Australia.1252
>> [2] LC_CTYPE=English_Australia.1252
>> [3] LC_MONETARY=English_Australia.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_Australia.1252
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets
>> [7] methods   base
>>
>> other attached packages:
>> [1] org.Mm.eg.db_2.9.0   org.Hs.eg.db_2.9.0   RSQLite_0.11.4
>> [4] DBI_0.2-7            AnnotationDbi_1.22.6 Biobase_2.20.0
>> [7] BiocGenerics_0.6.0   limma_3.17.20
>>
>> loaded via a namespace (and not attached):
>> [1] IRanges_1.18.2 stats4_3.0.1
