[BioC] hs.Mm.inp.db problem

Marc Carlson mcarlson at fhcrc.org
Fri Nov 13 02:33:53 CET 2009


Hi Di,

I can't speak for the origins of the biomaRt homolog information, so I
can only answer half of your question.

The inparanoid packages use data directly from inparanoid.  All of the
relevant data from inparanoid is included in the database for these
packages, but only the data that the algorithm predicts with a score of
100% is used in the actual mappings.  For popular organisms like mouse,
human and flies, we have made sure to include enough other data in the
relevant organism packages so that you can patch through the
appropriate inparanoid packages to retrieve homologs.
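
As a rough sketch of what "patching through" could look like end to end
(mouse symbol -> mouse EG -> MGI -> human protein -> human EG): the helper
name `lookup` is made up for illustration and is not part of any package,
and the last step assumes that inparanoid represents human proteins as
Ensembl protein IDs, so that org.Hs.egENSEMBLPROT can be reversed to get
back to EG ids.  Treat it as a starting point rather than a tested recipe
for any particular package version.

```r
# Hypothetical helper (not part of any package): look up keys in a map and
# return the mapped values, dropping keys that are missing or map to NA.
# 'map' can be anything mget() accepts: a plain environment, or an
# AnnotationDbi Bimap once the annotation packages are attached.
lookup <- function(keys, map) {
  if (length(keys) == 0) return(character(0))
  hits <- mget(as.character(keys), map, ifnotfound = NA)
  keep <- !vapply(hits, function(v) all(is.na(v)), logical(1))
  as.character(unlist(hits[keep], use.names = FALSE))
}

# Only run the annotation part if the three packages are installed.
pkgs <- c("hom.Mm.inp.db", "org.Mm.eg.db", "org.Hs.eg.db")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  library(hom.Mm.inp.db)
  library(org.Mm.eg.db)
  library(org.Hs.eg.db)

  mouseEgs  <- lookup(c("Ints7", "Upp1", "Cdc2a"), revmap(org.Mm.egSYMBOL))
  mouseMgi  <- lookup(mouseEgs, org.Mm.egMGI)       # mouse EG -> MGI
  humanProt <- lookup(mouseMgi, hom.Mm.inpHOMSA)    # MGI -> human protein
  # Assumption: the human IDs returned above are Ensembl protein IDs.
  humanEgs  <- lookup(humanProt, revmap(org.Hs.egENSEMBLPROT))
  print(humanEgs)
}
```

Symbols or IDs that do not map at any step are silently dropped, so the
result can be shorter than the input.  If a step comes back empty, it may
be worth comparing the ID formats on both sides, for example with
head(mappedkeys(hom.Mm.inpHOMSA)).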

Also, inparanoid recently updated their data sources, and this was a
pretty major revision (meaning it unintentionally breaks some things for
us in terms of updating our inparanoid data).  So if these packages are
finally starting to get some use, please let me know so that I can
prioritize getting our sources updated to their newer version.


  Marc



Di Wu wrote:
> Following up on this question, I am trying to get human homolog genes for
> some mouse genes on the Illumina mouse v2 array platform. What is the
> difference in results between using getLDS in biomaRt and the hom.Mm.inp.db
> package? Do both methods use similar source information?
>
> Thanks in advance.
> Di
>
> On Fri, Nov 13, 2009 at 8:28 AM, Iain Gallagher <iaingallagher at btopenworld.com> wrote:
>
>> Thanks Marc
>>
>> Works a treat.
>>
>> Iain
>>
>> --- On Thu, 12/11/09, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>
>>     
>>> From: Marc Carlson <mcarlson at fhcrc.org>
>>> Subject: Re: [BioC] hs.Mm.inp.db problem
>>> To: "Iain Gallagher" <iaingallagher at btopenworld.com>
>>> Cc: bioconductor at stat.math.ethz.ch
>>> Date: Thursday, 12 November, 2009, 20:29
>>> Hi Iain,
>>>
>>> The trouble you are having is because inparanoid uses Jackson lab IDs
>>> (MGI) instead of ensembl protein IDs when representing mouse.
>>>
>>> So this script should work better:
>>>
>>> library(hom.Mm.inp.db)
>>> library(org.Mm.eg.db)
>>> library(org.Hs.eg.db)
>>>
>>> dataIn <- c('Ints7', 'Upp1', 'Cdc2a')
>>> egs <- mget(dataIn,revmap(org.Mm.egSYMBOL))
>>>
>>> ## this is what you want right here:
>>> mouseProtIds <- mget(unlist(egs),org.Mm.egMGI)
>>> mouseProtIds <- mouseProtIds[!is.na(mouseProtIds)]
>>>
>>> rawHumanProtIds <- mget(unlist(mouseProtIds),hom.Mm.inpHOMSA,ifnotfound=NA)
>>>
>>> ##etc.
>>>
>>> Hope this helps,
>>>
>>>
>>>   Marc
>>>
>>>
>>>
>>> Iain Gallagher wrote:
>>>> Hi - Just a follow up post.
>>>>
>>>> The title should of course be hom.Mm.inp.db problem and session info is below:
>>>>
>>>> > sessionInfo()
>>>> R version 2.9.0 (2009-04-17)
>>>> x86_64-pc-linux-gnu
>>>>
>>>> locale:
>>>> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> other attached packages:
>>>> [1] org.Hs.eg.db_2.2.11  org.Mm.eg.db_2.2.11  hom.Mm.inp.db_2.2.11
>>>> [4] RSQLite_0.7-1        DBI_0.2-4            AnnotationDbi_1.6.0
>>>> [7] Biobase_2.4.1
>>>>
>>>> Thanks
>>>>
>>>> Iain
>>>>
>>>> --- On Thu, 12/11/09, Iain Gallagher <iaingallagher at btopenworld.com> wrote:
>>>>
>>>>> From: Iain Gallagher <iaingallagher at btopenworld.com>
>>>>> Subject: [BioC] hs.Mm.inp.db problem
>>>>> To: bioconductor at stat.math.ethz.ch
>>>>> Date: Thursday, 12 November, 2009, 18:41
>>>>> Hello List
>>>>>
>>>>> I am trying to map ~5000 mouse genes to human genes using
>>>>> the inparanoid package and I am failing miserably!
>>>>> Having followed the example in the documentation I can't
>>>>> get any of my 5000 mouse genes converted to human EG ids.
>>>>> Example follows with 3 genes only:
>>>>>
>>>>> rm(list=ls())
>>>>>
>>>>> library(hom.Mm.inp.db)
>>>>> library(org.Mm.eg.db)
>>>>> library(org.Hs.eg.db)
>>>>>
>>>>> #mouse genes in as symbols
>>>>> dataIn <- c('Ints7', 'Upp1', 'Cdc2a')
>>>>>
>>>>> #map these to mouse EG ids
>>>>> egIds <- revmap(org.Mm.egSYMBOL)
>>>>> mapped <- mappedkeys(egIds)
>>>>> egIds <- as.list(egIds[mapped])
>>>>> ind <- which(names(egIds)%in%dataIn)
>>>>> egIdsIn <- egIds[ind]
>>>>> #map these IDs to ENSEMBL protein Ids as used for the inparanoid mapping
>>>>> mouseProtIds <- mget(unlist(egIdsIn),org.Mm.egENSEMBLPROT)
>>>>> mouseProtIds <- mouseProtIds[!is.na(mouseProtIds)]
>>>>> #this is the point of failure!
>>>>> rawHumanProtIds <- mget(unlist(mouseProtIds),hom.Mm.inpHOMSA,ifnotfound=NA)
>>>>> the returned list is full of NA
>>>>>
>>>>> Using biomart on the Ensembl site I can get:
>>>>>
>>>>> Ensembl Transcript ID    Human Ensembl Protein ID
>>>>> ENSMUST00000020099       ENSP00000397973
>>>>>
>>>>> For example, for Cdc2a, so I know there are homologs there,
>>>>> but for some reason the inparanoid package is not working
>>>>> for me.
>>>>> Using the example in the documentation it does work though
>>>>> so I'm assuming the mistake is with me.
>>>>>
>>>>> Can anyone help with this (more curiosity now - I can get
>>>>> the data through biomart)?
>>>>>
>>>>> Cheers
>>>>>
>>>>> Iain
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


