[BioC] IPI to entrez id

Carlson, Marc R mcarlson at fhcrc.org
Fri Feb 18 22:17:18 CET 2011


Hi Viritha,

These things can never be 1:1, but you can pretty easily just cram them all into a huge data.frame by doing this:

library(org.Hs.eg.db)
allAnnots <- merge(toTable(org.Hs.egPROSITE), toTable(org.Hs.egGO), by.x="gene_id", by.y="gene_id")
head(allAnnots)

Once you have done this, you may notice not only that these things are almost never (if ever) 1:1, but also that this could have been even worse if I had used the GO2ALL mappings (and I probably should have, but I can't really tell because I have almost no information about what you want to do).  Anyhow, I hope this helps, but if you have a more specific use for this information that you are willing to talk about, then we might be able to give you a better answer.
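
To make the row blow-up concrete, here is a minimal sketch in base R. It uses small made-up tables in place of the real toTable() output; every identifier is a placeholder, and the column names ipi_id and go_id are assumed for illustration only.

```r
# Toy stand-ins for toTable() output; all ids below are placeholders.
prosite <- data.frame(gene_id = c("1", "1", "2"),
                      ipi_id  = c("IPI_A", "IPI_B", "IPI_B"),
                      stringsAsFactors = FALSE)
go <- data.frame(gene_id = c("1", "1", "1", "2"),
                 go_id   = c("GO_x", "GO_y", "GO_z", "GO_x"),
                 stringsAsFactors = FALSE)

# merge() pairs every row of one table with every matching row of the
# other, so gene "1" contributes 2 x 3 = 6 rows and gene "2" 1 x 1 = 1.
merged <- merge(prosite, go, by = "gene_id")
rows_per_gene <- table(merged$gene_id)

# Distinct genes per IPI id; any count above 1 is a non-unique mapping.
id_pairs <- unique(merged[, c("ipi_id", "gene_id")])
genes_per_ipi <- table(id_pairs$ipi_id)
```

Looking at rows_per_gene and genes_per_ipi on the real merged table is a quick way to see how far from 1:1 the mappings are before deciding how to collapse them.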


  Marc


----- Original Message -----
From: "viritha kaza" <viritha.k at gmail.com>
To: "Manca Marco (PATH)" <m.manca at maastrichtuniversity.nl>
Cc: bioconductor at stat.math.ethz.ch
Sent: Thursday, February 17, 2011 9:46:28 AM
Subject: Re: [BioC] IPI to entrez id

Hi,
Thanks for the reply.
As Samuel suggested, I used the following link:
ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/ipi.HUMAN.xrefs.gz
For the ones I did not find, I used the following code.

Though I still don't get a 1:1 mapping, I got the Entrez ID and the gene
symbol. The ipi_test file contains the list of IPIs that I want to convert.

code:
> source('http://bioconductor.org/biocLite.R')
> biocLite("biomaRt")
> library("biomaRt")
> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> ipi = scan("ipi_test.txt", what = character(), sep = '\n', quote = "")
> ## keep the result so that it can be written out below
> ipi_entrez = getBM(attributes = c("ipi", "entrezgene", "hgnc_symbol"),
+                    filters = "ipi", values = ipi, mart = ensembl)
> write.table(ipi_entrez, "ipi_entrez_test.txt", sep = '\t')

I am still missing a few. Is there any other method, or should I conclude
that those IPI numbers don't have corresponding gene symbols?
Thanks,
Viritha
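
A minimal sketch of how the missing and ambiguous IDs could be listed, in base R. The data frame below is a made-up stand-in for a getBM() result with the same three columns; all IDs, Entrez numbers, and symbols are placeholders, and keeping the first hit is only one arbitrary way to force a 1:1 table.

```r
# Hypothetical input ids and a hypothetical getBM()-style result.
ipi <- c("IPI_1", "IPI_2", "IPI_3", "IPI_4")
ipi_entrez <- data.frame(
  ipi         = c("IPI_1", "IPI_1", "IPI_3"),
  entrezgene  = c(101L, 102L, 103L),
  hgnc_symbol = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Input ids that never appear in the result have no mapping in this source.
unmapped <- setdiff(ipi, ipi_entrez$ipi)

# Ids that appear on more than one row map to several genes.
ambiguous <- unique(ipi_entrez$ipi[duplicated(ipi_entrez$ipi)])

# One arbitrary way to force 1:1: keep the first row seen for each id.
first_hit <- ipi_entrez[!duplicated(ipi_entrez$ipi), ]
```

The unmapped vector is the list to try against another source, and ambiguous is the list that needs a deliberate choice rather than whichever row happens to come first.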

On Wed, Feb 16, 2011 at 12:23 PM, Manca Marco (PATH) <
m.manca at maastrichtuniversity.nl> wrote:

>
>
> Hi Viritha,
>
> I have found this old answer to a similar question and I think it should
> still apply:
>
> <<
>
> Not sure that it is this easy. The IPI are protein identifiers. GO
> categories classify genes. Neither the mapping from protein to gene or
> gene to GO category is 1:1. GO categories form a hierarchy. So there are
> significant decisions to be made in representing IPI identifiers in a
> pie chart of GO terms.
>
> Bioconductor maintains 'org' and 'GO' database packages that provide the
> necessary link between IPI protein ids and GO gene ontology categories,
> via ENTREZ gene ids. Code might look like
>
>  ## once only, to install packages
>  source('http://bioconductor.org/biocLite.R')
>  biocLite('org.Hs.eg.db', 'GO.db')
>
>  ## from IPI to ENTREZ id, not 1:1
>  library(org.Hs.eg.db)
>  ipi2eg = revmap(eapply(org.Hs.eg.db, names)) ## NOT 1:1 map
>
>  ## Assume ipiIds is, e.g., c('IPI00008860', 'IPI00019922')
>  egIds = revmap(ipi2eg[ipiIds])
>
>  ## get GO terms, also not 1:1
>  goIds = eapply(org.Hs.egGO[names(egIds)], names)
>
> You're still left with the problem of resolving multiple mappings and
> the hierarchical relationship between GO terms.
>
> Martin
>
> >>
>
>
> All the best, Marco
>
> --
> Marco Manca, MD
> University of Maastricht
> Faculty of Health, Medicine and Life Sciences (FHML)
> Cardiovascular Research Institute (CARIM)
>
> Mailing address: PO Box 616, 6200 MD Maastricht (The Netherlands)
> Visiting address: Experimental Vascular Pathology group, Dept of Pathology
> - Room5.08,  Maastricht University Medical Center, P. Debyelaan 25, 6229  HX
> Maastricht
>
> E-mail: m.manca at maastrichtuniversity.nl
> Office telephone: +31(0)433874633
> Personal mobile: +31(0)626441205
> Twitter: @markomanka
>
> ________________________________________
> From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org]
> on behalf of viritha kaza [viritha.k at gmail.com]
> Sent: Wednesday, February 16, 2011 18:15
> To: Bioconductor
> Subject: [BioC] IPI to entrez id
>
> Hi group,
> I would like to convert a list of 3000 IPIs to gene symbol and Entrez ID,
> e.g. IPI00658210 to 57599, WDR48.
> I wanted a 1:1 mapping.
> Could anyone suggest the function and the package which could help me in
> the process?
> Thank you in advance,
> Viritha
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
