[BioC] Bioconductor Digest, Vol 83, Issue 4

Lavinia Gordon lavinia.gordon at mcri.edu.au
Thu Jan 7 02:49:39 CET 2010


   Hi Waverley
   In your input you also have Ensembl protein ids (e.g. ENSP00000231338). You
   could extract these, use them as your input list, use biomaRt to match them
   up with molecular function GO ids, and calculate the frequency of each id, e.g.
   molecular_function%binding%protein binding              10
   molecular_function%molecular transducer activity        6
   then:
   pie.mf <- c(10,6,...)
   names(pie.mf) <- c("binding%protein binding", "molecular transducer activity", ...)
   pie(pie.mf)
   You could also use your SWISS-PROT or TREMBL ids (again, via biomaRt). Note
   that a gene can often have multiple GO terms associated with it.
   Have a look at some of the other Bioconductor GO packages
   ([1]http://bioconductor.org/packages/2.5/GO.html and
   http://www.bioconductor.org/packages/release/bioc/html/topGO.html), which
   suggest some other ways of visualizing GO terms.
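   The counting and plotting steps described above can be sketched in base R.
   This is only an illustration: the protein ids, the term assignments and the
   column names (ensembl_peptide_id, mf_term) are made up here, standing in for
   whatever table a biomaRt query would actually return.

```r
## Hypothetical protein-to-term pairs; ids and assignments are invented
## for illustration and do not come from a real biomaRt query.
hits <- data.frame(
  ensembl_peptide_id = c("P1", "P1", "P2", "P3", "P3", "P4", "P4"),
  mf_term = c("protein binding", "molecular transducer activity",
              "protein binding", "protein binding",
              "catalytic activity", "molecular transducer activity",
              "molecular transducer activity"),
  stringsAsFactors = FALSE
)

## Drop duplicated (protein, term) rows so each pair is counted once.
hits <- unique(hits)

## Tally proteins per term and keep the result as a named numeric vector,
## the same shape as pie.mf in the message above.
counts <- sort(table(hits$mf_term), decreasing = TRUE)
pie.mf <- setNames(as.vector(counts), names(counts))

## A protein with several terms contributes to several slices, so the
## slices sum to the number of annotations, not the number of proteins.
pie(pie.mf, main = "GO molecular function (illustrative data)")
```

   Because one protein can fall into more than one slice, the chart shows the
   distribution of annotations rather than of proteins.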
   regards
   Lavinia Gordon.

     Waverley @ Palo Alto wrote:
     > Hi,
     >
     > I have a list of IPI gene IDs. I want to find out whether there is a
     > package which can map the gene ontology to these IPIs, and plot the
     > pie chart to demonstrate the molecular function distributions.
     >
     > The input is like the following gene IPI IDs:
     >
     > IPI:IPI00008860.1|SWISS-PROT:Q9BXJ4-1|TREMBL:Q542Y2|ENSEMBL:ENSP00000231338;EN
     > IPI:IPI00019922.5|SWISS-PROT:Q8N0Y2-1|TREMBL:Q53F81|ENSEMBL:ENSP00000338860;ENSP00000375594|REFSEQ:NP_060807|H-INV:HIT000028861|VEGA:OTTHUMP00000078377
     > Tax_Id=9606 Gene_Symbol=ZN
     > IPI:IPI00647423.2|SWISS-PROT:Q8N819-1|REFSEQ:NP_001073870|VEGA:OTTHUMP00000076687
     > Tax_Id=9606 Gene_Symbol=FLJ40125 Isoform 1 of
     > IPI:IPI00219000.2|SWISS-PROT:P27658|TREMBL:Q53XI6|ENSEMBL:ENSP00000261037|REFS
     > IPI:IPI00291878.4|SWISS-PROT:P35247|ENSEMBL:ENSP00000361366|REFSEQ:NP_003010|H-INV:HIT000039466|VEGA:OTTHUMP00000019944
     > IPI:IPI00013945.1|SWISS-PROT:P07911-1|TREMBL:Q8NHW8|ENSEMBL:ENSP00000306279|RE
     > IPI:IPI00000634.1|SWISS-PROT:Q16204|TREMBL:Q6GSG7|ENSEMBL:ENSP00000263102|REFS
     >
     > I want to plot the distribution of these genes across the GO
     > molecular function categories as a pie chart. An example is shown in the
     > following link
     [2]http://www.proteomesci.com/content/7/1/6/figure/F2?highres=y
     >
     >
     > Can someone help?
     Not sure that it is this easy. The IPI are protein identifiers. GO
     categories classify genes. Neither the mapping from protein to gene nor
     gene to GO category is 1:1. GO categories form a hierarchy. So there are
     significant decisions to be made in representing IPI identifiers in a
     pie chart of GO terms.
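     One of the decisions mentioned above is how to count a protein that maps
     to several GO terms. A minimal sketch of one possible choice, fractional
     weighting, follows. It is an illustration only, not a recommendation from
     this thread: the ids and term lists are made up, and it does not address
     the hierarchy between GO terms at all.

```r
## Hypothetical protein-to-term lists; ids and assignments are invented.
terms_by_protein <- list(
  P1 = c("protein binding", "molecular transducer activity"),
  P2 = "protein binding",
  P3 = c("protein binding", "catalytic activity"),
  P4 = "molecular transducer activity"
)

## Each protein spreads a total weight of 1 evenly over its terms.
weights <- unlist(lapply(terms_by_protein,
                         function(x) rep(1 / length(x), length(x))),
                  use.names = FALSE)
terms <- unlist(terms_by_protein, use.names = FALSE)

## Sum the weights per term; the slices now add up to the number of
## proteins instead of the number of annotations.
weighted <- tapply(weights, terms, sum)

pie(weighted, main = "Fractionally weighted GO terms (illustrative data)")
```

     With this weighting every protein counts exactly once overall, at the
     cost of slices that no longer correspond to whole proteins.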
     Bioconductor maintains 'org' and 'GO' database packages that provide the
     necessary link between IPI protein ids and GO gene ontology categories,
     via ENTREZ gene ids. Code might look like
     ## once only, to install packages
     source('http://bioconductor.org/biocLite.R')
     biocLite(c('org.Hs.eg.db', 'GO.db'))
     ## from IPI to ENTREZ id, not 1:1
     library(org.Hs.eg.db)
     ipi2eg = revmap(eapply(org.Hs.eg.db, names)) ## NOT 1:1 map
     ## Assume ipiIds is, e.g., c('IPI00008860', 'IPI00019922')
     egIds = revmap(ipi2eg[ipiIds])
     ## get GO terms, also not 1:1
     goIds = eapply(org.Hs.egGO[names(egIds)], names)
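     The code above assumes a vector ipiIds of unversioned IPI accessions. A
     minimal base-R sketch of how such a vector, and the Ensembl protein ids,
     might be pulled out of header lines like those in the question is given
     here. The two headers are the complete ones from the question with the
     line wraps rejoined; the regular expressions and the variable names
     headers and enspIds are assumptions made for this example.

```r
## Two complete header lines from the question (line wraps rejoined).
headers <- c(
  paste0("IPI:IPI00019922.5|SWISS-PROT:Q8N0Y2-1|TREMBL:Q53F81|",
         "ENSEMBL:ENSP00000338860;ENSP00000375594|REFSEQ:NP_060807|",
         "H-INV:HIT000028861|VEGA:OTTHUMP00000078377"),
  paste0("IPI:IPI00291878.4|SWISS-PROT:P35247|ENSEMBL:ENSP00000361366|",
         "REFSEQ:NP_003010|H-INV:HIT000039466|VEGA:OTTHUMP00000019944")
)

## IPI accession without the version suffix, one per header.
ipiIds <- sub("^IPI:(IPI[0-9]+).*$", "\\1", headers)

## All Ensembl protein ids per header; one header may carry several.
enspIds <- regmatches(headers, gregexpr("ENSP[0-9]+", headers))
names(enspIds) <- ipiIds

print(ipiIds)
print(enspIds)
```

     Headers that were truncated in the original listing would yield
     incomplete ids, so they are left out of this example.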
     You're still left with the problem of resolving multiple mappings and
     the hierarchical relationship between GO terms. Asking on the
     Bioconductor mailing list
     [3]http://bioconductor.org/docs/mailList.html
     is likely to lead to helpful answers.
     Martin

   Lavinia Gordon
   Research Officer
   Bioinformatics
   Murdoch Childrens Research Institute
   Royal Children's Hospital
   Flemington Road Parkville Victoria 3052 Australia
   telephone: +61 3 8341 6221
   [4]www.mcri.edu.au

References

   1. http://bioconductor.org/packages/2.5/GO.html and http://www.bioconductor.org/packages/release/bioc/html/topGO.html
   2. http://www.proteomesci.com/content/7/1/6/figure/F2?highres=y
   3. http://bioconductor.org/docs/mailList.html
   4. http://www.mcri.edu.au/

