[BioC] Map GO terms to Uniprot from org.Hs.eg

James W. MacDonald jmacdon at med.umich.edu
Wed Sep 14 15:48:23 CEST 2011


Hi Sandeep,

Here's a start.

 > library(org.Hs.eg.db)
 > uniprots <- head(Rkeys(org.Hs.egUNIPROT))
 > uniprots
[1] "A0A183" "A0A5E8" "A0A962" "A0AUX0" "A0AUZ9" "A0AV02"
 > egs <- mget(uniprots, revmap(org.Hs.egUNIPROT))
 > egs
$A0A183
[1] "448835"

$A0A5E8
[1] "10634"

$A0A962
[1] "55072"

$A0AUX0
[1] "272"

$A0AUZ9
[1] "151050"

$A0AV02
[1] "84561"

 > gos <- lapply(egs, get, org.Hs.egGO)

This results in a list of lists, whose names are the UniProt IDs:

 > names(gos)
[1] "A0A183" "A0A5E8" "A0A962" "A0AUX0" "A0AUZ9" "A0AV02"

And for each UniProt ID you have a list of all GO IDs that map to that 
UniProt ID, each with its evidence code and ontology (BP, CC or MF).

 > gos$A0A183
$`GO:0031424`
$`GO:0031424`$GOID
[1] "GO:0031424"

$`GO:0031424`$Evidence
[1] "IEA"

$`GO:0031424`$Ontology
[1] "BP"

So for this first one, there is only one GO term, GO:0031424, and it is a 
BP term. It can get much more complicated, with multiple terms (of 
multiple types) for each UniProt ID (e.g., you could have 5 MF terms and 
3 BP terms for one UniProt ID), which may make putting things into a 
nice neat table a bit challenging.
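
One way around the multiple-terms problem is to keep the result in long 
format, one row per UniProt/GO pair, instead of forcing everything into 
one row per UniProt ID. Below is a minimal base-R sketch, assuming `gos` 
has the structure shown above. The toy `gos` is hypothetical: the A0A183 
entry copies the output above, while ABC123 and its GO IDs are the 
placeholders from the original question.

```r
# Hypothetical toy input with the same shape as `gos` above:
# UniProt ID -> list of GO entries, each with GOID, Evidence, Ontology.
gos <- list(
  A0A183 = list(
    `GO:0031424` = list(GOID = "GO:0031424", Evidence = "IEA", Ontology = "BP")
  ),
  ABC123 = list(
    `GO:121` = list(GOID = "GO:121", Evidence = "IEA", Ontology = "BP"),
    `GO:122` = list(GOID = "GO:122", Evidence = "IEA", Ontology = "CC")
  )
)

# Build one small data frame per UniProt ID (one row per GO term).
rows <- lapply(names(gos), function(id) {
  terms <- gos[[id]]
  # Assumption: an ID with no GO annotation holds NA instead of a list; skip it.
  if (!is.list(terms)) return(NULL)
  data.frame(
    Uniprot  = id,  # recycled to one value per GO term
    GOID     = vapply(terms, `[[`, character(1), "GOID", USE.NAMES = FALSE),
    Evidence = vapply(terms, `[[`, character(1), "Evidence", USE.NAMES = FALSE),
    Ontology = vapply(terms, `[[`, character(1), "Ontology", USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
})

# Stack the per-ID data frames; NULL entries are dropped by rbind().
go_long <- do.call(rbind, rows)
print(go_long)
```

A long table like this is easy to filter (e.g. `subset(go_long, Ontology == "MF")`) 
and does not need any convention for cells holding several GO IDs.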

The list can be parsed using some combination of lapply() and sapply(), 
but I don't have the time to play around with it. That will have to be 
your homework for the day.
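
For anyone wanting a head start on that homework, here is a minimal sketch 
that produces the wide table asked for in the question (columns Uniprot, 
GO_BP, GO_CC, GO_MF). Several terms from the same ontology are collapsed 
into one semicolon-separated string; that convention, the helper name 
collapse_ontology(), and the toy `gos` are assumptions for illustration, 
not part of any package. It uses vapply(), the type-checked sibling of 
sapply().

```r
# Hypothetical toy input with the same shape as `gos` above.
gos <- list(
  A0A183 = list(
    `GO:0031424` = list(GOID = "GO:0031424", Evidence = "IEA", Ontology = "BP")
  ),
  ABC123 = list(
    `GO:121` = list(GOID = "GO:121", Evidence = "IEA", Ontology = "BP"),
    `GO:122` = list(GOID = "GO:122", Evidence = "IEA", Ontology = "CC"),
    `GO:123` = list(GOID = "GO:123", Evidence = "IEA", Ontology = "MF")
  )
)

# Hypothetical helper: all GO IDs of one ontology ("BP", "CC" or "MF")
# for a single UniProt ID, collapsed into one string (NA if there are none).
collapse_ontology <- function(terms, ont) {
  # Assumption: an ID with no GO annotation holds NA instead of a list.
  if (!is.list(terms)) return(NA_character_)
  ids <- unlist(lapply(terms, function(t) if (t$Ontology == ont) t$GOID))
  if (length(ids) == 0) NA_character_ else paste(unique(ids), collapse = ";")
}

# One row per UniProt ID, one column per ontology.
go_table <- data.frame(
  Uniprot = names(gos),
  GO_BP = vapply(gos, collapse_ontology, character(1), ont = "BP", USE.NAMES = FALSE),
  GO_CC = vapply(gos, collapse_ontology, character(1), ont = "CC", USE.NAMES = FALSE),
  GO_MF = vapply(gos, collapse_ontology, character(1), ont = "MF", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(go_table)
```

On the real data you would drop the toy definition and run the same code 
on the `gos` built from org.Hs.egGO.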

Also note that you can query these .db packages with SQL queries, if you 
are a database person. This might make things easier. See 
http://www.bioconductor.org/packages/2.8/bioc/vignettes/AnnotationDbi/inst/doc/AnnotationDbi.pdf, 
in particular sections 2.0.9 and 2.0.10.

Best,

Jim



On 9/14/2011 6:53 AM, Sandeep Amberkar wrote:
> Dear All,
>
>
> I have loaded the dataset "org.Hs.eg" into my R session. As I am using it for
> the first time, I am not familiar with its data structure. Can anyone please
> help me build a table that contains an ontology-wise mapping to Uniprot
> identifiers? I want the final output table to look something like this:
>
> Uniprot           GO_BP         GO_CC        GO_MF
> ABC123         GO:121         GO:122         GO:123
>
> Thanks in advance for your help.
>
> Warm Regards,
> Sandeep Amberkar
> BioQuant,BQ26,
> Im Neuenheimer Feld 267,
> D-69120,Heidelberg
> Tel: +49-6221-5451354
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
