[BioC] hgu133plus2 GO issues

James W. MacDonald jmacdon at med.umich.edu
Tue Apr 18 18:20:31 CEST 2006


Hi Jake,

Jake wrote:
> Hi list,
> 
> Could someone please help me understand the differences between the
> (hgu133plus2)GO, GO2PROBE, GO2ALLPROBES?  I've found discrepancies that I
> can't quite explain:
> 
>  > mget("GO:0042611", hgu133plus2GO2PROBE)
> Error: value for 'GO:0042611' not found
> 
> 
>  > mget("GO:0042611", hgu133plus2GO2ALLPROBES)
> $"GO:0042611"
>           <NA>            IEA            IEA            IEA           <NA>
>    "209309_at"  "217014_s_at"    "210325_at"  "218831_s_at" "1553402_a_at"
>           <NA>           <NA>           <NA>           <NA>           <NA>
>  "206086_x_at"  "206087_x_at"  "210864_x_at"  "211326_x_at"  "211327_x_at"
>           <NA>           <NA>           <NA>           <NA>           <NA>
>  "211328_x_at"  "211329_x_at"  "211330_s_at"  "211331_x_at"  "211332_x_at"
>           <NA>           <NA>           <NA>            IEA           <NA>
>  "211863_x_at"  "211866_x_at"  "214647_s_at"    "235754_at"  "213932_x_at"
>            IEA           <NA>           <NA>            IEA           <NA>
>  "215313_x_at"  "208729_x_at"  "209140_x_at"  "211911_x_at"  "208812_x_at"
>           <NA>           <NA>            IEA           <NA>           <NA>
>  "211799_x_at"  "214459_x_at"  "216526_x_at"    "200904_at"  "200905_x_at"
>            IEA           <NA>           <NA>            IEA           <NA>
>  "217456_x_at"  "204806_x_at"  "221875_x_at"    "221978_at"  "210514_x_at"
>           <NA>           <NA>           <NA>            IEA            IEA
>  "211528_x_at"  "211529_x_at"  "211530_x_at"  "217436_x_at"    "231748_at"
>           <NA>            IEA            IEA            IEA
>    "221291_at"    "238542_at"    "221323_at" "1552777_a_at"
> 
> and finally...
> 
> ### "208729_x_at" is one of the probes returned with the above command
> 
>>grep("GO:0042611",unlist(mget("208729_x_at", hgu133plus2GO)))
> 
> numeric(0)
> 
> 
> 
> "208729_x_at" is on the hgu133plus2 chip, but GO and GO2ALLPROBES don't
> map it to the same GO ID.
> 
> Is there something wrong here or am I just missing something?  If
> different, which is the most "reliable" mapping?  I'm concerned because
> I went through to validate GO IDs I had gotten from the GOHyperG
> function (a total of 314), and 117 of those I could not map back to my
> significant probe list using the hgu133plus2GO annotation.  I noticed by
> looking at the GOHyperG function that it uses information from
> GO2ALLPROBES.

Here is the difference:

hgu133plus2GO maps Probe IDs to the GO terms they are directly annotated to
hgu133plus2GO2PROBE maps GO terms to the Probe IDs directly annotated to them
hgu133plus2GO2ALLPROBES maps GO terms to the Probe IDs annotated to the term
itself or to any of its children

So there isn't really an issue of reliability here, just an issue of 
what you want. In your case, 208729_x_at doesn't map to GO:0042611, but 
it does map to children of that GO term (for instance GO:0042612).
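
To make the relationship concrete, here is a minimal sketch in base R of how an
"all probes" mapping could be derived from direct annotations plus the
parent/child structure. It is toy data only and not how the annotation package
is actually built: probe2go, parents, invert() and ancestors() are names made
up for illustration, and "probe_x" is a hypothetical probe. The only facts
taken from this thread are that 208729_x_at is annotated to GO:0042612 and that
GO:0042612 is a child of GO:0042611.

```r
# Toy data: direct probe -> GO annotations ("probe_x" is hypothetical)
probe2go <- list(
  "208729_x_at" = c("GO:0042612"),
  "probe_x"     = c("GO:0042612")
)
# Toy hierarchy: child term -> parent term(s)
parents <- list("GO:0042612" = "GO:0042611")

# Turn a probe -> GO list into a GO -> probe list
invert <- function(p2g) {
  s <- stack(p2g)                      # columns: values (GO ID), ind (probe ID)
  split(as.character(s$ind), s$values)
}

# Collect every ancestor of a term by walking up the parents list
ancestors <- function(term) {
  out  <- character(0)
  todo <- parents[[term]]
  while (length(todo) > 0) {
    out  <- union(out, todo)
    todo <- setdiff(unlist(parents[todo]), out)
  }
  out
}

# Propagate each probe to all ancestors of its directly annotated terms
probe2go_all <- lapply(probe2go, function(terms) {
  union(terms, unlist(lapply(terms, ancestors)))
})

go2probe     <- invert(probe2go)      # plays the role of GO2PROBE
go2allprobes <- invert(probe2go_all)  # plays the role of GO2ALLPROBES

"GO:0042611" %in% names(go2probe)                # FALSE: no direct annotation
"208729_x_at" %in% go2allprobes[["GO:0042611"]]  # TRUE: reached via the child
```

In this toy setup the parent term has no entry at all in the direct mapping,
which mirrors the "not found" error for GO:0042611 in GO2PROBE.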

 > sapply(get("208729_x_at", hgu133plus2GO), function(x) x[[1]])
  GO:0005624   GO:0005887   GO:0016020   GO:0016021   GO:0019882
"GO:0005624" "GO:0005887" "GO:0016020" "GO:0016021" "GO:0019882"
  GO:0019883   GO:0019885   GO:0030106   GO:0030106   GO:0042612
"GO:0019883" "GO:0019885" "GO:0030106" "GO:0030106" "GO:0042612"
 > grep("208729_x_at",get("GO:0042612", hgu133plus2GO2PROBE))
[1] 20
 > grep("208729_x_at",get("GO:0042611", hgu133plus2GO2PROBE))
Error in get(x, envir, mode, inherits) : variable "GO:0042611" was not found
 > grep("208729_x_at",get("GO:0042611", hgu133plus2GO2ALLPROBES))
[1] 20
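
For the validation described in the question (mapping enriched GO IDs back to a
list of significant probes), the check can be sketched as below. This is only
an illustration under stated assumptions: the two environments are small
stand-ins built with list2env() rather than the real hgu133plus2 objects,
sig_probes and sig_go are made-up inputs, and hits() is a hypothetical helper.
Passing ifnotfound = NA to mget() avoids the "not found" error for terms that
have no entry in a mapping.

```r
# Stand-ins for the annotation environments (toy data, not the real chip)
GO2PROBE <- list2env(list(
  "GO:0042612" = c("208729_x_at", "probe_x")
))
GO2ALLPROBES <- list2env(list(
  "GO:0042611" = c("208729_x_at", "probe_x"),
  "GO:0042612" = c("208729_x_at", "probe_x")
))

sig_probes <- c("208729_x_at", "probe_y")    # hypothetical significant probes
sig_go     <- c("GO:0042611", "GO:0042612")  # hypothetical enriched GO IDs

# For each GO ID, return the significant probes that the mapping assigns to it;
# terms missing from the mapping come back as NA and yield an empty result
hits <- function(go_ids, env) {
  lapply(mget(go_ids, envir = env, ifnotfound = NA),
         function(p) intersect(p, sig_probes))
}

direct  <- hits(sig_go, GO2PROBE)      # parent term finds nothing
via_all <- hits(sig_go, GO2ALLPROBES)  # parent term finds 208729_x_at

sapply(direct, length)
sapply(via_all, length)
```

Since GOHyperG works from GO2ALLPROBES, checking its results against the same
mapping is what keeps terms like GO:0042611 from looking unmatched.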

HTH,

Jim


> 
> Any help/enlightenment is much appreciated.
> 
> PS - using R 2.2.1 with hgu133plus2 1.10.0
> 
> --Jake


-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

