[BioC] Odd isolated problem with GOstats

Mark W Kimpel mwkimpel at gmail.com
Sat Jan 5 21:05:47 CET 2008


Marc,

I have attempted to modify my code without success. The reason I was 
requesting a patch is because I believe the problem is originating 
within the summary function. Below is my code that generatesa 
GOHyperGParams object, applies hyperGTest, and then tries to summarize 
the result.

I took a look at the test.object created by hyperGTest to see if it 
contained GO:ID's that I could remove or subset out as you suggest but 
it does not contain any. These seem to be fetched my mget called from 
within summary. I looked at the source code for summary and tried to 
modify it by having mget return NA ifnotfound (see further below) but 
this results in a problem with lapply when "standardGeneric for "Term" 
defined from package "AnnotationDbi"" is the FUN applied to NA. So I am 
stuck and would appreciate your help.

Please see my comment in the R code for where I think a fix should be 
applied. I understand that this may only occur occasionally and only in 
BioC-devel, but it would seem useful to those of us who do test BioC-devel.

Thanks, Mark

#############################################################
#MY CODE
params <- new("GOHyperGParams", geneIds = selectedEntrezIds,
     universeGeneIds = entrez.universe.vec, annotation = annotationPckg,
     ontology = "BP", pvalueCutoff = minP, conditional = TRUE,
     testDirection = testDirection)

test.obj<-hyperGTest(params)

#this is where the blow-up occurs
output.table<-data.frame(summary(test.obj))

######################################################################
# GOstats code from hyperGtable.R
##################################################################
setMethod("summary", signature(object="GOHyperGResult"),
           function(object, pvalue=pvalueCutoff(object),
                    categorySize=NULL, htmlLinks=FALSE) {
               AMIGO_URL <- 
"http://www.godatabase.org/cgi-bin/amigo/go.cgi?view=details&search_constraint=terms&depth=0&query=%s"
               df <- callNextMethod(object=object, pvalue=pvalue,
                                    categorySize=categorySize)
               if (nrow(df) == 0)  {
                   df$Term <- character(0)
                   return(df)
               }


#This is where it would be nice to filter out invalid goIds but I can't 
#  figure out how to do it
               goIds <- df[[1]]

               goTerms <- sapply(mget(goIds, GOenv("TERM"), ifnotfound = 
NA), Term)
               if (htmlLinks) {
                   goIdUrls <- sapply(goIds,
                                      function(x) sprintf(AMIGO_URL, x))
                   goTerms <- paste('<a href="', goIdUrls, '">', goTerms,
                                    '</a>', sep="")
               }
               df$Term <- goTerms
               df
           })




Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)

mwkimpel<at>gmail<dot>com

******************************************************************


Marc Carlson wrote:
> Marc Carlson wrote:
>> Mark W Kimpel wrote:
>>   
>>> I have used a custom wrapper function to invoke GOstats for almost a 
>>> year without problem. Now, with just one dataset, I am getting an error. 
>>> Not sure if this is a bug or something very odd about just my one 
>>> dataset. The GO:ID that yields the error is a valid ID, I checked. I am 
>>> using the devel version and rechecked after just updating my packages.
>>>
>>> Any ideas as to what the problem might be?
>>>
>>> Loading required package: rat2302
>>> Error in .checkKeys(value, Lkeys(x), x at ifnotfound) :
>>>    invalid key "GO:0016089"
>>>
>>> Enter a frame number, or 0 to exit
>>>
>>>   1: limma.contrast.output.func(AOP, fit2, rslt, contrast.num = 1, 
>>> custom.contra
>>>   2: GO.anal.func(input.df = t.tab.annot.obj$sig.named.genes, 
>>> annotationPckg = a
>>>   3: GOHyperMaxCats.func(optimizedParam)
>>>   4: data.frame(summary(test.obj))
>>>   5: summary(test.obj)
>>>   6: summary(test.obj)
>>>   7: .local(object, ...)
>>>   8: sapply(mget(goIds, GOenv("TERM")), Term)
>>>   9: lapply(X, FUN, ...)
>>> 10: is.vector(X)
>>> 11: mget(goIds, GOenv("TERM"))
>>> 12: mget(goIds, GOenv("TERM"))
>>> 13: `keys<-`(`*tmp*`, value = c("GO:0007268", "GO:0007214", 
>>> "GO:0050877", "GO:0
>>> 14: `keys<-`(`*tmp*`, value = c("GO:0007268", "GO:0007214", 
>>> "GO:0050877", "GO:0
>>> 15: switch(as.character(direction(x)), "1" = `Lkeys<-`(x, value), "-1" = 
>>> `Rkeys
>>> 16: `Lkeys<-`(x, value)
>>> 17: `Lkeys<-`(x, value)
>>> 18: .checkKeys(value, Lkeys(x), x at ifnotfound)
>>>
>>>
>>>  > sessionInfo()
>>> R version 2.7.0 Under development (unstable) (2007-12-20 r43739)
>>> x86_64-unknown-linux-gnu
>>>
>>> locale:
>>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] splines   tools     stats     graphics  grDevices utils     datasets
>>> [8] methods   base
>>>
>>> other attached packages:
>>>   [1] rat2302_2.0.1        affycoretools_1.11.2 annaffy_1.11.1
>>>   [4] KEGG_2.0.1           GO_2.0.1             gcrma_2.11.1
>>>   [7] matchprobes_1.11.0   biomaRt_1.13.6       RCurl_0.8-3
>>> [10] GOstats_2.5.0        Category_2.5.0       genefilter_1.17.7
>>> [13] survival_2.34        RBGL_1.15.7          annotate_1.17.3
>>> [16] xtable_1.5-2         GO.db_2.1.1          AnnotationDbi_1.1.8
>>> [19] RSQLite_0.6-4        DBI_0.2-4            graph_1.17.14
>>> [22] limma_2.13.3         affy_1.17.3          preprocessCore_1.1.5
>>> [25] affyio_1.7.9         Biobase_1.17.8
>>>
>>> loaded via a namespace (and not attached):
>>> [1] cluster_1.11.9 XML_1.93-2
>>>   
>>>     
> 
> If you could show me what it is that your script is doing (some source
> code) then I might be better able to help you with this.  But as it
> stands right now I have very few specifics about what you are trying to
> do.  I can see you posted the sessionInfo() which helps me know what
> versions of things you are using.  But what is this script that you
> speak of?  What does that code  actually look like?  I am afraid that I
> don't have a patch for you because the annotation package is not
> actually broken, and is only missing information because of what was
> available at GO at the precise moment when that package was made.  I
> can't tell you why GO decided to remove that particular term and then
> later on decided to add it back in, but  I looked at data that we
> gathered from them to build annotations at different time points and
> that is what has actually happened in this case.
> 
>>From your previous posts, it looks like your trouble is caused because
> you are missing a key and then looking for that missing key in an
> environment.  If that is what is happening, then I think you should code
> things in a way so that you can filter out keys that are not present.
> 
> More specifically I think that you probably just want to do some minor
> cleanup before you call something like mget() to make sure that your
> keys are actually present.  Here is a quick example of how this might look:
> 
> library(GO.db)
> 
> myKeys = c("FAKEID","GO:0000006","GO:0000003")
> 
> myKeys[myKeys %in% keys(GOTERM)]
> 
> 
> Hope this helps,
> 
> 
>      Marc
> 
> 
>



More information about the Bioconductor mailing list