[BioC] GOstat question

nicolas servant Nicolas.Servant at curie.fr
Fri Mar 21 14:51:08 CET 2008


 > sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-solaris2.10

locale:
C

attached base packages:
[1] "splines"   "tools"     "stats"     "graphics"  "grDevices" "utils"   
[7] "datasets"  "methods"   "base"    

other attached packages:
    GOstats    Category      Matrix     lattice  genefilter    survival
    "2.2.6"     "2.2.3" "0.99875-2"    "0.15-5"    "1.14.1"      "2.31"
       KEGG        RBGL    annotate     Biobase          GO       graph
   "1.16.1"    "1.12.0"    "1.14.1"    "1.14.0"    "1.16.0"    "1.14.2"
hgu133plus2
   "1.16.0"


Here, a *reproductible* example :

require(hgu133plus2)

##UNIVERSE
designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
##GO
haveGo <- sapply(mget(designALL, hgu133plus2GO),
       function(x) {
       if (length(x) == 1 && is.na(x)){
      FALSE
      }
else{
    TRUE
}
})
numNoGO <- sum(!haveGo)
print(paste(numNoGO, "pbsets in Univers have no GO ids"))
designALL<-designALL[haveGo]

entrezUniverse <- 
na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
if (any(duplicated(entrezUniverse))){
   stop("error in gene universe: can't have duplicate Ent")
}

##ALIST
selectedEntrezIds <- entrezUniverse[1:150]

require(GOstats)

params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pvalueCutoff=0.05,conditional=TRUE,testDirection="over")
hgOver.BP<-hyperGTest(params.BP.over)

length(selectedEntrezIds)
length(hgOver.BP at geneIds)

As you can see, i performed a non-specific filtering by removing the 
probesets with no GO annotations (as presented in the vignette !).
I'm not interested here by using an other filtering approach like 
Student test, IQR filtering, etc ...

 > length(selectedEntrezIds)
[1] 150
 > length(hgOver.BP at geneIds)
[1] 124
 
To summarize, all the entrezID of my list have a GO annotation, but the 
hyperGTest function removes 26 additional genes.
I guess there are some explanations ?

Thanks for your help,
Nicolas



James W. MacDonald a écrit :
> Hi Nicolas,
>
> nicolas servant wrote:
>> Sorry. I use the BioC GOStats package (2.2.6).
>
> Yes, but you *still* refuse to follow the posting guide, so we are 
> left to guess at what you have done. Here's a hint; give us (always) 
> the results of calling sessionInfo(), and give us a reproducible 
> example. The example you give below is not reproducible, as you are 
> the only one who has these data. If we can't do what you have done to 
> see what happened, how exactly are we to figure out why you get these 
> results?
>
> And by reproducible example I don't mean for you to give us a link to 
> your data. You can simply use set.seed(), randomly select some affy 
> IDs, and then go on from there as if these were the IDs you got from 
> an analysis.
>
>> In connection with my last mail, I have a other question :
>>
>> ##universe
>> designALL<-names(unlist(as.list(hgu133plus2ACCNUM)))
>> entrezUniverse <- 
>> na.omit(unique(unlist(mget(designALL,hgu133plus2ENTREZID))))
>> ##myList
>> selectedEntrezIds <- 
>> na.omit(unique(unlist(mget(mylist,hgu133plus2ENTREZID))))
>> params.BP.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="BP",pvalueCutoff=0.05,conditional=TRUE,testDirection="over") 
>>
>> params.MF.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="MF",pvalueCutoff=0.05,conditional=TRUE,testDirection="over") 
>>
>> params.CC.over<-new("GOHyperGParams",geneIds=selectedEntrezIds,universeGeneIds=entrezUniverse,annotation="hgu133plus2",ontology="CC",pvalueCutoff=0.05,conditional=TRUE,testDirection="over") 
>>
>>
>> hgOver.BP<-hyperGTest(params.BP.over)
>> hgOver.MF<-hyperGTest(params.MF.over)
>> hgOver.CC<-hyperGTest(params.CC.over)
>>
>> length(selectedEntrezIds)
>> [1] 3761
>> length(hgOver.BP at geneIds)
>> [1] 3122
>>
>> I guess the hyperGTest has removed these 639 missing genes. But i 
>> don't understand why ?
>
> If you were to peruse the vignettes for the GOstats package, you will 
> see that there are two filtering steps that you should take before 
> running the analysis. The first is to subset to unique Entrez Gene IDs 
> as you have done. The second is to remove those Entrez Gene IDs that 
> have no GO annotations associated with them. My best guess is the 639 
> missing genes have no GO annotations. But I am guessing since I can't 
> know.
>
> Best,
>
> Jim
>
>
>>
>> Nicolas
>>
>> Robert Gentleman a écrit :
>>> Hi Nicolas,
>>>
>>>    Perhaps you could peruse the posting guide and provide the 
>>> information it asks you to.  Only from that could one hope to give 
>>> you a reasonable answer (ie, what commands did you run, what what 
>>> the output, and your sessionInfo, at a bare minimum).  And the 
>>> Bioconductor package is GOstats, not GOstat - that is something else 
>>> (not Bioconductor) and if you want help with it you should ask those 
>>> folks directly.
>>>
>>>   Robert
>>>
>>>
>>> nicolas servant wrote:
>>>  
>>>> Hi,
>>>>
>>>> I have a question about the GOstat package.
>>>> I'm working with a list of probesets, and i'm interested by testing 
>>>> the over representation of the GO terms in my list.
>>>> So, I changed the probesets IDs (of my lists and my Universe) into 
>>>> ENTREZ IDs and the hyperGTest performed well.
>>>> For instance i have some result as :
>>>>
>>>> GO:0005635
>>>> Pvalue = 0.04
>>>> OddsRatio = 0.04
>>>> ExpCount =  11
>>>> Count = 17
>>>> Size = 45
>>>>
>>>> But, when i did the opposite :
>>>>
>>>> test<-mget("GO:0005635",hgu133plus2GO2ALLPROBES)
>>>> entrez <- 
>>>> unique(unlist(mget(as.vector(unlist(test)),hgu133plus2ENTREZID)))
>>>> length(entrez)
>>>> [1] 126
>>>>
>>>> I don't understand why I find 126 entrez IDs in the Universe, and 
>>>> no 45 as expected (SIZE) ...
>>>>
>>>> mylistentrez<-unique(intersect(entrez,mylist))
>>>> length(mylistentrez)
>>>> [1] 50
>>>>
>>>> In the same way, I find 50 entrez IDs in my list, and no 17 as 
>>>> expected (COUNT)
>>>>
>>>> Thanks for your explanations,
>>>> Bests,
>>>>
>>>> Nicolas
>>>>
>>>>     
>>>   
>>
>>
>


-- 
Nicolas Servant
Equipe Bioinformatique
Institut Curie
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE

Email: Nicolas.Servant at curie.fr
Tel: 01 53 10 70 55
http://bioinfo.curie.fr/



More information about the Bioconductor mailing list