[BioC] Extract microarray data for genes identified by GO analysis

Wed Feb 20 06:27:15 CET 2013

Jim,

Thank you for your patience. I realize there is only so much you can do without a reproducible example, but here is one piece of information that makes me suspicious. If I just type "mfOver1", I get a summary of the object stating that there are 105 categories with a p-value < 0.05:

> mfOver1
Gene to GO MF  test for over-representation 
389 GO MF ids tested (105 have p < 0.05)
Selected gene set size: 133 
    Gene universe size: 18105 
    Annotation package: lumiMouseAll 

If I then run probeSetSummary on the same object, the result has length 92:
> ps1<-probeSetSummary(mfOver1,sigProbesets=sigProbe1)
> length(ps1)
[1] 92

I tried using both the entire (original) probe set and the expected significant probe set, and the number stays at 92. I also tried specifying "pvalue=0.05", as previously stated, without any difference (since that was the original parameter). I'd be happy to discover I made a silly mistake, but don't the above results seem suspicious? Given that I didn't run any code between the two examples, it's hard for me to imagine that code run earlier could cause this discrepancy. But I've been wrong before...
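
One way to narrow this down might be to list the GO IDs that pass the p-value cutoff but are missing from the probeSetSummary output. Below is a minimal sketch in base R. It uses invented p-values and an invented list in place of pvalues(mfOver1) and ps1 (the GO IDs, values, and the names pv, ps, sig_ids, and missing_ids are placeholders, not real data), and it assumes, as in Jim's example, that the names of the probeSetSummary result are GO IDs:

```r
# Toy stand-in for pvalues(mfOver1): a named numeric vector keyed by GO ID.
# All IDs and values here are invented for illustration.
pv <- c("GO:0000001" = 0.001, "GO:0000002" = 0.03,
        "GO:0000003" = 0.04,  "GO:0000004" = 0.20)

# Toy stand-in for ps1: a list named by GO ID, one element per category kept.
ps <- list("GO:0000001" = data.frame(), "GO:0000003" = data.frame())

# GO IDs that pass the cutoff according to the p-values alone.
sig_ids <- names(pv)[pv < 0.05]

# IDs that are significant but absent from the summary list.
missing_ids <- setdiff(sig_ids, names(ps))

length(sig_ids)   # number of categories passing the cutoff
length(ps)        # number of categories present in the summary
missing_ids       # categories to inspect by hand
```

With the real objects, pv would be pvalues(mfOver1) and ps would be ps1, and missing_ids should then hold the 13 categories that account for the difference between 105 and 92.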

On Feb 19, 2013, at 3:08 PM, James W. MacDonald <jmacdon at uw.edu> wrote:

> Hi Mark,
> 
> On 2/19/2013 4:41 PM, Mark Ebbert wrote:
>> Hi,
>> 
>> I am using GOstats to identify molecular functions that are over-represented. I'm getting conflicting results between a method from an example I found in the lumi vignette and probeSetSummary. Specifically, to get the list of significant categories, I was using the following code:
>> 
>> mfOver1 <- hyperGTest(params.mf1)
>> mf.gGhyp.pv1 <- pvalues(mfOver1)
>> mf.sigGO.ID.pv1 <- names(mf.gGhyp.pv1[mf.gGhyp.pv1 < 0.05])
>> mf.sigGO.Term.pv1 <- getGOTerm(mf.sigGO.ID.pv1)[["MF"]]
>> length(mf.sigGO.Term.pv1)
>> 
>> The number of resulting GO terms based on this code is 105. If I use probeSetSummary, however, I only get 92 significant GO terms. Here is the code I'm using:
>> 
>> ps1 <- probeSetSummary(mfOver1, 0.05, sigProbesets = probeList)
>> length(ps1)
>> 
>> My understanding is that both methods should select only those categories with a p-value < 0.05, but I have no doubt misunderstood something.
> 
> Or you made a mistake somewhere. If I run
> 
> example("probeSetSummary")
> 
> I get some faked-up results: hyp, the result object returned by hyperGTest, and ps, the output from probeSetSummary. If I then do what you intended to do (noting that the example uses the default p-value of 0.01):
> 
> > ps <- probeSetSummary(hyp, pvalue = 0.05)
> > length(ps)
> [1] 700
> > sum(pvalues(hyp) < 0.05)
> [1] 700
> > all.equal(names(pvalues(hyp)[pvalues(hyp) < 0.05]), names(ps))
> [1] TRUE
> 
> Seems the same to me.
> 
> Best,
> 
> Jim
> 
>> 
>> Thanks for your help!
>> 
>> Mark
> 
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>