[BioC] GOstats hyperGTest question

Seth Falcon sfalcon at fhcrc.org
Fri Jan 26 05:59:35 CET 2007


Hi Ivan,

ivan.borozan at utoronto.ca writes:
> I got the following results using hyperGTest(params) with a given list of genes:
>
>> summary(hgOver)
>         GOBPID       Pvalue   OddsRatio    ExpCount Count Size
> 1  GO:0030185 0.000000e+00  -73.314685  0.02692165     2    1
> 2  GO:0006067 0.000000e+00 -110.746479  0.05384330     3    2
> 3  GO:0006069 0.000000e+00 -110.746479  0.05384330     3    2

Hmm, that is a suspicious result.  Size is the number of universe genes
annotated at the GO term and Count is the number of selected genes
annotated there; since the selected genes are a subset of the universe,
one would expect Size >= Count.  In the current devel version of
Category and GOstats, I have added code to verify that neither the
selected gene list (geneIds) nor the gene universe contains duplicates.
Could you check whether your input contains duplicate IDs in either the
selected list or the universe?
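A minimal base R sketch of that check, run on your own ID vectors before the test. The vectors below are made-up placeholders; in practice they would be whatever you pass as the selected genes and the universe (assumed here to be the geneIds and universeGeneIds arguments of your GOHyperGParams object). Only base functions (anyDuplicated, duplicated, unique, %in%) are used.

```r
# Hypothetical Entrez ID vectors; substitute your own selected list and universe
selected <- c("3043", "3043", "5243", "1017")
universe <- c("3043", "5243", "1017", "7157", "7157", "672")

# anyDuplicated() returns the index of the first duplicated element, or 0 if none
sel_has_dups <- anyDuplicated(selected) > 0
uni_has_dups <- anyDuplicated(universe) > 0

# List the IDs that occur more than once
sel_dup_ids <- unique(selected[duplicated(selected)])
uni_dup_ids <- unique(universe[duplicated(universe)])

# The selected genes should also be a subset of the universe
sel_in_uni <- all(selected %in% universe)

# Remove duplicates before building the parameter object and running the test
selected <- unique(selected)
universe <- unique(universe)
```

If either check reports duplicates, deduplicating with unique() before constructing the parameters is the simplest fix.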

> If, for example, I look at genes that are associated with the first GO
> term (i.e. GO:0030185), I get:
>
>
>> probeSetSummary(hgOver)[[1]]
>    EntrezID ProbeSetID selected
> 1     3043     144221        0
> 2     3043     148425        0
> 3     3043    3108408        0
> 4     3043    5708746        0

This is, of course, also surprising, but it is difficult to assess
what is going on without knowing more about the data you used as
input.  Are you sure that every Entrez ID in geneIds(params) is
represented by at least one probe set on the chip?
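One way to answer that question is to compare the selected Entrez IDs against the Entrez IDs that the chip's probe sets map to. The sketch below assumes a hypothetical named character vector probe2entrez (names are probe set IDs, values are Entrez IDs); the first four probe sets come from the output above, while the others and the selected IDs are invented for illustration. How you obtain the real mapping depends on your annotation package.

```r
# Hypothetical probe-set-to-Entrez mapping for a chip:
# names are probe set IDs, values are the Entrez IDs they interrogate
probe2entrez <- c("144221" = "3043", "148425" = "3043",
                  "3108408" = "3043", "5708746" = "3043",
                  "200001" = "7157", "200002" = "1017")

# Hypothetical selected Entrez IDs
selected <- c("3043", "7157", "5243")

# Entrez IDs with at least one probe set on the chip
on_chip <- unique(probe2entrez)

# Selected IDs that no probe set on the chip interrogates
not_on_chip <- setdiff(selected, on_chip)
not_on_chip
```

Any IDs returned here are not represented on the chip and would be worth investigating before interpreting the test results.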

> My question is: how are the Counts (in this case Count = 2) in the
> above summary(hgOver) table obtained?

The details are in the code, but the intention is that Count is the
number of genes in the intersection of the selected gene list with the
Entrez IDs annotated at the given GO term.
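To make the relationship between the columns concrete, here is a minimal base R sketch on made-up gene sets. It computes Size and Count as set sizes, ExpCount as Size times the fraction of the universe that was selected, and an over-representation p-value from the standard hypergeometric upper tail via phyper(). This illustrates the textbook calculation the test is named for; it is not a copy of the Category/GOstats code, which remains the authority on exactly how hyperGTest computes its output.

```r
# Hypothetical gene universe, selected list, and genes annotated at one GO term
universe   <- c("3043", "5243", "1017", "7157", "672", "1956", "4609", "5290")
selected   <- c("3043", "7157", "1956")
term_genes <- c("3043", "7157", "672")

n_universe <- length(unique(universe))
n_selected <- length(intersect(selected, universe))

# Size: universe genes annotated at the term
size  <- length(intersect(term_genes, universe))
# Count: selected genes annotated at the term (the size of the intersection)
count <- length(intersect(intersect(selected, universe), term_genes))
# ExpCount: Count expected if the selected genes were drawn at random
exp_count <- size * n_selected / n_universe

# Over-representation p-value, P(X >= count), under the hypergeometric model
pvalue <- phyper(count - 1, size, n_universe - size, n_selected,
                 lower.tail = FALSE)
```

Because every counted gene passes through intersect(), which returns unique values, Count can never exceed Size here; a result with Count > Size therefore points at the input rather than the arithmetic.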

> Looking at probeSetSummary(hgOver)[[1]], I can see one EntrezID
> (EntrezID = 3043) and 4 ProbeSetIDs associated with this particular
> node (i.e. GO:0030185).

That just tells you that there are 4 probe sets that interrogate Entrez
ID 3043.  The Count in the hyperGTest result tells you that 2 Entrez
IDs from the selected gene list are among the genes annotated at
GO:0030185.
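The distinction between probe sets and genes can be seen by collapsing the probe sets from the probeSetSummary output above to their Entrez ID. The sketch assumes the mapping is held in a hypothetical named character vector probe2entrez (probe set IDs as names); the IDs themselves are taken from the output quoted earlier. Four probe sets collapse to a single gene, which is the unit the hypergeometric test counts.

```r
# Probe sets from probeSetSummary(hgOver)[[1]] above, all mapping to Entrez ID 3043
probe2entrez <- c("144221" = "3043", "148425" = "3043",
                  "3108408" = "3043", "5708746" = "3043")

# Number of probe sets per Entrez ID
probes_per_gene <- table(probe2entrez)

# Probe set IDs grouped by Entrez ID
probes_by_gene <- split(names(probe2entrez), probe2entrez)

# Gene-level count: each Entrez ID contributes once, however many probe sets it has
n_genes <- length(unique(probe2entrez))
```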

I have added a considerable amount of detail to the GOstats vignette
in the current devel repository and I would suggest reading over it:

    http://www.bioconductor.org/packages/1.9/bioc/html/GOstats.html

+ seth


