[BioC] Problem with the function hyperGtest from GOstats package

Seth Falcon sfalcon at fhcrc.org
Thu Mar 15 17:29:34 CET 2007


"James W. MacDonald" <jmacdon at med.umich.edu> writes:
> As you already noted, the man page states
>
> 'cateogrySubsetIds': Object of class '"ANY"': If the test method
>            supports it, can be used to specify a subset of category ids
>            to include in the test instead of all possible category ids.
>
> I don't know which test method supports this argument, but apparently 
> hyperGTest() doesn't.

Unfortunately, "cateogrySubsetIds" is a half-implemented feature,
and hyperGTest ignores it.  I will add it to my list, just after the
"spell check code" item for the next release ;-)

The reason that you can't simply test all of the GO IDs and then
subset after testing is that in the current implementation, the
universe of gene IDs is determined in part by requiring that each gene
have at least one annotation in the set of GO IDs.  Hence, reducing
the set of GO IDs tested could remove some gene IDs from the universe
and that will change the results for all tests.
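
A toy illustration of that dependency, as a minimal sketch only: the
annotation map, the GO IDs, and the helper gene_universe() below are
all made up for the example and are not GOstats code.

```python
# Hypothetical toy annotation map: gene ID -> set of GO IDs.
# All IDs here are invented for illustration.
annotations = {
    "g1": {"GO:A", "GO:B"},
    "g2": {"GO:A"},
    "g3": {"GO:C"},
    "g4": {"GO:C"},
    "g5": set(),  # no GO annotation at all
}


def gene_universe(annotations, go_ids):
    """Keep only genes with at least one annotation among go_ids."""
    go_ids = set(go_ids)
    return {gene for gene, terms in annotations.items() if terms & go_ids}


all_go = {"GO:A", "GO:B", "GO:C"}
subset_go = {"GO:A", "GO:B"}

# Universe when every GO ID is tested.
full_universe = gene_universe(annotations, all_go)

# Universe when only a subset of GO IDs is tested: g3 and g4 drop out,
# so the universe size used by every remaining test changes too.
reduced_universe = gene_universe(annotations, subset_go)

print(sorted(full_universe))
print(sorted(reduced_universe))
```

Because the universe shrinks when the category set shrinks, results
computed on the full set cannot simply be filtered down afterwards.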

Now whether removing from the universe those gene IDs that have no GO
annotation is the right thing to do could be up for discussion.  My
argument is that removal is good because it makes the test more
conservative.  If you leave them in, all you do is increase the size
of the gene universe and this tends to make any over-represented GO
IDs look all the more impressive.
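
The effect of universe size can be checked with a small
hypergeometric calculation.  This is a sketch under assumed numbers
(the counts and the function hyperg_upper_tail() are hypothetical,
not taken from GOstats): the same observed overlap is scored against
a smaller and a larger universe.

```python
from math import comb


def hyperg_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: size of the gene universe
    K: genes in the universe annotated at the GO ID
    n: number of selected genes
    k: selected genes annotated at the GO ID
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts, identical in both cases except the universe size.
K, n, k = 20, 50, 8

# Universe restricted to annotated genes (assumed size).
p_small_universe = hyperg_upper_tail(k, K, n, 1000)

# Same data, but the universe is padded with 1000 unannotated genes.
p_large_universe = hyperg_upper_tail(k, K, n, 2000)

print(p_small_universe, p_large_universe)
```

With the larger universe the expected overlap is smaller, so the same
observed count yields a smaller p-value, which is the sense in which
keeping unannotated genes is less conservative.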

So, sorry for the teaser w.r.t. a method for subsetting the
category.  I hope to have code that can handle that for the next
release.

Best,

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org


