[BioC] HyperGTest and genelists

Seth Falcon sfalcon at fhcrc.org
Tue Apr 10 23:38:50 CEST 2007


Hi Dave,

Have you had a look at the new vignette in the devel version of
GOstats?  If not, it might help with some of your questions.  You can
find it here:

http://www.bioconductor.org/packages/2.0/bioc/html/GOstats.html

davidl at unr.nevada.edu writes:
>       I like that the new HyperGTest function lets you specify the gene
> universe, get only the more specific GO terms in your results, and easily
> output a report with the expected counts, actual counts, and p values.  With
> the old GOHyperG results, I had written a function that retrieved the list of
> significant GOterms, found the gene names associated with those GO terms, and
> found the intersection of those gene names and my genes of interest.
> So my question is this:
>
>     Is there a way to get the gene names of the genes of interest which were
> associated with the over or under represented GO terms found with
> HyperGTest?

> I noticed there is a function for the GOHyperGResult object (geneIdUniverse)
> which retrieves the entrez gene identifiers from your gene universe for all the
> tested GO terms.  Is there a way to get only the entrez gene identifiers from
> your genes-of-interest group?

I don't think that GOstats currently provides the exact features you
want, but in the devel version there is some progress along these
lines...

   sigCategories(hgOver, p) will list the GO IDs that were significant
   given a p-value cutoff of p.

   selectedGenes(hgOver, id) will return a list with an element for
   each GO ID given in id containing the Entrez IDs that are in the
   intersection of the GO term and the selected gene ID list.

So I think you want:

   selectedGenes(hgOver, sigCategories(hgOver))
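
To make the idea concrete, here is a rough base R sketch of the
intersection that call is meant to give you.  It does not use GOstats at
all: universeByTerm, selected, and sigIds are made-up stand-ins for the
per-term universe genes, your gene list, and the significant GO IDs, and
the GO IDs and Entrez IDs are toy values.

```r
# Toy data; every name and ID here is hypothetical, not a GOstats object.
# Entrez IDs from the gene universe annotated to each tested GO term
universeByTerm <- list(
    "GO:0000001" = c("10", "20", "30"),
    "GO:0000002" = c("20", "40"),
    "GO:0000003" = c("50", "60")
)

# Entrez IDs of the genes of interest
selected <- c("20", "30", "50")

# GO IDs that met the p-value cutoff (stand-in for sigCategories)
sigIds <- c("GO:0000001", "GO:0000002")

# For each significant term, keep only the selected genes annotated to it
selectedByTerm <- lapply(universeByTerm[sigIds], intersect, y = selected)
```

The result is a list named by GO ID, one character vector of Entrez IDs
per significant term, which is the shape described above for
selectedGenes.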

Then you have to convert the Entrez IDs to gene symbols.  We hope to
be adding some functions to the annotate package to make these sorts
of transformations easy...
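
Until then, a named character vector works as a simple lookup.  The
sketch below assumes you already have an Entrez-ID-to-symbol table from
some annotation source; entrezToSymbol and entrezByTerm are hypothetical
names, the GO IDs are toy values, and the unknown ID "999999999" is
included only to show that unmatched IDs come back as NA.

```r
# Hypothetical lookup table: names are Entrez IDs, values are gene symbols.
# In practice this would be built from an annotation package, not typed in.
entrezToSymbol <- c("7157" = "TP53", "672" = "BRCA1", "1956" = "EGFR")

# Stand-in for the per-term Entrez IDs of the selected genes
entrezByTerm <- list(
    "GO:0000001" = c("7157", "672"),
    "GO:0000002" = c("1956", "999999999")
)

# Index the lookup by ID for each GO term; IDs not in the table give NA
symbolsByTerm <- lapply(entrezByTerm,
                        function(ids) unname(entrezToSymbol[ids]))
```

Keeping the result as a list named by GO ID means it lines up with the
rows of the report.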

>  Could you then filter out the GO terms which did
> not meet the p-value cutoff?

See the vignette; it shows a number of ways to do this.

> Is there a function which could be applied to
> this list to change those entrez identifiers into gene names?

Not yet, but I like the idea.

>  Or is there an
> easier way to get the names of the genes from your genes of interest which
> contributed to the "Count" column of the html report?  For example, if there
> was a method for retrieving the GO terms which met the p-value cutoff from the
> GOHyperGResult object, I could just use the function I've already
> written. 

Not sure what you mean by retrieving the GO terms.  Perhaps this helps
you?

   library("annotate")
   library("GO")
   # Look up each significant GO ID in GOTERM and extract its term name
   sapply(mget(sigCategories(hgOver), GOTERM), Term)

+ seth

PS: Feedback on selectedGenes and sigCategories is most welcome.
These are new functions that I've been playing with, and I have not yet
had a chance to finalize the interface or document them...

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org


