[BioC] Quick start to linking GO terms and microarray data

Wed Mar 1 12:43:34 CET 2006

On 3/1/06 6:20 AM, "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk>
wrote:

> Hi
> 
> I want to investigate the GO terms associated with my microarray data
> (normally, a list of genes from topTable() in limma)
> 
> I have read the vignettes for goTools and GOstats, and to be honest, I
> am still a little unclear what the overall process is, particularly if I
> am working with a custom array and not with affy or operon.
> 
> Let's say, for example, I have my array data in a data.frame containing
> gene names.  In a separate data frame I have a link between my gene
> names and LocusLink IDs.  How do I:
> 
> 1) Find the GO terms associated with subsets of my genes? (I realise I
> can use merge() to link my array data to the LocusLink ids, but what do
> I do then?)
>
> 2) Find out if a particular GO term is statistically over-represented in
> a particular group?

Hi, Mick.

I would take your locuslink IDs for your genes and dump out two lists to a
text file:

1)  All LocusLink IDs on your array.
2)  All LocusLink IDs in your gene list.

Then use an external program or web tool such as DAVID/EASE to do the
analysis.
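
The two-list export described above can be sketched as follows. This is a
minimal illustration in Python rather than R; the helper name write_id_list,
the file names, and the IDs are all made-up placeholders, and the exact input
format a given tool expects is not covered here.

```python
import tempfile
from pathlib import Path


def write_id_list(ids, path):
    """Write unique, non-empty IDs to `path`, one per line, and return them."""
    unique = sorted({i.strip() for i in ids if i and i.strip()})
    Path(path).write_text("".join(f"{i}\n" for i in unique))
    return unique


# Placeholder IDs for illustration only; not real annotations.
array_ids = ["101", "102", "103", "104", "102", ""]  # every gene on the array
genelist_ids = ["103", "101"]                        # genes of interest

out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
background = write_id_list(array_ids, out_dir / "array_ids.txt")
selected = write_id_list(genelist_ids, out_dir / "genelist_ids.txt")

# The gene list should be a subset of the array (the background set).
missing = set(selected) - set(background)
if missing:
    raise ValueError(f"IDs in gene list but not on array: {sorted(missing)}")
```

The first file serves as the background population and the second as the
selected set, which is why the subset check is worth doing before upload.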

That said, there was some discussion on using straight locusIDs (rather than
requiring a metadata package) in GOHyperG.  I don't know where that
conversion stands.
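
For orientation, over-representation of a GO term is commonly assessed with a
one-sided hypergeometric test. The sketch below shows that calculation in
plain Python on made-up counts; it is not the GOHyperG implementation, and the
function name and numbers are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes on the array (background)
    K: background genes annotated with the GO term
    n: genes in the selected gene list
    k: selected genes annotated with the GO term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("counts are inconsistent")
    total = comb(N, n)
    # Sum the probability of seeing k, k+1, ... annotated genes in the list.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


# Made-up example: 10 genes on the array, 3 carry the term,
# 4 genes selected, 2 of those carry the term.
p_value = hypergeom_upper_tail(k=2, N=10, K=3, n=4)
```

With many GO terms tested at once, the resulting p-values would normally need
a multiple-testing adjustment, which this sketch leaves out.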

As to your question about linking genes to GO: GO annotation is actually
assigned at the transcript/protein level, and mapping to Entrez Gene
(LocusLink) happens after the fact.  Using various data sources, you can link
by RefSeq, LocusLink, Ensembl IDs, UCSC Known Genes, H-Invitational IDs
(human), and probably several others in species other than human.
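
As a rough illustration of that two-step linking (gene name to LocusLink ID,
then LocusLink ID to GO terms), here is a minimal Python sketch. Both lookup
tables, the gene names, the IDs, and the term labels are invented
placeholders; in practice they would come from an annotation source.

```python
from collections import defaultdict

# Hypothetical lookup tables for illustration only.
gene_to_locus = {"geneA": "101", "geneB": "102", "geneC": "103"}
locus_to_go = {"101": {"GO_X", "GO_Y"}, "102": {"GO_Y"}}  # "103" has no terms


def go_terms_for_genes(genes, gene_to_locus, locus_to_go):
    """Map each GO term to the set of input genes annotated with it."""
    term_to_genes = defaultdict(set)
    for gene in genes:
        locus = gene_to_locus.get(gene)
        if locus is None:
            continue  # gene has no LocusLink ID in the table
        for term in locus_to_go.get(locus, ()):
            term_to_genes[term].add(gene)
    return dict(term_to_genes)


terms = go_terms_for_genes(["geneA", "geneB", "geneC", "geneD"],
                           gene_to_locus, locus_to_go)
```

Genes without an ID or without any annotation simply drop out of the result,
so it is worth counting how many were lost before interpreting the terms.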

Sean