[BioC] GOstats - defining the gene universe

Fri Oct 5 18:24:14 CEST 2007

Hi Rachael,

Rachael McBride wrote:
> Hi,
> 
> I have a quick question that I can't seem to find an answer to by 
> searching the BioC lists. I want to use GOstats on a gene list. I've 
> read the vignette and understand that defining the gene universe is an 
> important step. The vignette outlines various non-specific filtering 
> steps that can be done on an expression set in order to define the gene 
> universe. My question is: are the non-specific filtering steps done on a 
> normalized or un-normalized expression set?

You would almost always want to use normalized expression data.

The vignette actually includes some steps that by all rights would have 
occurred earlier in the analysis (namely the part where low-variance 
genes are removed).

Usually the analysis proceeds something like this:

1. Preprocess: normalize, background correct, etc.
2. Filter 'uninteresting' genes to reduce multiplicity.
3. Make comparisons.
4. Run the hypergeometric test on the gene sets from the comparison step.
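
To make the filtering step concrete, here is a minimal sketch in plain Python (not the GOstats or genefilter API). It assumes the data are already normalized and that "uninteresting" means low variance across samples; the function name, the 50% cutoff, and the example values are all hypothetical.

```python
from statistics import pvariance

def filter_low_variance(expr, keep_fraction=0.5):
    """Keep the most variable probes from a normalized expression table.

    expr maps a probe ID to its normalized values across samples.
    keep_fraction is an illustrative cutoff, not a recommended value.
    """
    # Rank probes from most to least variable across samples.
    ranked = sorted(expr, key=lambda probe: pvariance(expr[probe]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return {probe: expr[probe] for probe in ranked[:n_keep]}

# Hypothetical normalized log2 intensities: four probes, three samples.
normalized = {
    "probe_a": [7.1, 7.0, 7.2],
    "probe_b": [5.0, 8.5, 6.1],
    "probe_c": [9.9, 10.0, 9.8],
    "probe_d": [4.0, 6.0, 8.0],
}
filtered = filter_low_variance(normalized, keep_fraction=0.5)
```

The probes that survive this filter are the ones that go into the comparisons, and so they are also the starting point for the gene universe.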

In this case the universe you would start with would be the data you 
used to make the comparisons, which already lacks the genes you filtered 
out because they were uninteresting by some measure. At this point you 
simply want to remove any duplicates, genes lacking Entrez Gene IDs, and 
genes lacking GO terms.
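
The cleanup described above, and the test it feeds, can be sketched roughly as follows. This is plain Python rather than GOstats code; the probe-to-Entrez and Entrez-to-GO mappings, the function names, and the IDs are made up for illustration, and the p-value is the usual upper-tail hypergeometric probability for over-representation.

```python
from math import comb

def build_universe(probes, entrez_of, go_of):
    """Return unique Entrez Gene IDs for tested probes that have GO terms."""
    universe = set()
    for probe in probes:
        gene = entrez_of.get(probe)
        if gene is None:            # probe lacks an Entrez Gene ID
            continue
        if not go_of.get(gene):     # gene lacks GO terms
            continue
        universe.add(gene)          # the set drops duplicate genes
    return universe

def hypergeom_pvalue(universe, selected, term_genes):
    """P(X >= k): chance of seeing at least k term genes among the selected."""
    selected = selected & universe
    term_genes = term_genes & universe
    N, K, n = len(universe), len(term_genes), len(selected)
    k = len(selected & term_genes)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Hypothetical annotation: p1 and p2 hit the same gene, p4 has no Entrez ID,
# and gene "30" has no GO terms.
entrez_of = {"p1": "10", "p2": "10", "p3": "20", "p4": None, "p5": "30", "p6": "40"}
go_of = {"10": {"GO:1"}, "20": {"GO:1"}, "30": set(), "40": {"GO:2"}}

universe = build_universe(entrez_of.keys(), entrez_of, go_of)
p_value = hypergeom_pvalue(universe, selected={"10", "20"}, term_genes={"10", "20"})
```

Note that both the selected genes and the term's genes are intersected with the universe before counting, so genes dropped during filtering cannot inflate either count.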

Best,

Jim

> 
> Thanks,
> Rachael
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623