[BioC] testing GO categories with Fisher's exact test.

Ramon Diaz-Uriarte rdiaz at cnio.es
Wed Feb 25 15:33:24 MET 2004


Another tool is

http://fatigo.bioinfo.cnio.es

Best,

R.

On Wednesday 25 February 2004 14:50, James MacDonald wrote:
> I should add to this thread that there is existing software that will do
> resampling to assess global significance of the p-values obtained from
> this sort of analysis.
>
> http://dot.ped.med.umich.edu:2000/pub/sig_terms/index.htm
>
> Best,
>
> Jim
>
>
>
> James W. MacDonald
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
> >>> <cberry at tajo.ucsd.edu> 02/24/04 02:23PM >>>
>
> On Tue, 24 Feb 2004, Nicholas Lewin-Koh wrote:
> > Hi all,
> > I have a few questions about testing for over-representation of terms
> > in a cluster.
> > Let's consider a simple case: a set of chips from an experiment, say
> > treated and untreated, with 10,000 genes on the chip and 1,000
> > differentially expressed. Of the 10,000, 7,000 can be annotated and
> > 6,000 have a GO function assigned to them at a suitable level. Say for
> > this example there are 30 GO classes that appear.
> > I then conduct Fisher's exact test 30 times, once for each GO category,
> > to detect differential representation of terms in the expressed
> > set, and correct for multiple testing.
> >
> > My question is on the validity of this procedure.
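
For concreteness, the procedure described above can be sketched as follows.
This is only an illustration under stated assumptions: the universe is taken
to be the 6,000 GO-annotated genes, the number of differentially expressed
genes among them (600) is assumed since the post does not give it, and the
category names and counts are made up. The one-sided Fisher's exact test is
computed as a hypergeometric upper tail, and Benjamini-Hochberg is used as
one possible multiple-testing correction.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the category.

    This is the one-sided (over-representation) Fisher's exact test p-value.
    """
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


N_ANNOTATED = 6000   # genes with a GO term (from the post)
N_SELECTED = 600     # assumed: differentially expressed genes among them

# Hypothetical categories: name -> (genes in category, of which selected)
categories = {
    "GO:hypothetical_A": (200, 35),
    "GO:hypothetical_B": (100, 10),
    "GO:hypothetical_C": (50, 12),
}

names = list(categories)
raw = [
    hypergeom_upper_tail(categories[g][1], categories[g][0], N_SELECTED, N_ANNOTATED)
    for g in names
]
adj = benjamini_hochberg(raw)
for g, p, q in zip(names, raw, adj):
    print(f"{g}: p = {p:.3g}, BH-adjusted = {q:.3g}")
```

Note that this sketch treats genes as independent draws, which is exactly
the assumption questioned in the rest of the thread.
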
>
> It depends on what hypotheses you wish to test. The uniform distribution
> of the p-value under the null hypothesis depends on ***all*** the
> assumptions of the test obtaining.
>
> The trouble is that you probably do not want to test whether the genes on
> your microarray are independent, since you already know that they are not:
> > Just from experience, many genes will have multiple functions assigned
> > to them, so the genes falling into GO classes are not independent.
> >
> > Also, there is the large set of un-annotated genes, so we are in effect
> > ignoring the influence of all the unannotated genes on the outcome. Do
> > people have any thoughts or opinions on these approaches? It is
> > appearing all over the place in bioinformatics tools like FATIGO, EASE,
> > DAVID etc.
>
> SAM and similar permutation-based approaches can be implemented for this
> setup to get p-values (or FDRs) that do not depend on independence of
> genes/transcripts.
>
> The results given by permutation (of sample identities, using the
> hypergeometric p-value as the test statistic) are several orders of
> magnitude more conservative than the original 'p-value', even without
> correcting for multiple comparisons, in several data sets I have seen.
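
A rough sketch of this kind of permutation scheme, for one category, might
look like the code below. It is not the implementation referred to above:
the gene-selection rule (top n_top genes by absolute difference in group
means), the synthetic data, and all function names are assumptions made for
illustration. The idea is to recompute the hypergeometric p-value after
shuffling the sample labels and to ask how often a shuffled data set gives a
p-value at least as small as the observed one.

```python
import random
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for n genes drawn from N, K of which are in the category."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def top_genes(expr, labels, n_top):
    """Indices of the n_top genes with the largest |mean(group 1) - mean(group 0)|."""
    g1 = [j for j, lab in enumerate(labels) if lab == 1]
    g0 = [j for j, lab in enumerate(labels) if lab == 0]

    def score(row):
        m1 = sum(row[j] for j in g1) / len(g1)
        m0 = sum(row[j] for j in g0) / len(g0)
        return abs(m1 - m0)

    ranked = sorted(range(len(expr)), key=lambda i: score(expr[i]), reverse=True)
    return set(ranked[:n_top])


def enrichment_p(expr, labels, category, n_top):
    """Hypergeometric p-value for the category among the selected genes."""
    k = len(top_genes(expr, labels, n_top) & category)
    return hypergeom_upper_tail(k, len(category), n_top, len(expr))


def permutation_p(expr, labels, category, n_top, n_perm=200, seed=0):
    """Observed hypergeometric p-value and its permutation p-value."""
    rng = random.Random(seed)
    observed = enrichment_p(expr, labels, category, n_top)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute sample identities, keep genes intact
        if enrichment_p(expr, shuffled, category, n_top) <= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_perm)


# Synthetic data (assumed): 200 genes, 4 vs 4 samples, no true group effect.
# Genes 0..19 form the category and share a per-sample factor, so they are
# correlated with each other.
data_rng = random.Random(1)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
category = set(range(20))
shared = [data_rng.gauss(0, 1) for _ in labels]
expr = [
    [data_rng.gauss(0, 1) + (shared[j] if i in category else 0.0)
     for j in range(len(labels))]
    for i in range(200)
]

obs_p, perm_p = permutation_p(expr, labels, category, n_top=20)
print(f"hypergeometric p = {obs_p:.3g}, permutation p = {perm_p:.3g}")
```

Because the gene-to-category assignment and the correlation between genes
are left untouched by shuffling sample labels, the permutation p-value does
not rely on the genes being independent.
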
>
> I recall someone from the MAPPfinder group remarking at a conference last
> July that MAPPfinder 2.0 would implement permutation methods. But I cannot
> find this release yet using Google.
>
> Another approach to permutation testing of expression vs ontology is
> outlined in:
>
> Mootha VK et al. PGC-1α-responsive genes involved in
> oxidative phosphorylation are coordinately downregulated in human
> diabetes. Nature Genetics, 34(3):267-73, 2003.
>
> > I find that the formal testing approach makes me very uncomfortable,
> > especially as the biologists I work with tend to over-interpret the
> > results.
>
> Testing a better focussed hypothesis should increase your comfort level.
>
> :-)
>
> > I am very interested to see the discussion on this topic.
> >
> > Nicholas
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
> Charles C. Berry                         (858) 534-2098
>                                          Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu         UC San Diego
> http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0717

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://bioinfo.cnio.es/~rdiaz
PGP KeyID: 0xE89B3462
(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)


