[BioC] "automatic association analysis"

Weiwei Shi helprhelp at gmail.com
Fri Aug 25 18:45:20 CEST 2006


Hi, Francois and other listers:

Thank you for the detailed reply. I have read those papers on GO
enrichment analysis and gene set analysis. There are basically two
statistical approaches: Bayesian and frequentist. The latter can use a
hypergeometric test or a t-test to derive p-values. Currently I am using
BayGO (implemented in R), which is based on Bayesian inference, and I
have some interesting results on a psoriasis dataset.

My initial question was about how to automatically "validate" or "test"
the results I get from whatever method I use, for example by text
mining or something similar.

But you mentioned that "The basic way to do this would be to use
an hypergeometric test (often used in the case of GO), although it can
be tricky to get right and has a few other issues.", which reminds me of
another question about it:

How do you define the "success events" in a hypergeometric test? And how
do you make sure the sampling has no bias when you pick the genes in
your study?
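
To make the question concrete, here is a minimal sketch of one common
convention, stated as an assumption rather than as anyone's established
method: the population is every gene measured on the array, a "success"
is a gene annotated to the pathway, and the sample is the list of
differentially expressed genes. The function name and the numbers are
hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the universe (assumed: all genes measured on the array)
    K: genes in the universe annotated to the pathway ("successes")
    n: genes selected, e.g. the differentially expressed list
    k: selected genes that are also in the pathway
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probability of every overlap at least as large as k.
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts, for illustration only.
p_value = hypergeom_upper_tail(k=8, N=10000, K=100, n=200)
print(p_value)
```

Under this convention the p-value depends directly on the choice of
universe N, which is one place where selection bias can enter: genes
that could never have been selected (not on the array, filtered out
beforehand) arguably should not be counted in N or K.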

I will look into this myself, but maybe someone here would like to give
me some suggestions too.

As to the pathway, I am using GeneGO's internal Metabase.

Thank you,

Weiwei

On 8/25/06, Francois Pepin <fpepin at aei.ca> wrote:
> Hi Weiwei,
>
> If you want to know if a given set of genes (ie members of the pathway)
> are behaving differently in a given set of arrays (ie your disease
> samples), there are a few ways. The basic way to do this would be to use
> an hypergeometric test (often used in the case of GO), although it can
> be tricky to get right and has a few other issues.
>
> There are other methods, such as the Gene Set Enrichment method in the
> Category package, that combine a set of t-tests together. Other packages
> like safe and sigPathway have different methods of doing the same thing.
> There was a discussion on this recently on the mailing list, you would
> probably want to look over it.
>
> As far as I can tell, all of those methods require that you have your
> pathway already defined. Some databases like KEGG or BioCarta have
> pathway definitions, but they don't cover all pathways and few, if
> any, are up-to-date with the literature.
>
> If we really care about a given pathway, we'll go and create our own
> list ourselves from the database. It is important in such a case to
> create the list before you've started looking at the differentially
> expressed genes, because otherwise you would be biasing your results.
> Of course, it is always good to be able to explain your results biologically
> afterward, but this is not the same as showing a statistically
> significant correlation with a pathway.
>
> Hope this helps,
>
> Francois
>
> On Thu, 2006-08-24 at 18:57 -0400, Weiwei Shi wrote:
> > Dear Listers:
> >
> > I have a question originating from pathway analysis:
> >
> > Suppose I have found a pathway that strongly associates with a
> > disease from pathway analysis; my question is how to validate this
> > association. I mean, is there any tool that does some automatic
> > association analysis against the scientific record, like PubMed, and
> > can give some evaluation of the strength of such an association?
> >
> > thanks.
> >
>
>


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


