[BioC] GOstats, geneCounts and gene universe filtering...

Mon May 14 23:17:50 CEST 2007

Jesper Ryge <Jesper.Ryge at ki.se> writes:

> it works :-)

Glad to hear it.

> one more question regarding GOstats :-) in ur description of the
> GOstats package u mention that the conditional test is similar to
> that presented in alexa et al 2006.  would that be like the elim or
> weight function they describe? i tried to compare GOstats and topGO
> (alexa's GO analysis package) and they produce similar outputs,
> though not identical.  i wonder if the differences are due to the
> fact that i feed entrez IDs into the GOstats package and affy IDs
> into the topGO package, so they are not based on entirely the same
> set of gene IDs? or do the statistical methods of the two differ?
> it's not so clear to me from the GOstats description exactly what u
> did in this conditional test.  i could have missed something, so if
> it's described somewhere in more detail, a pointer to that would be
> just dandy :-)

The methods are similar, but were developed independently.  So I would
hope that the results are similar.  I would be rather surprised if
they were identical.

Did you find our article in Bioinformatics?  It has a description of
the conditional computation done in GOstats.  The reference is:

  S Falcon and R Gentleman. Using GOstats to test gene lists for GO
  term association. Bioinformatics, 23(2):257-8, 2007.

If that isn't enough, I can try to give further details...
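
For readers who want the gist before reading the article, the
conditional idea can be sketched roughly as below. This is a simplified
Python illustration, not the GOstats code: the function names, the toy
gene IDs, the 0.05 cutoff, and the choice to keep the universe fixed
while removing a significant child's genes from its parent are all
assumptions made for the example. Each term is tested with a
hypergeometric upper-tail test, children first, and genes annotated at
a significant child are dropped from the parent's gene set before the
parent is tested.

```python
from math import comb


def hypergeom_upper_tail(k, n_universe, n_annotated, n_selected):
    """P(X >= k) when n_selected genes are drawn without replacement from
    n_universe genes, of which n_annotated carry the term."""
    upper = min(n_annotated, n_selected)
    favourable = sum(
        comb(n_annotated, i) * comb(n_universe - n_annotated, n_selected - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_universe, n_selected)


def conditional_pvalues(children, term_genes, universe, selected, cutoff=0.05):
    """Test every term, children before parents (hypothetical helper).

    children:   term -> list of child terms
    term_genes: term -> set of genes annotated at the term or below it
    Assumption: genes of a significant child are removed from the
    parent's gene set only; the universe itself is left unchanged.
    """
    universe = set(universe)
    selected = set(selected) & universe
    pvalues = {}

    def visit(term):
        if term in pvalues:
            return
        genes = set(term_genes[term]) & universe
        for child in children.get(term, ()):
            visit(child)  # make sure the child has been tested first
            if pvalues[child] < cutoff:
                genes -= set(term_genes[child])
        hits = len(genes & selected)
        pvalues[term] = hypergeom_upper_tail(
            hits, len(universe), len(genes), len(selected)
        )

    for term in term_genes:
        visit(term)
    return pvalues


# Toy data: 20 genes in the universe, 5 of them selected.
universe = [f"g{i}" for i in range(20)]
selected = ["g0", "g1", "g2", "g3", "g4"]
term_genes = {
    "child": {"g0", "g1", "g2", "g3", "g10"},
    "parent": {"g0", "g1", "g2", "g3", "g10", "g11", "g12"},
}
children = {"parent": ["child"]}

pvals = conditional_pvalues(children, term_genes, universe, selected)
# The same parent tested on its own, without conditioning on the child.
unconditional_parent = hypergeom_upper_tail(4, 20, 7, 5)
```

In this toy case the parent looks enriched on its own only because it
inherits the child's genes; once those are removed it has no selected
genes left, so the conditional test no longer flags it.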

> then lastly, these systems biology analysis tools for microarray
> data seem very helpful, like the GO enrichment analysis of GOstats
> and topGO. But i realise that a lot of genes are not annotated with
> GO terms and i wonder how much i'm actually missing because of this
> incomplete annotation of genes. it becomes even "worse" for KEGG,
> where fewer genes are annotated and the number of significant KEGG
> pathways that come out of the GOstats analysis is small. what is ur
> experience with these kinds of analyses? how far can u push
> conclusions based on these types of analysis?

I think it is important to remember that annotation sources like GO
and KEGG are not complete.  So I would suggest not pushing such
analyses too far ;-)
[sorry, perhaps someone else will have a better answer for you]

> i have also seen private companies offering curated protein-protein
> interaction databases to conduct similar analyses. does that bring
> something new to the picture? i mean that type of network describes a
> different way of linking genes into nodes and edges, perhaps more
> similar to KEGG than GO.  but do they include more genes than e.g.
> KEGG, and are they worth the investment so to speak - to get access i
> mean? and also analysis based on promoter analysis (e.g. cartharius et
> al, 2005, bioinformatics) in the search for common promoters and
> hence common transcription factor regulation, which creates yet
> another network of transcriptional regulation.  these both seem like
> interesting analysis methods but are there any implementations of
> such tools for R and bioconductor - with access to protein interaction
> databases or promoter sequence/location databases?
>
> i'm not too familiar with these tools but i'm trying to figure out
> where to focus my efforts to get maximum information out of my
> microarray data. i like the network approach and the "holistic"
> perspective of gene expression and regulation, but unfortunately i'm
> not too knowledgeable about the available tools for this kind of
> analysis nor the possible pitfalls these types of analysis might be
> "hiding" that one should be aware of. any hints, links, pointers,
> comments or sharing of experience would be most welcome :-)

I don't have any experience with the proprietary databases.  There has
been some work on protein interaction data in Bioconductor; cf. the
ppiStats and ScISI packages.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org