[BioC] GOstats, geneCounts and gene universe filtering...

Mon May 14 13:15:55 CEST 2007

it works :-)

one more question regarding GOstats :-) in your description of the
GOstats package you mention that the conditional test is similar to
that presented in Alexa et al. 2006. would that be like the elim or
the weight method they describe? i tried comparing GOstats and topGO
(the Alexa GO analysis package) and they produce similar though not
identical outputs. i wonder if the differences are due to the fact
that i feed Entrez IDs into GOstats and Affy IDs into topGO, so the
two analyses are not based on exactly the same set of gene IDs? or do
the statistical methods of the two differ? it's not so clear to me
from the GOstats description exactly what you did in this conditional
test. i could have missed something, so if it's described somewhere
in more detail, a pointer to that would be just dandy :-)
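
to make the ID question concrete, here is a minimal sketch (Python,
standard library only) of a plain, non-conditional hypergeometric
over-representation test. it is not the GOstats or topGO code: the
function name is hypothetical and all counts are made up. the point is
only that the same overlap can give a different p-value once the
universe size and annotation counts change, which is what i would
expect when switching from one ID type to another.

```python
from math import comb


def hypergeom_upper_tail(universe_size, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `universe_size` genes, of which `annotated` carry the GO term."""
    total = comb(universe_size, selected)
    upper = min(annotated, selected)
    hits = sum(
        comb(annotated, k) * comb(universe_size - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# Made-up counts, for illustration only: the same overlap of 15 genes,
# tested against two slightly different universes.
p_a = hypergeom_upper_tail(10000, 200, 300, 15)  # hypothetical universe A
p_b = hypergeom_upper_tail(8000, 180, 280, 15)   # hypothetical universe B
print(p_a, p_b)
```

this only covers the universe effect; it says nothing about how the
conditional test or the elim/weight methods treat the GO graph.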

then lastly, these systems biology analysis tools for microarray data
seem very helpful, like the GO enrichment analysis of GOstats and
topGO. but i realise that a lot of genes are not annotated with GO
terms, and i wonder how much i'm actually missing because of this
incomplete annotation. it becomes even "worse" for KEGG, where fewer
genes are annotated and few significant KEGG pathways come out of the
GOstats analysis. what is your experience with these kinds of
analyses? how far can you push conclusions based on them?

i have also seen private companies offering curated protein-protein
interaction databases for similar analyses. does that bring
something new to the picture? i mean, that type of network describes a
different way of linking genes into nodes and edges, perhaps more
similar to KEGG than GO. but do they include more genes than e.g.
KEGG, and are they worth the investment, so to speak, to get access?
there are also analyses based on promoter analysis (e.g. Cartharius et
al., 2005, Bioinformatics) that search for common promoters and
hence common transcription factor regulation, which creates yet
another network of transcriptional regulation. both of these seem like
interesting analysis methods, but are there any implementations of
such tools for R and Bioconductor, with access to protein interaction
databases or promoter sequence/location databases?

i'm not too familiar with these tools, but i'm trying to figure out
where to focus my efforts to get maximum information out of my
microarray data. i like the network approach and the "holistic"
perspective on gene expression and regulation, but unfortunately i'm
not too knowledgeable about the available tools for this kind of
analysis, nor about the possible pitfalls these types of analysis
might be "hiding" that one should be aware of. any hints, links,
pointers, comments or shared experience would be most welcome :-)

cheers,
jesper ryge
PhD Student,
Department of Neuroscience
Karolinska Institutet

On 11 May 2007, at 18:53, Seth Falcon wrote:

> Jesper Ryge <Jesper.Ryge at ki.se> writes:
>
>> thanks for the fast answer :-) it's nice to know i'm battling my way in
>> the right direction...
>
> I believe I have found and fixed the bug causing the discrepancy in
> counts for conditional hyperGTests.  The problem was that one of the
> functions was consulting the gene universe, not the _conditional_ gene
> universe.
>
> The new versions for the release are:
>
>     Category 2.2.3
>     GOstats 2.2.2
>
> They should be available in the repository by Monday.
>
> + seth
>
> -- 
> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
> http://bioconductor.org