[BioC] topGO sensitive to the order of "interesting" gene ids

Adrian Alexa adrian.alexa at gmail.com
Fri Jan 21 17:09:07 CET 2011


Hi Paul,

I guess you are referring to the results of the Kolmogorov-Smirnov
like test. In this case, yes, you are right: one would expect the
ordering to influence the enrichment result, but only in the presence
of ties. The more ties you have, the more unstable the results will
be. This is normal and is mainly due to the fact that the KS test and
the running-sum statistic are not able to handle ties, so they should
not be used in such scenarios. If you have many ties in your data then
an enrichment test like the Category test will fit better. Or, if your
data is categorical, you should use hypergeometric-like tests.

One needs to keep in mind that KS-like tests must assign a unique rank
to each gene. Ties in the data are broken by the original ordering!
You can't give the same rank to genes that have the same score. In
typical microarray studies, where you perform a differential
expression analysis between conditions or a correlation analysis, you
seldom obtain ties for the significant genes. You do have many ties
for the non-significant genes (let's say all with p-values of 1), but
the order of these genes is not relevant when you perform an
over-representation analysis.
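
To make the tie-breaking point concrete, here is a minimal sketch in
Python. It is not topGO's implementation: the function name
ks_like_statistic, the gene names and the simplified one-sided
running-sum statistic are all assumptions made for illustration. The
only point is that a stable sort keeps tied genes in their input order,
so the same genes with the same scores can give a different statistic.

```python
def ks_like_statistic(scored_genes, gene_set):
    """Simplified one-sided running-sum statistic (illustrative only).

    scored_genes: list of (gene, score) pairs in the order supplied.
    gene_set: set of genes annotated to the term being tested.
    """
    # sorted() is stable: genes with equal scores keep their input order,
    # which is exactly how ties end up being broken by the original ordering.
    ranked = sorted(scored_genes, key=lambda pair: pair[1])
    n_in = sum(1 for gene, _ in ranked if gene in gene_set)
    n_out = len(ranked) - n_in
    hits = misses = 0
    best = 0.0
    for gene, _ in ranked:
        if gene in gene_set:
            hits += 1
        else:
            misses += 1
        # Difference between the two empirical distribution functions so far.
        best = max(best, hits / n_in - misses / n_out)
    return best


# Hypothetical data: four "interesting" genes share one faked score.
term = {"g1", "g2"}
order_a = [("g1", 0.01), ("g2", 0.01), ("g3", 0.01), ("g4", 0.01),
           ("g5", 1.0), ("g6", 1.0)]
order_b = [("g4", 0.01), ("g3", 0.01), ("g2", 0.01), ("g1", 0.01),
           ("g6", 1.0), ("g5", 1.0)]

stat_a = ks_like_statistic(order_a, term)
stat_b = ks_like_statistic(order_b, term)
print(stat_a, stat_b)  # same genes and scores, different statistic
```

With untied scores both orderings would produce the same ranking, and
hence the same statistic.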

Now, if you take the gene universe and the subset of interesting
genes, give the interesting genes a very low value (to simulate
significant p-values) like 0.01, and set all the other genes to 1,
you should not expect the KS test to work: every gene is then tied
with many others, so the ranking is determined almost entirely by the
input order.
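
For an interesting / not interesting split like this one, a count-based
hypergeometric test depends only on set membership, so the order of the
ids cannot matter. Below is a minimal sketch in Python, again not
topGO's code: the function name hypergeom_enrichment_p and the gene
names are assumptions for illustration. In topGO itself the analogous
choice would, as far as I know, be the Fisher statistic
(statistic = "fisher" in runTest).

```python
from math import comb


def hypergeom_enrichment_p(universe, interesting, term_genes):
    """One-sided hypergeometric p-value for over-representation.

    Only set sizes and overlaps are used, so input order is irrelevant.
    """
    universe = set(universe)
    interesting = set(interesting) & universe
    term_genes = set(term_genes) & universe
    N = len(universe)                   # genes in the universe
    K = len(term_genes)                 # genes annotated to the term
    n = len(interesting)                # interesting genes
    k = len(interesting & term_genes)   # interesting genes in the term
    # P(X >= k) when drawing n genes without replacement from N.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical data: 10 genes, 4 interesting, 3 annotated to the term.
universe = ["g%d" % i for i in range(1, 11)]
interesting = ["g1", "g2", "g3", "g4"]
term_genes = {"g1", "g2", "g5"}

p_forward = hypergeom_enrichment_p(universe, interesting, term_genes)
p_reversed = hypergeom_enrichment_p(universe, list(reversed(interesting)),
                                    term_genes)
print(p_forward, p_reversed)  # identical regardless of id order
```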

I hope things are a bit clearer now.

Best regards,
Adrian





On Wed, Jan 19, 2011 at 6:27 PM, Paul Rigor <pryce at ucla.edu> wrote:
> Hi all,
>
> I wasn't sure whether I should have posted this on the list, but I think
> we've discovered some odd behavior with topGO.
>
> Given the same (but differently ordered) list of UniProt IDs, we
> are getting different enrichment results. I wasn't sure whether the ordering
> mattered. Or does the ordering hinge upon the ranking of the p-values? We
> are just looking for GO enrichment in non-microarray studies, btw, so we've
> faked the p-values (e.g., 0.001) for the set of interesting genes.
>
> Thanks,
> Paul


