[BioC] GOstats different goDag size just changing the

James W. MacDonald jmacdon at uw.edu
Wed Oct 17 16:30:41 CEST 2012


Hi Cristobal,

On 10/17/2012 9:27 AM, Cristobal Fresno Rodríguez wrote:
> Hi Jim,
>
> Thanks for your answer. Although I understand that the tests are
> mutually exclusive (a term cannot be over- and under-represented at the
> same time), my assumption was that the same terms ought to be
> tested under both hypothesis tests. Hence, the same goDag structure,
> with different p-values, should come out of the analysis, which is not
> the case. The over test only considers terms with at least one geneId,
> whereas the under test also considers terms with no geneId but with
> universeGeneIds.

Exactly. When you are testing for over-representation, you are looking
at all the terms that have been chosen (i.e., those with at least one
geneId), and seeing if there are more genes for that term than would be
expected by chance. By definition you cannot have over-representation
for a term to which none of your significant genes are annotated.
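Here is a minimal base-R sketch of why such terms drop out. It assumes a one-sided hypergeometric test of the kind hyperGTest performs and is an illustration, not GOstats' internal code. The universe size (156) and selected-set size (80) come from the output in the original post. The term size of 20 genes is hypothetical. When no selected genes are annotated to the term, the upper-tail p-value P(X >= 0) is exactly 1.

```r
# Sizes from the original post; term_size is a hypothetical value.
universe_size <- 156  # genes in the universe (N)
selected_size <- 80   # selected genes (k)
term_size     <- 20   # universe genes annotated to one GO term (m)

# One-sided over-representation p-value: P(X >= x), where X is the number
# of selected genes annotated to the term under a hypergeometric null.
over_pvalue <- function(x, m, N, k) {
  phyper(x - 1, m, N - m, k, lower.tail = FALSE)
}

expected_count <- term_size * selected_size / universe_size  # about 10.3
p_zero <- over_pvalue(0, term_size, universe_size, selected_size)
p_zero
# [1] 1
```

A term with zero selected genes can never fall below any cutoff smaller than 1. Testing it for over-representation therefore gains nothing, which fits the smaller number of terms (1146) reported for testDirection="over".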

However, when testing for under-representation, you look at all terms
annotated to at least one universeGeneId (even those for which you have
no geneIds), and then see if there are fewer genes for that term than
would be expected by chance. In this situation you most certainly can
(and will) find under-representation for terms that aren't represented
in your set of geneIds.
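Below is a base-R sketch of the under-representation side. It uses hypothetical toy counts rather than real GO annotations, and it assumes the term-selection rule described in this thread: the over test keeps terms with at least one selected gene, while the under test keeps terms with at least one universe gene. The lower-tail p-value P(X <= x) for a 20-gene term with zero selected genes is very small, so such a term can be called under-represented. As a result, the under test covers more terms.

```r
universe_size <- 156  # N, from the original post
selected_size <- 80   # k, from the original post

# Hypothetical per-term counts: universe genes (m) and selected genes (x).
universe_counts <- c(GO_A = 20, GO_B = 12, GO_C = 5, GO_D = 30)
selected_counts <- c(GO_A = 0,  GO_B = 7,  GO_C = 0, GO_D = 15)

# Term-selection rule as described in the thread (assumption for this sketch).
tested_over  <- names(selected_counts)[selected_counts > 0]
tested_under <- names(universe_counts)[universe_counts > 0]

# One-sided under-representation p-value: P(X <= x).
under_pvalue <- function(x, m, N, k) {
  phyper(x, m, N - m, k, lower.tail = TRUE)
}

p_under <- under_pvalue(selected_counts[tested_under],
                        universe_counts[tested_under],
                        universe_size, selected_size)
names(p_under) <- tested_under

length(tested_over)   # 2 terms tested for over-representation
length(tested_under)  # 4 terms tested for under-representation
p_under
```

With real hyperGTest results, comparing `names(pvalues(over))` with `names(pvalues(under))` should show the same pattern: the over set is contained in the under set.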

Best,

Jim


>
> Thanks to all
>
> Cristobal
>
>
>
> 2012/10/16 James W. MacDonald <jmacdon at uw.edu>
>
>     Hi Cristobal,
>
>
>     On 10/16/2012 4:44 PM, Cristobal Fresno Rodríguez wrote:
>
>         Dear list,
>
>         I am trying to use GOstats with both the "under" and "over"
>         testDirection, but hyperGTest builds two different goDags.
>         Shouldn't they be the same size?
>
>
>     No. The testDirection refers to over-represented and
>     under-represented GO terms, so they should be mutually exclusive
>     given the same data set.
>
>     Best,
>
>     Jim
>
>
>
>         Thanks,
>
>         Cristobal
>
>         library(GOstats)
>         load(file="Genes.RData"); genesModel <- out; rm(out)
>         univer <- unique(as.character(genesModel$GeneID))
>         paramsOver <- new("GOHyperGParams",
>             geneIds=univer[1:100],
>             universeGeneIds=univer[1:200],
>             annotation="org.Mm.eg.db",
>             ontology="BP",
>             pvalueCutoff=0.01,
>             conditional=FALSE,
>             testDirection="over")
>         ## Loading required package: org.Mm.eg.db
>
>         paramsUnder <- new("GOHyperGParams",
>             geneIds=univer[1:100],
>             universeGeneIds=univer[1:200],
>             annotation="org.Mm.eg.db",
>             ontology="BP",
>             pvalueCutoff=0.01,
>             conditional=FALSE,
>             testDirection="under")
>
>         over <- hyperGTest(paramsOver)
>         under <- hyperGTest(paramsUnder)
>         over
>         ## Gene to GO BP  test for over-representation
>         ## 1146 GO BP ids tested (0 have p < 0.01)
>         ## Selected gene set size: 80
>         ##      Gene universe size: 156
>         ##      Annotation package: org.Mm.eg
>
>         under
>         ## Gene to GO BP  test for under-representation
>         ## 1776 GO BP ids tested (0 have p < 0.01)
>         ## Selected gene set size: 80
>         ##      Gene universe size: 156
>         ##      Annotation package: org.Mm.eg
>
>         length(pvalues(over))
>         ## [1] 1146
>         length(pvalues(under))
>         ## [1] 1776
>
>
>         _______________________________________________
>         Bioconductor mailing list
>         Bioconductor at r-project.org
>         https://stat.ethz.ch/mailman/listinfo/bioconductor
>         Search the archives:
>         http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>     -- 
>     James W. MacDonald, M.S.
>     Biostatistician
>     University of Washington
>     Environmental and Occupational Health Sciences
>     4225 Roosevelt Way NE, # 100
>     Seattle WA 98105-6099
>
>



