[BioC] GOstats gene set size selection

Sean MacEachern sean.maceach at gmail.com
Thu Apr 17 18:06:49 CEST 2008


Hi Alex,
I'm not sure whether this answers your question, but I'll put in my two
cents. I am working with chickens and trying to build a large list of genes
for an eQTL study, starting from a simple microarray design that compares
resistant vs. susceptible birds. Because I found only a small number of
differentially expressed genes, I have tried to enlarge my list by
examining significant GO terms. Most of the GO terms I have pulled out
using hyperGTest are not very helpful because they are so broad.
I have found the Category package a little more helpful. KEGG pathways are
somewhat more specific, and you can create an adjacency matrix and use the
rowSums() function to filter your dataset. I think you can also treat GO
terms as categories if you need to. It might be a little off topic, but it
could be worth a look.

Cheers,

Sean 


On 4/17/08 7:28 AM, "alex lam (RI)" <alex.lam at roslin.ed.ac.uk> wrote:

> Dear colleagues,
> 
> I have been following the GOstats vignette to test GO terms association.
> I would like to know whether it is possible to set limits on the number
> of selected genes in a GO term and on the size of that term on my Affy chip.
> 
> For example, can I tell hyperGTest to skip testing a GO term if the
> number of significant genes in that term is under, say, 3, or if there
> are more than 400 genes of that GO term on the chip?
> 
> Currently, many of my significant GO terms are not very specific. As
> I am trying to incorporate GOstats into an expression QTL (eQTL) genome
> scan, I get a lot of output. Ideally, therefore, I would like to filter
> out these terms before testing rather than screening the results after
> testing. Is there such an option in hyperGTest?
> 
> Many thanks for your advice,
> Alex
> 
>> sessionInfo()
> R version 2.6.2 Patched (2008-03-24 r44882)
> x86_64-unknown-linux-gnu
> 
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
> LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;
> LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
> LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils     datasets
> [8] methods   base
> 
> other attached packages:
>  [1] GOstats_2.4.0       Category_2.4.0      genefilter_1.16.0
>  [4] survival_2.34       RBGL_1.14.0         annotate_1.16.1
>  [7] xtable_1.5-2        GO.db_2.0.2         AnnotationDbi_1.0.6
> [10] RSQLite_0.6-8       DBI_0.2-4           Biobase_1.16.3
> [13] graph_1.16.1
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.10
>> 
> 
> --------------------------------------------
> Alex C. Lam
> Roslin Institute (Edinburgh)
> Midlothian
> EH25 9PS
> United Kingdom
> Tel: +44 131 5274471
> 
> Former email address: alex.lam at bbsrc.ac.uk
> New email address: alex.lam at roslin.ed.ac.uk
> Both addresses are functional
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


