[BioC] GOstats gene set size selection

Thu May 1 15:59:24 CEST 2008

Hi Sean and other BioC users,

Thanks for the replies a couple of weeks ago. I am now trying to use
Category as suggested, and I think its underlying principles suit what I
want to do better than GOstats, especially because I don't have to apply
an arbitrary threshold to my test statistics to select a subset of genes.

I followed the code in the vignette of Category until the matrix Z gets
divided by sqrt(rowSums).

Because I am doing an eQTL genome scan, at any one position I have
likelihood ratio test statistics for all probesets rather than
two-sample t-statistics. I read in the vignette that X should be
approximately normal, so I thought I should standardize the likelihood
ratio statistics to z-scores before multiplying by the adjacency
matrix. Is this the correct thing to do?

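for (cM in seq_len(lengthOfGenome)) {
  lrt <- LRT[expressedAffyIds, cM]
  # ... filter out duplicate Entrez genes and create the adjacency matrix ...

  # standardize the LRT statistics at this position
  z.score <- (lrt - mean(lrt)) / sd(lrt)
  # per-category sum of z-scores, as a plain vector rather than a 1-column matrix
  tA <- drop(AmER2 %*% z.score)
  # scale by the square root of the category size
  tA <- tA / sqrt(rs2)
  names(tA) <- row.names(AmER2)
  qqnorm(tA)
}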
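
In case it helps to reproduce the calculation, here is a self-contained
toy version of the loop body for a single position. The simulated
statistics (chi-square with 1 df is only an assumption so that the
example runs), the random adjacency matrix, the size cut-off of 10 and
the object names nGenes, nCats, Am, Am2 and rs are all made up for
illustration; it is a sketch of what I am doing, not the vignette code.

```r
set.seed(1)

nGenes <- 500   # hypothetical number of unique Entrez genes
nCats  <- 20    # hypothetical number of categories (e.g. KEGG pathways)

# Simulated LRT statistics for one position (assumed chi-square, 1 df).
lrt <- rchisq(nGenes, df = 1)
names(lrt) <- paste("gene", seq_len(nGenes), sep = "")

# Toy adjacency matrix: rows are categories, columns are genes,
# entry 1 if the gene belongs to the category.
Am <- matrix(rbinom(nCats * nGenes, 1, 0.05), nrow = nCats,
             dimnames = list(paste("cat", seq_len(nCats), sep = ""),
                             names(lrt)))

# Keep only categories with at least 10 genes.
rs  <- rowSums(Am)
Am2 <- Am[rs >= 10, , drop = FALSE]
rs2 <- rs[rs >= 10]

# Standardize the statistics, then form the per-category statistic.
z.score <- (lrt - mean(lrt)) / sd(lrt)
tA <- drop(Am2 %*% z.score) / sqrt(rs2)

# z.score has mean 0 and sd 1 but keeps the right skew of lrt,
# which is what prompts my question.
qqnorm(tA)
qqline(tA)
```

The centering and scaling is a linear transformation, so it changes the
location and spread of the statistics but not the shape of their
distribution.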

Cheers,
Alex

-----Original Message-----
From: Sean MacEachern [mailto:sean.maceach at gmail.com] 
Sent: 17 April 2008 17:07
To: alex lam (RI); bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] GOstats gene set size selection

Hi Alex,
I'm not sure this helps with your question, but I'll put my two cents
in... I am working with chickens and trying to create a large list of
genes for an eQTL study from an initial simple microarray design that
compares resistant vs susceptible birds. Because I found only a small
number of differentially expressed genes, I have tried to increase the
size of my list by examining significant GO terms. Most of the GO terms
I have pulled out using hyperGTest are not very helpful because they are
too broad.
I have found the Category package a little more helpful. KEGG pathways
are a little more specific, and you can create an adjacency matrix and
use the rowSums() command to filter your dataset. I think you can also
treat GO terms as categories if you need to. It might be a little off
topic, but it could be worth looking at.
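
Roughly what I mean, as a sketch in base R with a made-up
gene-to-pathway list rather than real KEGG annotation (the object names
pathways, genes, Am, rs, keep and AmFiltered, and the size limits of 3
and 5, are just for illustration):

```r
# Hypothetical mapping from pathways to the genes on the chip.
pathways <- list(
  path1 = c("g1", "g2", "g3", "g4"),
  path2 = c("g2", "g5"),
  path3 = c("g1", "g3", "g4", "g5", "g6", "g7")
)
genes <- sort(unique(unlist(pathways)))

# Adjacency matrix: one row per pathway, one column per gene,
# entry 1 if the gene is annotated to the pathway.
Am <- t(sapply(pathways, function(p) as.integer(genes %in% p)))
colnames(Am) <- genes

# Number of genes per pathway: path1 has 4, path2 has 2, path3 has 6.
rs <- rowSums(Am)

# Keep only pathways whose size falls within the chosen limits.
keep <- rs >= 3 & rs <= 5
AmFiltered <- Am[keep, , drop = FALSE]
rownames(AmFiltered)
```

With these toy limits only path1 survives the filter; the same idea
should carry over to GO terms treated as categories.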

Cheers,

Sean 

On 4/17/08 7:28 AM, "alex lam (RI)" <alex.lam at roslin.ed.ac.uk> wrote:

> Dear colleagues,
> 
> I have been following the GOstats vignette to test GO term association.
> I would like to know whether it is possible to set limits on the 
> number of selected genes in a GO term and the size of that term on my
> affy chip?
> 
> For example, can I tell hyperGTest to skip testing a GO term if the 
> number of significant genes in that term is under, say, 3, or if there
> are more than 400 genes of that GO term on the chip?
> 
> Currently I find many of my significant GO terms not very specific. 
> As I am trying to incorporate GOstats into an expression QTL (eQTL) 
> genome scan, I get a lot of output. Therefore, ideally I would like to
> filter out these terms before testing rather than screening the results 
> after testing. Is there such an option with hyperGTest?
> 
> Many thanks for your advice,
> Alex
> 
>> sessionInfo()
> R version 2.6.2 Patched (2008-03-24 r44882) x86_64-unknown-linux-gnu
> 
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils     datasets
> [8] methods   base
> 
> other attached packages:
>  [1] GOstats_2.4.0       Category_2.4.0      genefilter_1.16.0
>  [4] survival_2.34       RBGL_1.14.0         annotate_1.16.1
>  [7] xtable_1.5-2        GO.db_2.0.2         AnnotationDbi_1.0.6
> [10] RSQLite_0.6-8       DBI_0.2-4           Biobase_1.16.3
> [13] graph_1.16.1
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.10
>> 
> 
> --------------------------------------------
> Alex C. Lam
> Roslin Institute (Edinburgh)
> Midlothian
> EH25 9PS
> United Kingdom
> Tel: +44 131 5274471
> 
> Former email address: alex.lam at bbsrc.ac.uk
> New email address: alex.lam at roslin.ed.ac.uk
> Both addresses are functional