[BioC] GOstats and GenePix arrays

Sean Davis sdavis2 at mail.nih.gov
Thu May 11 19:31:24 CEST 2006




On 5/11/06 1:21 PM, "Jake" <jjmichael at comcast.net> wrote:

> Hi all,
> 
> I'm trying to use the "guts" of the GOHyperG function in GOstats as a
> basis for a similar function for GenePix data.  I've found a basic
> description of the phyper function in the context of GO:
> 
> # How to implement phyper function for GO analysis
> #       phyper(x-1, m, n-m , k, lower.tail = FALSE)
> #       x: number of sample genes at GO node (can be vector with many entries)
> #       m: number of genes at GO node (works with vector of same length as x)
> #       n: number of unique genes at all GO nodes
> #       k: number of unique genes in test sample that have GO mappings
> 
> Values for x and k seem straightforward, but I'm wondering about m and
> n.  The arrays we're working with seem to have fewer genes on them than
> the total number cataloged in the organism's online databases.  So
> should m and n be based on the absolute total number of genes annotated,
> or the number of genes annotated *on the chip*?

Jake,

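I think the typical definition is that both m and n should be the respective
numbers "on the chip": the gene universe is the set of annotated genes
represented on the array, not every annotated gene in the organism.  That
guards against biases caused by array content, since a gene that is not on
the array could never have ended up in your test sample.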

Sean


