[BioC] GOstats and GenePix arrays

Thu May 11 19:31:24 CEST 2006

On 5/11/06 1:21 PM, "Jake" <jjmichael at comcast.net> wrote:

> Hi all,
> 
> I'm trying to use the "guts" of the GOHyperG function in GOstats as a
> basis for a similar function for GenePix data.  I've found a basic
> description of the phyper function in the context of GO:
> 
> # How to implement phyper function for GO analysis
> #       phyper(x-1, m, n-m , k, lower.tail = FALSE)
> #       x: number of sample genes at GO node (can be vector with many entries)
> #       m: number of genes at GO node (works with vector of same length as x)
> #       n: number of unique genes at all GO nodes
> #       k: number of unique genes in test sample that have GO mappings
> 
> Values for x and k seem straightforward, but I'm wondering about m and
> n.  The arrays we're working with seem to have fewer genes on them than
> the total number cataloged in the organism's online databases.  So
> should m and n be based on the absolute total number of genes annotated,
> or the number of genes annotated *on the chip*?

Jake,

I think the usual convention is that m and n should both be counted
"on the chip".  Genes that are not on the array can never show up in your
test sample, so restricting the counts to the chip guards against biases
caused by array content.
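
As a rough sketch of what that looks like with phyper (the counts below
are made up purely for illustration, and the variable names are mine, not
anything taken from GOstats), every count is restricted to genes on the
chip that have GO mappings:

```r
# Hypothetical counts for illustration only; substitute the values
# derived from your own chip annotation.
n <- 5000  # unique genes on the chip that have any GO mapping
m <- 200   # genes on the chip annotated at the GO node of interest
k <- 300   # unique genes in the test sample that have GO mappings
x <- 25    # test-sample genes annotated at the GO node

# P(X >= x): probability of seeing at least x node genes in a sample of
# size k drawn from the n annotated genes on the chip.
p_value <- phyper(x - 1, m, n - m, k, lower.tail = FALSE)

# Cross-check: the same one-sided test written as a 2x2 Fisher exact test
# (rows: in sample / not in sample; columns: at node / not at node).
tab <- matrix(c(x, m - x, k - x, n - m - k + x), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(p_value)
print(p_fisher)
```

The two p-values should agree up to numerical tolerance.  If you swapped
in genome-wide totals for m and n, the result would change only because of
genes that could never have been selected in the first place.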

Sean