[BioC] GOstats and GenePix arrays

Thu May 11 20:17:22 CEST 2006


Jake wrote:
> I hadn't thought of going through the trouble of making a custom
> annotation package.  Last time I tried making one was quite a while back
> and it was quite a pain.  I'm sure things work more smoothly now, but by
> looking at GOHyperG I realized all I really need is phyper and the
> appropriate GO mappings, which I've gotten through TAIR and the use of
> GOANCESTOR.
> 

  Well, if you don't go to the trouble, then you will almost surely be 
getting the wrong answer, and to paraphrase one of the really clever 
folks, there are easier and faster ways to do that :-)

> I guess in the light of making a custom annotation package, GOHyperG
> isn't *technically* Affy-only, though with components like "go2Affy",
> it's obvious what type of data was in mind.
> 
> Thanks for the comments and insight.
> 
> --Jake
> 
> 
> On Thu, 2006-05-11 at 10:57 -0700, rgentlem wrote:
>> Hi,
>>
>>   I am not sure why you think that you should do anything different for 
>> GenePix? The array used is completely irrelevant to this sort of 
>> hypergeometric testing and there should be no need to modify GOstats in 
>> any way.
>>   You simply make an annotation package for your array (using AnnBuilder 
>> or any other tool of your choice) and then use it.
>>
>>   best wishes
>>    Robert
>>
>> Jake wrote:
>>> Hi all,
>>>
>>> I'm trying to use the "guts" of the GOHyperG function in GOstats as a
>>> basis for a similar function for GenePix data.  I've found a basic
>>> description of the phyper function in the context of GO:
>>>
>>> # How to implement phyper function for GO analysis
>>> #       phyper(x-1, m, n-m, k, lower.tail = FALSE)
>>> #       x: number of sample genes at GO node (can be vector with many entries)
>>> #       m: number of genes at GO node (works with vector of same length as x)
>>> #       n: number of unique genes at all GO nodes
>>> #       k: number of unique genes in test sample that have GO mappings
>>>
>>> Values for x and k seem straightforward, but I'm wondering about m and
>>> n.  The arrays we're working with seem to have fewer genes on them than
>>> the total number cataloged in the organism's online databases.  So
>>> should m and n be based on the absolute total number of genes annotated,
>>> or the number of genes annotated *on the chip*?
>>>
>>> Thanks in advance,
>>>
>>> Jake
>>>
> 
> 
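For reference, the phyper call quoted above can be sketched in a few lines of
R. This is only a toy illustration, not the GOHyperG implementation: the gene
and GO identifiers are made up, and the names go2genes, selected, universe and
sample_genes are hypothetical. It assumes the universe is restricted to genes
on the array that have GO mappings (which is what an array-specific annotation
package would give you), and that each GO node already includes the genes
inherited from its child terms.

```r
# Toy data: every identifier here is invented for illustration.
# go2genes maps each GO node to the genes annotated there *on the array*.
go2genes <- list(
  "GO:A" = c("g1", "g2", "g3", "g4"),
  "GO:B" = c("g3", "g4", "g5", "g6", "g7", "g8"),
  "GO:C" = c("g9", "g10")
)
selected <- c("g1", "g2", "g3", "g11")   # g11 has no GO mapping

# Universe: unique genes on the array that appear at any GO node.
universe <- unique(unlist(go2genes))
# Keep only the selected genes that fall inside the universe.
sample_genes <- intersect(selected, universe)

n <- length(universe)                    # unique genes at all GO nodes
k <- length(sample_genes)                # sample genes with GO mappings
m <- sapply(go2genes, length)            # genes at each GO node
x <- sapply(go2genes, function(g) sum(sample_genes %in% g))  # sample genes at each node

# P(X >= x) under the hypergeometric null. With lower.tail = FALSE,
# phyper returns P(X > q), hence q = x - 1.
pvals <- phyper(x - 1, m, n - m, k, lower.tail = FALSE)
print(pvals)
```

Because m and x are vectors of the same length, a single phyper call tests
every GO node at once. If m and n were instead counted over all annotated
genes in the organism, the test would be sampling from a population that the
array could never have reported, so the p-values would not mean what they
appear to.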

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org