[BioC] GOstat with replicates

Robert Gentleman rgentlem at fhcrc.org
Thu Sep 13 01:25:20 CEST 2007


Hi,
   Amplifying a bit on this (and I am not sure I fully understand 
Naomi's use case yet), it seems likely that the issue here is not that one 
needs duplicates in either the universe or the gene set, but rather 
that the naming scheme is not sufficient in this case and one would like 
to change it (so that different transcripts have some opportunity to be 
identified).
   This is possible, but it does reveal one of the weaknesses of our 
current approach.  We will need to move our GO annotation to a more 
general mapping scheme (one based on the protein, not the gene), as it 
is likely that different splice variants have different functions (and 
hence different GO categorizations).  It is still important to consider 
whether those different splice variants (or other differences) can be 
detected by the array (in the case of microarray analysis), and if not 
then it will be important to map to the right level of resolution.
   My guess is that we will be moving slowly in that direction over the 
next year or so, and folks who have specific needs should let us know 
what their use cases are.
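   As a rough illustration of why the identifier level matters for the 
test, here is a minimal sketch of a one-sided hypergeometric 
over-representation test computed with and without collapsing duplicated 
IDs.  This is not code from GOstats or Category; the function name 
hyperg_upper_tail, the dedupe flag and the toy gene IDs are all 
assumptions made for illustration, and Python is used only to keep the 
example self-contained.

```python
from math import comb

def hyperg_upper_tail(universe, selected, category, dedupe=True):
    """P(X >= observed overlap) under a hypergeometric null.

    Hypothetical helper for illustration only. `universe` and `selected`
    are lists of IDs (selected must be drawn from universe); `category`
    is the collection of IDs annotated to one GO term.
    """
    if dedupe:
        # Collapse repeated IDs so each gene is counted once.
        universe = list(set(universe))
        selected = list(set(selected))
    cat = set(category)
    N = len(universe)                      # size of the universe
    K = sum(g in cat for g in universe)    # universe entries in the category
    n = len(selected)                      # size of the selected list
    k = sum(g in cat for g in selected)    # selected entries in the category
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / comb(N, n)

# Toy data (assumed): three unigenes all BLAST to g1, so g1 appears
# three times in the universe and twice in the selected list.
universe = ["g1", "g1", "g1"] + [f"g{i}" for i in range(2, 11)]
selected = ["g1", "g1", "g2"]
category = ["g1", "g3"]

p_unique = hyperg_upper_tail(universe, selected, category, dedupe=True)
p_dup = hyperg_upper_tail(universe, selected, category, dedupe=False)

print(round(p_unique, 4))  # 0.3778
print(round(p_dup, 4))     # 0.2364
```

   In this toy case keeping the duplicates makes the category look more 
enriched, simply because one gene is counted several times.  Whether that 
is desirable depends on whether the repeated entries really are distinct 
entities (e.g. distinguishable transcripts), which is the naming question 
raised above.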

   best wishes
     Robert


Seth Falcon wrote:
> Hi Naomi,
> 
> Naomi Altman <naomi at stat.psu.edu> writes:
> 
>> There are times when it makes sense to have genes duplicated in both 
>> the universe and the set of interest - e.g. if the geneIds come from 
>> BLAST hits of unigenes of an unsequenced species against the genes of 
>> a sequenced species.
>>
>> I fiddled a bit with GOstats, but was not able to see how to change 
>> the code to allow this.  (I can see where duplication was removed in 
>> the gene set but not in the universe.)
>> If someone could tell me where to look in the code, I would be happy 
>> to contribute back the modified code allowing duplication.
> 
> I think you will want to look in the Category package where a fair
> amount of the infrastructure is located for the GO-based hyperGTest.
> 
> In particular, you may want to look at .makeValidParams in
> HyperGParams-accessors.R
> 
> That said, I find the duplicated gene scenario hard to understand and
> would worry that the method as implemented won't give useful results.
> 
> 
> + seth
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


