[BioC] GOstats question

John Zhang jzhang at jimmy.harvard.edu
Wed Mar 30 17:31:14 CEST 2005

>Even if the design (or the aim of the Bioconductor team) is limited to a
>"general approach" which precludes working at the level of protein
>product (or transcript) -- which is the basis of the GO annotation and
>usually the goal of any test of GO category enrichment for a microarray
>result -- then for a given LL # we should have all available GO terms
>attributed, right? The example I gave showed that for at least two probe
>sets (sharing the same LL #) this is not the case -- we have only 2 GO
>terms to work with versus 12 (again using the same GOA as the
>reference) for a well-characterized gene.

The data packages were built a few months ago and will certainly not have 100%
coverage now. You can always build your own data packages if you want to have
updated annotation.
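
To see how far a data package has drifted from a newer reference, the GO terms
per LocusLink ID can be compared directly. The following is only a minimal
sketch in plain Python, not Bioconductor code: the function name
missing_terms, the IDs LL_A/LL_B/LL_C, and the GO identifiers are all
hypothetical placeholders, and both mappings are assumed to have been exported
already as ID-to-term-set dictionaries.

```python
# Hypothetical placeholder data: GO terms per LocusLink ID from two sources.
# None of these IDs or terms describe a real gene's annotation.
package_go = {
    "LL_A": {"GO:0000001", "GO:0000002"},
    "LL_B": {"GO:0000003"},
}
reference_go = {
    "LL_A": {"GO:0000001", "GO:0000002", "GO:0000004", "GO:0000005"},
    "LL_B": {"GO:0000003"},
    "LL_C": {"GO:0000006"},
}


def missing_terms(package, reference):
    """Map each ID to the sorted reference terms that the package lacks."""
    gaps = {}
    for ll_id, ref_terms in reference.items():
        # An ID absent from the package is treated as having no terms at all.
        missing = ref_terms - package.get(ll_id, set())
        if missing:
            gaps[ll_id] = sorted(missing)
    return gaps


gaps = missing_terms(package_go, reference_go)
for ll_id in sorted(gaps):
    print(ll_id, "lacks", len(gaps[ll_id]), "term(s):", ", ".join(gaps[ll_id]))
```

IDs with a long list of missing terms are the ones where rebuilding the
package is most likely to change a GO-level analysis.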

>"While there are other methods for annotating probesets (see the 
>articles you cite above), they all require aligning target or probe 
>sequences (also available from Affy) to known entities (like refseq, 
>etc.) and is NOT what the BioConductor team attempts to do (and is a 
>HUGE task to do well, having done this process for some long oligo 
>arrays).  You could do this yourself, if necessary.  
>Also, you could 
>look at Ensembl which does their own annotation of Affymetrix arrays.  
>The downside of doing these things yourself (or not using the 
>annotation packages provided by bioconductor) is that you then need to 
>either modify the nice functions from the bioconductor project to use 
>your own data or you need to make your data conform to the structures 
>needed for the functions to work (which as you point out, in this case, 
>will not suffice)."
>It looks like that is what it takes to get to the core of the problem -- One
>of my aims (I am sure like many using Affy data) is to summarize/study
>lists of probe sets derived from some test at the level of GO terms.
>Therefore it is almost intuitive that the key to that aim is to resolve both
>the multiplicity issues (many probe sets to one protein product,
>somewhat addressed in the GOstats package -- at the level of LocusLink)
>as well as the splice variant issues -- otherwise, it seems that
>analyses will always stay at a "general" level. 
>Thanks for the suggestions and the comments 
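
The multiplicity issue raised in the quoted message (many probe sets mapping to
one LocusLink ID) can be illustrated by collapsing probe sets to unique gene
IDs before counting GO terms. This is a rough sketch in plain Python under
stated assumptions, not how GOstats itself is implemented: the helper names,
the probe set names, and the GO identifiers are hypothetical placeholders. It
does nothing about splice variants, which are invisible at the LocusLink level.

```python
from collections import Counter

# Hypothetical placeholder mappings; not real annotation data.
probe_to_ll = {"ps_1": "LL_A", "ps_2": "LL_A", "ps_3": "LL_B"}
ll_to_go = {
    "LL_A": {"GO:0000001"},
    "LL_B": {"GO:0000001", "GO:0000002"},
}


def collapse_to_genes(probe_sets, mapping):
    """Return the unique gene IDs behind a list of probe sets."""
    # Probe sets without a mapping are dropped rather than guessed at.
    return {mapping[p] for p in probe_sets if p in mapping}


def count_genes_per_term(gene_ids, go_map):
    """Count how many distinct genes carry each GO term."""
    counts = Counter()
    for gene in gene_ids:
        for term in go_map.get(gene, ()):
            counts[term] += 1
    return counts


selected = ["ps_1", "ps_2", "ps_3"]
genes = collapse_to_genes(selected, probe_to_ll)
term_counts = count_genes_per_term(genes, ll_to_go)
for term in sorted(term_counts):
    print(term, term_counts[term])
```

Counting per probe set instead would credit the first term three times rather
than twice, because ps_1 and ps_2 point at the same gene.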

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084
