[BioC] GOstats question

Robert Gentleman rgentlem at fhcrc.org
Wed Mar 30 18:33:53 CEST 2005


Hi,
   Finding fault with any widely available annotation is pretty
trivial, and I personally think that it is not a useful exercise. We
have chosen a particular method of building annotation that is well
documented, both in publications and, perhaps more importantly, in
published code that you may use as you see fit and that you may read
to understand the process that we have used.
   So the short answer to David's question is that that link provides
us with a mechanism to unambiguously link a variety of data sources (or
rather to make use of links that have been made by others). The other
choice is UniGene, and one could certainly build a UniGene-based
annotation system. Which is better depends on your perspective, and it
would not take much tweaking to get AnnBuilder to do that, if that is
what you want. Please note that our goal was not and is not to produce
some elaborate annotation system that satisfies all comers. Rather, it
is 1) to produce software from which you can build your own annotation
for your own purposes and have that work well with the Bioconductor
packages, and 2) to produce generic annotation that is broadly useful
to the whole community (note also that we already get many complaints
about how big and slow this is, and we have tried to remedy that).
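   As a rough illustration of why a shared gene-level link matters, here
is a toy sketch in base R. It is not the AnnBuilder code: the tables
probe2ll and ll2go, the placeholder ID "LL_EXAMPLE", and the helper
probe_go() are all made up for this example. Only the two probe set
names and the GO IDs are taken from this thread.

```r
# Toy data only: the probe IDs appear in the thread, but the gene-level ID
# and the two lookup tables are illustrative assumptions, not package contents.
probe2ll <- c("207039_at" = "LL_EXAMPLE", "211156_at" = "LL_EXAMPLE")
ll2go <- list(LL_EXAMPLE = c("GO:0007049", "GO:0007050", "GO:0005634"))

# Resolve a probe set to GO terms by going through the shared gene-level ID.
probe_go <- function(probe) {
  ll <- probe2ll[[probe]]
  unique(ll2go[[ll]])
}

go_a <- probe_go("207039_at")
go_b <- probe_go("211156_at")

# Probe sets that share a gene-level ID resolve to the same GO terms.
same_terms <- identical(go_a, go_b)
print(same_terms)
```

Because every probe set is resolved through the same gene-level key, two
probe sets for one gene cannot end up with different GO terms in this
scheme. Swapping the key for a different identifier would only mean
replacing the two lookup tables.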
    We are open to concrete suggestions for improvements from those who
are knowledgeable about particular data sources. We are more open to
patches and code contributions that are demonstrated to work widely and
to be of wide practical interest (not just for your favorite species or
annotation resource).
   If there is substantial interest in implementing some of the recent
suggestions, we are happy to help coordinate efforts to make
improvements that are of use to the entire community. We have always  
accepted patches and well thought-out contributions, and will continue  
to do so. We also continue to update our methodology and to make use of  
more accurate information as it becomes available.

   Best wishes,
     Robert

On Mar 30, 2005, at 7:37 AM, Sean Davis wrote:

>
> On Mar 30, 2005, at 10:19 AM, Rickman David wrote:
>>
>> I am new to the list and didn't see his posting --
>>
>
> I just meant that you could probably glean some detail from his note  
> that I may have left out.  I am always deleting stuff that doesn't  
> interest me at the moment, so I just meant to point out that the  
> subject has come up....
>
>
>> Even if the design (or the aim of the Bioconductor team) is limited to
>> a "general approach" which precludes working at the level of protein
>> product (or transcript) -- which is the basis of the GO annotation and
>> usually the goal of any test of GO category enrichment for a
>> microarray result -- then for a given LL # we should have all
>> available GO terms attributed, right? The example I gave showed that
>> for at least two probe sets (sharing the same LL #) this is not the
>> case -- we have only 2 GO terms to work with versus 12 (again using
>> the same reference GOA as a reference) for a well characterized gene.
>> It looks like that is what it takes to get to core of the problem --
>> One of my aims (I am sure like many using Affy data) is to
>> summarize/study lists of probe sets derived from some test at the
>> level of GO terms. Therefore it is almost intuitive that key to that
>> aim is to resolve both the multiplicity issues (many probe sets to one
>> protein product, somewhat addressed in the GOstats package -- at the
>> level of LocusLink) as well as the splice variant issues -- otherwise,
>> it seems that analyses will always stay at a "general" level.
>>
>
> Just out of curiosity, I pulled down the most recent hgu133a  
> annotation package.  I think your GO terms are there, so perhaps you  
> have an older hgu133a package?
>
> > library(reposTools)
> Loading required package: tools
> > install.packages2('hgu133a',lib='/Users/sdavis/Library/R/library')
> > library(annotate)
> > library(hgu133a)
> > names(get('207039_at',hgu133aGO))
>  [1] "GO:0007049" "GO:0007049" "GO:0007050" "GO:0000075" "GO:0004861"
>  [6] "GO:0016301" "GO:0045786" "GO:0008285" "GO:0005634" "GO:0000079"
> > names(get('211156_at',hgu133aGO))
>  [1] "GO:0007049" "GO:0007049" "GO:0007050" "GO:0000075" "GO:0004861"
>  [6] "GO:0016301" "GO:0045786" "GO:0008285" "GO:0005634" "GO:0000079"
> >
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>
+----------------------------------------------------------------+
| Robert Gentleman                         phone: (206) 667-7700 |
| Head, Program in Computational Biology   fax:  (206) 667-1319  |
| Division of Public Health Sciences       office: M2-B865       |
| Fred Hutchinson Cancer Research Center                         |
| email: rgentlem at fhcrc.org                                   |
+----------------------------------------------------------------+


