[BioC] incorrect gene symbols in annotate

Robert Gentleman rgentlem at jimmy.harvard.edu
Fri Nov 21 17:55:33 MET 2003


To amplify a bit on Jianhua's explanation:
 the mapping between any two sets of identifiers can be problematic
 when both sides are subject to the constant evolution and improvement
 that presently exists for genomic data.

 There are many different strategies and folks need to pick one that
 satisfies their particular needs. We have decided on a process that
 we believe satisfies certain basic requirements (reproducibility
 being of primary importance). All Bioconductor metadata packages are
 produced in a well documented manner. The data sources and their
 version numbers (or dates of acquisition if the data are not
 versioned) are provided in the documentation for the package.

 This allows users to verify our mappings (but that is with respect to
 the data we have selected and the manner in which we have chosen to
 resolve conflicts that arise). Differences between our mappings and
 those available from other sources are not necessarily errors. They
 may indicate changes in knowledge between when our mapping was done
 and the current state. They may in fact represent errors, and we take
 all reports such as this one seriously (but it would be helpful if
 reports indicated why a person thinks there is an error, what their
 data source is, etc.). We would especially welcome
 suggestions for reliable data sources and/or mappings that are needed
 that we do not presently supply.
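
 To illustrate the kind of detail that makes such a report easy to
 check, here is a minimal sketch in base R that lists where two mappings
 disagree. The symbols are copied from the report quoted below;
 compare_mappings() is a hypothetical helper written for illustration
 (it is not a function in annotate), and the reporter's data source is
 not known, so it is only labelled as "theirs".

```r
# Symbols returned by the hgu133a package, as quoted in the report below.
pkg <- c("203547_at" = "C3F", "216424_at" = NA, "204006_s_at" = "FCGR3B")

# Symbols the reporter expected; the data source was not stated.
expected <- c("203547_at" = "CD4", "216424_at" = "CD4",
              "204006_s_at" = "FCGR3A")

# Hypothetical helper: line up two named character vectors by probe ID
# and flag the entries that differ. Two entries agree when both are NA
# or both hold the same symbol.
compare_mappings <- function(ours, theirs) {
  ids <- union(names(ours), names(theirs))
  a <- unname(ours[ids])
  b <- unname(theirs[ids])
  same <- (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
  data.frame(probe = ids, ours = a, theirs = b, agree = same,
             stringsAsFactors = FALSE)
}

diffs <- compare_mappings(pkg, expected)
print(diffs[!diffs$agree, ])
```

 A table like this, together with the name and date of the external
 source, gives enough information to tell a genuine error from a
 change in knowledge since the package was built.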

 I doubt that it is possible to stay current with all data sources
 (and even if so, we certainly do not have those resources). I
 personally feel that not providing a well documented set of mappings
 and leaving researchers to search through the ever-changing
 labyrinth that is the reality of the web resources does them a great
 disservice. They can spend days trying to decide why the "same
 analysis" done at different times yielded different sets of genes
 only to find out that the web resource had changed between two
 successive queries. This lack of reproducibility seems very
 undesirable to me. We strive for reproducibility of the numerical
 results; we should do the same for the mappings.

 We build reasonably often (and can do so on demand), and
 provide documentation about how we built. We also archive all old
 versions so that users can assess how changes have impacted their
 previous mappings if desired.

 Robert

On Fri, Nov 21, 2003 at 11:37:51AM -0500, John Zhang wrote:
> You may get somewhat different results depending on the source you are comparing 
> the mappings to and even the time when the comparisons are made. We try to keep 
> the mappings updated as frequently as we can. 
> 
> The link "MetaData/Annotation Packages" on the Bioconductor web site contains a 
> brief description of the building process of the annotation data packages and 
> the vignettes "How to use AnnBuilder" and "Basic Functions of AnnBuilder" 
> contain instructions on how to build an annotation data package. You may try to 
> build your own annotation data package to make sure your annotations are 
> current.
> 
>  
> >
> >I have come across some errors when linking probe IDs with gene symbols.
> >
> >In most cases the probe ID retrieves the correct gene symbol, 
> >however the following probe IDs should correspond to CD4 antigen, CD4 
> >antigen, and FCGR3A respectively.
> >
> >genes<-c("203547_at","216424_at","204006_s_at")
> >symbol<-multiget(genes,env=hgu133aSYMBOL)
> >
> >symbol
> >$"203547_at"
> >[1] "C3F"
> >
> >$"216424_at"
> >[1] NA
> >
> >$"204006_s_at"
> >[1] "FCGR3B"
> >
> >
> >
> >Regards
> >
> >
> >Anthony
> >
> >
> >
> >R session codes
> >
> >
> >library(Biobase)
> >library(annotate)
> >library(hgu133a)
> >
> >genes<-c("203547_at","216424_at","204006_s_at")
> >symbol<-multiget(genes,env=hgu133aSYMBOL)
> >-- 
> >______________________________________________
> >
> >Anthony Bosco - Cell Biology Research Assistant
> >
> >Institute for Child Health Research
> >(Company Limited by Guarantee ACN 009 278 755)
> >Subiaco, Western Australia, 6008
> >
> >Ph 61 8 9489  , Fax 61 8 9489 7700
> >email anthonyb at ichr.uwa.edu.au
> >______________________________________________
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> 
> Jianhua Zhang
> Department of Biostatistics
> Dana-Farber Cancer Institute
> 44 Binney Street
> Boston, MA 02115-6084

-- 
+---------------------------------------------------------------------------+
| Robert Gentleman                 phone : (617) 632-5250                   |
| Associate Professor              fax:   (617)  632-2444                   |
| Department of Biostatistics      office: M1B20                            |
| Harvard School of Public Health  email: rgentlem at jimmy.harvard.edu        |
+---------------------------------------------------------------------------+


