[BioC] what's really in hgu133plus2.db?

Carlson, Marc R mcarlson at fhcrc.org
Fri Feb 18 18:45:45 CET 2011


Hi David,

All annotation data changes continuously as we learn more. This is why the entire annotation repository gets rebuilt twice a year (for each new release of Bioconductor). To answer your question more directly, the majority of the data in these packages is supplied by NCBI, although other data comes from UCSC and other sources as appropriate. You can see where each individual mapping gets its data from by looking at the help page associated with it.

But the thing that you are probably most worried about is the mapping that connects the individual probesets with the genes that are annotated in these packages. Those mappings too are updated twice a year to the latest version available from Affymetrix at that time. And yes, Affymetrix does change these mappings from time to time. However, if you don't trust them, then you might be interested to know that the MBNI also has a series of annotation packages that are based on remapped gene-to-probeset associations. You can learn about those here:

http://brainarray.mbni.med.umich.edu/Brainarray/Service/Service.asp
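
As a side note on the question of data sources: each .db annotation package ships as an SQLite file, and my understanding is that it carries a small metadata table of name/value pairs recording source names, URLs and dates. The Python sketch below reads that table. The table and column names (metadata, name, value), the helper names read_metadata and source_entries, and the SOURCE/URL/DATE keyword filter are all assumptions for illustration, not something stated in this thread, so please check them against your installed package.

```python
import sqlite3


def read_metadata(db_path):
    """Return the name/value pairs of the (assumed) `metadata` table as a dict."""
    conn = sqlite3.connect(db_path)
    try:
        # Assumption: the table is called `metadata` and has columns `name`, `value`.
        rows = conn.execute("SELECT name, value FROM metadata").fetchall()
    finally:
        conn.close()
    return dict(rows)


def source_entries(metadata):
    """Keep only entries whose name mentions a source, a URL or a date.

    The keyword list is a guess at how provenance fields are named.
    """
    keywords = ("SOURCE", "URL", "DATE")
    return {
        name: value
        for name, value in metadata.items()
        if any(keyword in name.upper() for keyword in keywords)
    }
```

In R, hgu133plus2_dbfile() should print the path to the package's SQLite file. Passing that path to read_metadata() and the result to source_entries() would then list whatever provenance the package records about itself.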


Finally, if you feel really enterprising, you can remake these mappings yourself and then use the SQLForge code in AnnotationDbi to generate a new package based on them. You can find instructions for that here:

http://www.bioconductor.org/help/bioc-views/release/bioc/html/AnnotationDbi.html

Hope this helps,


  Marc



----- Original Message -----
From: "David Iles" <D.E.Iles at leeds.ac.uk>
To: bioconductor at r-project.org
Sent: Friday, February 18, 2011 8:41:04 AM
Subject: [BioC] what's really in hgu133plus2.db?

Dear All,

Can anyone point me to a URL where I can obtain an overview of the sources of the data incorporated in the current version of hgu133plus2.db? I saw to my horror that the actual probesets are based on a really obsolete human genome assembly (2003), which has changed significantly over the years. So have genes, gene locations, genomic intervals, RefSeq/UniGene entries, etc.

Thanks

Dave
Dr David Iles
Institute for Integrative and Comparative Biology
University of Leeds
Leeds LS2 9JT

d.e.iles at leeds.ac.uk

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor