[BioC] MBNI Custom CDF Announcement

Manhong Dai daimh at umich.edu
Thu Feb 19 19:08:25 CET 2009


Dear Bioconductor users,


As of version 11, the MBNI (Molecular & Behavioral Neuroscience
Institute, University of Michigan) Custom CDF repository has been added
to Bioconductor. Bioconductor users can therefore install these R
packages with biocLite(); the traditional route of downloading a package
and running R CMD INSTALL on it still works.
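
As a rough illustration of the biocLite() route, here is a minimal
sketch. It assumes the custom CDF packages are named
<chip><species><database>cdf in lower case; the helper
custom_cdf_package() is hypothetical, and the chip, species and
database codes below are example values only. The install and
ReadAffy() calls are left commented out because they need network
access and CEL files in the working directory.

```r
# Hypothetical helper: build a custom CDF package name, assuming the
# pattern <chip><species><database>cdf, all lower case.
custom_cdf_package <- function(chip, species, database) {
  tolower(paste(chip, species, database, "cdf", sep = ""))
}

# Example values (assumptions): HG-U133 Plus 2 chip, human, Entrez Gene.
pkg <- custom_cdf_package("hgu133plus2", "hs", "entrezg")
print(pkg)

# Not run here (needs network access and CEL files):
# source("http://bioconductor.org/biocLite.R")
# biocLite(pkg)
# library(affy)
# raw <- ReadAffy(cdfname = "HGU133Plus2_Hs_ENTREZG")
```

Check the exact package name for your chip against the repository
listing before installing, since the pattern above is an assumption.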

The custom CDF is designed to make the GeneChip probe set definitions
consistent with the latest version of the genome and transcriptome
databases. Several systematic analyses show that the updated probe set
definition can lead to 30%-50% differences in the list of differentially
expressed genes and more significant changes in gene set-based analysis
approaches. We currently support most Affymetrix GeneChips and generate
probe set definitions based on several major gene and transcript
definitions for each species.

The following is a list of commonly used custom CDFs and their pros and
cons:

1. Entrez Gene-based CDF: most widely used, excellent for gene-based
target definitions. One probe set per unique gene in the
corresponding database.

2. RefSeq-based CDF: most stable; good for transcript-based analysis.
The shortcoming is that probe sets representing different transcripts
from the same gene may be identical or highly similar, because GeneChips
lack transcript-specific probes.

3. UniGene-based CDF: used to be the preferred choice if the goal is to
represent as many genes as possible. We now recommend Entrez Gene for
species where it gives a similar number of gene-based probe sets, which
currently include human and mouse.

4. ENSEMBL/VEGA Gene/Transcript/Exon: for researchers who prefer the
ENSEMBL/VEGA system. VEGA is intended to be expert-curated, so its
gene/transcript/exon definitions are conceivably more accurate. The
exon-based probe sets can be used to detect some alternative splicing
events even on GeneChips not designed for splicing analysis.

We want to thank James MacDonald, Marc Carlson and Patrick Aboyoun for
helping us set up the custom CDFs on the Bioconductor system. We also
want to thank the many users whose suggestions are essential to the
continuous improvement of the custom CDFs. Our work is supported by the
Pritzker Neuropsychiatric Disorders Research Consortium and the National
Center for Integrated Biomedical Informatics.


Best,
Manhong Dai
Molecular & Behavioral Neuroscience Institute
University of Michigan


