[BioC] Duplicated probesets for the same gene

James W. MacDonald jmacdon at med.umich.edu
Mon Apr 24 15:24:58 CEST 2006

As Sean mentioned, there are several possible reasons for multiple 
probesets. First, they may be intended to interrogate splice variants. 
Second, these probesets are based on UniGene build 95, which is very old 
(the current build is #190), and many ESTs or RIKEN genes may have been 
mapped in the intervening period to genes that already existed on the chip.

In addition, many of the probesets contain probes that are now known to 
either interrogate unrelated sequences or not map to any known sequence.

You can now download the re-mapped cdfs provided by the 
Molecular and Behavioral Neuroscience Institute (MBNI) at the University 
of Michigan directly from BioC. These cdfs contain probesets that have 
been re-mapped based on the current UniGene, Ensembl, Entrez Gene, 
RefSeq, or TIGR annotations. The benefits of using these cdfs are 
twofold. First, there is only one probeset per gene (this may not be true 
of RefSeq - I think there may be some redundancy there, but am not sure). 
Second, any probe that interrogates multiple transcripts or no longer 
maps to the genome has been removed, so theoretically you should get 
better data.
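
To make the idea concrete, here is a minimal Python sketch of the kind of 
filtering described above. It is only an illustration, not the actual MBNI 
procedure: the function name remap_probes, the probe IDs, and GENE_X are all 
made up, and I am assuming you already have a table of which genes each 
probe sequence aligns to.

```python
from collections import defaultdict


def remap_probes(probe_hits):
    """Build one probeset per gene, dropping ambiguous or unmapped probes.

    probe_hits: dict mapping a probe ID to the set of gene IDs its sequence
    aligns to (hypothetical input, assumed to come from re-aligning probe
    sequences against a current annotation).
    """
    probesets = defaultdict(list)
    for probe, genes in probe_hits.items():
        # Skip probes that hit several genes or no longer map anywhere.
        if len(genes) != 1:
            continue
        (gene,) = genes  # unpack the single remaining gene ID
        probesets[gene].append(probe)
    # Sort probe IDs so the result is deterministic.
    return {gene: sorted(probes) for gene, probes in probesets.items()}


# Toy data: all IDs except IL8 are invented for illustration.
probe_hits = {
    "probe_a": {"IL8"},
    "probe_b": {"IL8"},
    "probe_c": {"IL8", "GENE_X"},  # cross-hybridizing probe, dropped
    "probe_d": set(),              # no longer maps anywhere, dropped
    "probe_e": {"GENE_X"},
}

remapped = remap_probes(probe_hits)
print(remapped)
```

In this toy case every surviving probe for IL8 ends up in a single probeset, 
and the probe that hits two genes or nothing is discarded. A real re-mapping 
also has to decide what to do with genes left with too few probes, which 
this sketch ignores.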

The major downside (for me at least) is the loss of the easy preprocess 
==> analyze ==> annotate pipeline provided by the affy, limma, and 
annaffy packages. However, Steffen Durinck has kindly modified his 
biomaRt code to allow for an alternate affy ==> limma ==> biomaRt ==> 
annotate analysis pipeline. Anybody interested in such things can take a 
look at the prettyOutput vignette in biomaRt.



Ye, Bin wrote:
> Hi, Saroj,
> How have you been? As far as I know, the different probe sets
> correspond to different regions of the gene. I don't know why Affy
> does this; probably they originally thought that probe sets for the
> same gene but different regions would serve as a "set of probe sets", a
> second layer of confirmation of the gene's expression. But it turned
> out that different probe sets for the same gene sometimes show
> different expression too. Sometimes this is because not all of the
> probe sets hybridize to the coding region of the gene, so in our
> analysis we only consider the expression of the coding-region probe
> sets, which, of course, takes some BLAST work.
> Hope other experts can give better ideas about this!
> Bin
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch on behalf of Saroj Mohapatra
> Sent: Sun 4/23/2006 6:02 PM
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Duplicated probesets for the same gene
> Hi all,
> I have a small curiosity regarding annotation of probesets in affy
> GeneChips. I find that sometimes 2 probe sets refer to the same
> gene.
> For example, in the HG_U95Av2, there are 2 probesets (1369_s_at and
> 35372_r_at) that both point to the same gene, IL8. I wonder what the
> scientific reason for such a duplication is?
> I understand that the signal from 2 probesets would be affected by
> dye-labeling effect and hybridization effect in addition to mRNA
> abundance. What is then the point of having 2 probe sets which might
> give different results for the same gene?
> Please send any pointers/references that you find appropriate.
> Thanks for your consideration.
> With thanks,
> Saroj

James W. MacDonald, M.S.
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109

