[BioC] Affymetrix Human Exon Array MPS and PS files contain different probeset groups

Daniel Brewer daniel.brewer at icr.ac.uk
Tue Jun 26 18:46:11 CEST 2007

This is not strictly a Bioconductor question, but I use Bioconductor in
the processing and someone might have had a similar experience.

I use "apt-probeset-summarize" to produce exon-level and gene-level
signals.  Probesets are assigned to a gene or exon according to the
evidence supporting the association.  I use the "core" grouping.
This grouping is defined by two files: a probeset file (PS), which is
simply a list of identifiers, and a meta-probeset file (MPS), which has
four columns:
1) probeset_id
2) transcript_cluster_id (always the same as 1)
3) probeset_list (list of probesets associated with the transcript cluster)
4) probe_count (the total number of probes)
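
To make the column layout concrete, here is a minimal Python sketch of
reading one meta-probeset record.  It assumes the four columns are
tab-separated and that probeset_list is space-separated within its column;
the names MetaProbeset, parse_mps_line and read_mps are hypothetical and
are not part of APT or Bioconductor.

```python
from typing import Dict, Iterable, List, NamedTuple


class MetaProbeset(NamedTuple):
    """One record of a meta-probeset (MPS) file (hypothetical container)."""
    probeset_id: str
    transcript_cluster_id: str
    probeset_list: List[str]
    probe_count: int


def parse_mps_line(line: str) -> MetaProbeset:
    """Parse one data line, assuming tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError("expected 4 tab-separated columns, got %d" % len(fields))
    # Assumption: the probeset list is space-separated inside column 3.
    return MetaProbeset(fields[0], fields[1], fields[2].split(), int(fields[3]))


def read_mps(lines: Iterable[str]) -> Dict[str, MetaProbeset]:
    """Index records by probeset_id, skipping blanks, comments and the header."""
    records = {}
    for line in lines:
        if not line.strip() or line.startswith("#") or line.startswith("probeset_id"):
            continue
        record = parse_mps_line(line)
        records[record.probeset_id] = record
    return records
```

With a file on disk this could be called as read_mps(open(path)), after
which the number of probesets for a gene is
len(records[gene_id].probeset_list).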

I might be confused about the true meaning of the meta probeset file but
from what I can see, the probesets in a particular grouping should be in
both the mps and the ps files if associated with a gene. This does not
appear to be the case. For example, consider the PTEN gene (3256689).

The mps file (HuEx-1_0-st-v2.r2.dt1.hg18.core.mps) has the following line:
3256689 3256689 3256702 3256703 3256704 3256705 3256740 3256780 24
i.e. there are 6 probesets associated (3256702, 3256703, 3256704,
3256705, 3256740 & 3256780).

Using NETAFFX or
suggest that there are 23 core probesets associated with this gene
"3256772","3256773","3256777","3256778", "3256779" & "3256780").

This difference could significantly affect the gene summary results.
Does anyone know whether this discrepancy is intentional, and if so, why?
Am I using the correct mps file?
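
The discrepancy described above can be checked with a simple set
comparison.  The Python sketch below is only an illustration: it uses the
six probesets from the mps line and only the six annotation identifiers
quoted in this message (not the full list of 23), and compare_probesets is
a hypothetical helper name.

```python
def compare_probesets(mps_ids, annotated_ids):
    """Return the probesets shared by, and unique to, each source."""
    mps_set = set(mps_ids)
    annotated_set = set(annotated_ids)
    return {
        "shared": sorted(mps_set & annotated_set),
        "only_in_mps": sorted(mps_set - annotated_set),
        "only_in_annotation": sorted(annotated_set - mps_set),
    }


# Probesets listed for 3256689 in the core mps line quoted above.
mps_line_ids = "3256702 3256703 3256704 3256705 3256740 3256780".split()

# Only the annotation identifiers visible in this message; the remaining
# ones are not reproduced here.
annotation_visible_ids = [
    "3256772", "3256773", "3256777", "3256778", "3256779", "3256780",
]

result = compare_probesets(mps_line_ids, annotation_visible_ids)
for label, ids in result.items():
    print(label, len(ids), " ".join(ids))
```

Of the identifiers quoted here, only 3256780 appears in both sources.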

Daniel Brewer, Ph.D.

Institute of Cancer Research
Email: daniel.brewer at icr.ac.uk

