[BioC] multiple locations for probeset in hgu133plus2CHRLOC vs. UCSC PSL data

Marc Carlson mcarlson at fhcrc.org
Tue Nov 18 18:57:44 CET 2008

Hi Peter,

I think the confusion comes from the fact that these are the chromosome
start locations for the gene, not for the probes.  According to Affy,
that probeset is supposed to be measuring that gene, and we took their
word for that.  We then gave you the start positions for the transcripts
of that gene according to UCSC.  We don't currently provide data on
where the probes align to the genome, or on which transcripts in the
genome they might stick to.
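
As a rough illustration of that difference, here is a minimal base R
sketch that compares the four CHRLOC values from your message with the
PSL alignment you quoted.  The numbers are copied by hand from your
email rather than pulled from hgu133plus2.db, the variable names are
made up for the example, and I am assuming the usual PSL convention that
a block ends at start + size.  The two sources may also differ by one in
their coordinate conventions, which does not change the picture here.

```r
# Values copied from the quoted message; nothing here queries hgu133plus2.db.
# (All names below are hypothetical, chosen only for this example.)
chrloc_starts <- c(46598879, 46597889, 46598637, 46599081)  # CHRLOC, chr17

# PSL alignment of the probeset target sequence on chr17.
block_starts <- c(46598810, 46599186, 46600601, 46602296, 46603846)
block_sizes  <- c(47, 130, 102, 113, 257)
block_ends   <- block_starts + block_sizes  # assumed: end = start + size

# Overall span covered by the alignment (first block start, last block end).
align_start <- min(block_starts)
align_end   <- max(block_ends)

# Does each CHRLOC value fall inside the overall alignment span?
in_span <- chrloc_starts >= align_start & chrloc_starts < align_end

# Does it fall inside one of the ungapped alignment blocks?
in_block <- sapply(chrloc_starts, function(p) {
  any(p >= block_starts & p < block_ends)
})

print(data.frame(chrloc_start = chrloc_starts,
                 in_span = in_span,
                 in_block = in_block))
```

With these numbers, two of the four starts fall inside the overall
alignment span and none falls inside an aligned block, which fits the
values being transcript starts for the gene rather than positions of
the probeset alignment.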


Bazeley, Peter wrote:
> Hello,
> R version: 2.8.0
> I just installed the hgu133plus2.db package, and am looking at the hgu133plus2CHRLOC environment. I've noticed that some of the probeset entries (e.g. "201268_at") have multiple locations compared to Affy's annotation file. I'd like to figure out if these multiple locations are current, in which case it is some sort of overlapping/repeating duplication. For example:
>> as.list(hgu133plus2CHRLOC)$'201268_at'
>       17       17       17       17 
> 46598879 46597889 46598637 46599081 
> indicates that the probeset maps to 4 locations. Compare this to the alignment info in Affy's annotation file (from 7/8/08, http://www.affymetrix.com/Auth/analysis/downloads/na26/ivt/HG-U133_Plus_2.na26.annot.csv.zip): 
> chr12:119204403-119205041 (+) // 91.49 // q24.31 /// chr17:46598810-46604103 (+) // 96.87 // q21.33
> which suggests one location on chromosome 17 (I'm ignoring chromosome 12 for now). This is a "_at" probeset, so it should map uniquely to a sequence, according to Affy's "Data Analysis Fundamentals" document (and speaking to a rep).
> From the information provided by "?hgu133plus2CHRLOC", I downloaded 
> ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/Homo_sapiens/database/affyU133Plus2.txt.gz 
> from UCSC to see how this occurred, but it is not clear. Actually, the file:
> http://www.affymetrix.com/Auth/analysis/downloads/psl/HG-U133_Plus_2.link.psl.zip
> from Affy's support page has the same alignment info. Here's the relevant PSL info:
> Target sequence name: chr17
> Alignment start position in target: 46598810
> Alignment end position in target: 46604103
> Number of blocks in the alignment (a block contains no gaps): 5
> Comma-separated list of sizes of each block: 47,130,102,113,257,
> Comma-separated list of starting positions of each block in target: 46598810,46599186,46600601,46602296,46603846,
> The second location provided by CHRLOC (46597889) occurs before the start of the alignment in the PSL info, so perhaps this one CHRLOC location corresponds to the PSL alignment? The mappings were obtained from UCSC on 2006-Apr14, so perhaps additional alignments existed at the time, which have since been removed.
> Thank you for any help. Hopefully I'm just missing something obvious (well, non-obvious for me).
> Peter Bazeley
