Cei Abreu-Goodger cei at sanger.ac.uk
Thu May 29 18:39:40 CEST 2008

Hello all,

I know the general question of "should I summarize/average/etc. probes
that map to the same gene?" has been discussed many times before, but I
think the answer may be slightly different on the Illumina platform (at
least for the Mouse chip, which is the one I have been using).

For non-control probes, there is simply no advantage to using
probe-summarized data over target-summarized data, since both contain
essentially the same number of distinct sequences. Even though the probe
names have changed and there appear to be ~70k of them, there are only
~46k distinct probe sequences, which matches the number of targets
almost exactly.

The numbers:

> length(as.list(lumiMouseV1TARGETID2NUID))
[1] 46116

> length(as.list(lumiMouseV1PROBEID2NUID))
[1] 70182

> length(unique(as.list(lumiMouseV1PROBEID2NUID)))
[1] 46120
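
To make the counting above easier to follow, here is a minimal sketch of the same idea on toy data. The list `probe2nuid` and every ID in it are made up for illustration; it only stands in for what `as.list()` returns on a probe-ID-to-nuID mapping, and I am assuming one nuID per probe ID.

```r
# Toy stand-in for a probe-ID -> nuID mapping (all IDs below are made up).
probe2nuid <- list(
  probe_A = "nuid_1",
  probe_B = "nuid_1",  # same sequence as probe_A under a different name
  probe_C = "nuid_2",
  probe_D = "nuid_3",
  probe_E = "nuid_3"   # same sequence as probe_D
)

# Flatten to a named character vector: names are probe IDs, values are nuIDs.
nuids <- unlist(probe2nuid)

n_probe_ids <- length(nuids)          # how many probe names there are
n_sequences <- length(unique(nuids))  # how many distinct sequences they cover

# Group the probe names by the sequence they share, then keep only the
# sequences that appear under more than one probe name.
probes_by_nuid <- split(names(nuids), nuids)
redundant <- probes_by_nuid[lengths(probes_by_nuid) > 1]

print(n_probe_ids)
print(n_sequences)
print(redundant)
```

If the real mapping behaves the same way, `redundant` would list which probe names are just aliases for one sequence, which is the part the three counts alone do not show.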



