[BioC] redundant probe sets in Affymetrix HG-U219

James W. MacDonald jmacdon at med.umich.edu
Thu Apr 14 23:17:56 CEST 2011


Hi Andreas,

On 4/14/2011 5:27 AM, Andreas Heider wrote:
> Dear Bioconductor mailing list,
> is there a sensible way to deal with redundant probesets on Affymetrix chips
> like the HG-U219?

Define sensible.

There are some things you can do, but each comes with its own assumptions.

There is the findLargest() function in genefilter that will select the 
probeset with the largest value of a test statistic. This assumes (among 
other things) that all of the redundant probesets measure the same 
thing. But note that the _x_ and _s_ suffixes in the probeset IDs you list 
below indicate that, when Affy designed that chip, those probesets 
cross-hybridized with unrelated or related transcripts, respectively.
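The selection idea behind findLargest() (keep, for each gene, only the probeset with the largest test statistic) can be sketched as below. This is a minimal illustration in Python, not genefilter's implementation; the function name pick_largest and the statistic values are hypothetical, and only the three probeset IDs and the RefSeq ID come from the quoted question.

```python
from typing import Dict


def pick_largest(probe_to_gene: Dict[str, str],
                 stat: Dict[str, float]) -> Dict[str, str]:
    """For each gene, return the probeset with the largest test statistic.

    Hypothetical helper illustrating the idea behind genefilter's
    findLargest(); it is not that function's code.
    """
    best: Dict[str, str] = {}
    for probe, gene in probe_to_gene.items():
        if probe not in stat:
            continue  # skip probesets that have no statistic
        # strict '>' keeps the first probeset seen when statistics tie
        if gene not in best or stat[probe] > stat[best[gene]]:
            best[gene] = probe
    return best


# Probeset IDs from the question; the statistic values are made up.
probe_to_gene = {
    "11715100_at": "NM_003534",
    "11715101_s_at": "NM_003534",
    "11715102_x_at": "NM_003534",
}
stat = {"11715100_at": 2.1, "11715101_s_at": 3.4, "11715102_x_at": 1.7}

print(pick_largest(probe_to_gene, stat))  # {'NM_003534': '11715101_s_at'}
```

With these made-up numbers the "winner" is an _s_ probeset, which is exactly the case the caveat above warns about: the rule silently assumes all three probesets measure the same transcript.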

You can use the MBNI re-mapped cdfs, which take current versions of the 
genome and filter out probes that don't uniquely hybridize to the 
genome, and then map probes to probesets based on e.g., Entrez Gene IDs. 
This eliminates the problem of multiple probesets, but you then have to 
contend with probesets that vary from ~3 probes up to 100 or more. As 
you can imagine, the probesets with 3 probes will have much larger 
standard errors than those with, say, 100 probes. This makes downstream 
analyses more difficult unless you choose to simply ignore that fact.
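The remapping step and the resulting spread in standard errors can be sketched as follows. This is a simplified illustration, not the MBNI pipeline: the probe IDs and gene names are hypothetical placeholders, and the standard-error calculation assumes the summary behaves like a mean of n independent probes with a common standard deviation, which real summaries such as median polish only approximate.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def remap_probes(probe_hits: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Group probes into gene-level probesets, keeping only unique hits.

    probe_hits maps each probe to the set of genes it matches in the
    current genome build (hypothetical input format).
    """
    sets: Dict[str, List[str]] = defaultdict(list)
    for probe, genes in probe_hits.items():
        if len(genes) == 1:  # drop probes matching zero or several genes
            (gene,) = genes
            sets[gene].append(probe)
    return dict(sets)


def relative_se(n_probes: int) -> float:
    """SE of a mean of n independent probes, in units of the per-probe SD."""
    return 1.0 / math.sqrt(n_probes)


# Hypothetical probes and genes, for illustration only.
probe_hits = {
    "p1": {"geneA"},
    "p2": {"geneA"},
    "p3": {"geneA", "geneB"},  # cross-hybridizes, so it is dropped
    "p4": {"geneB"},
    "p5": set(),               # matches nothing, so it is dropped
}
print(remap_probes(probe_hits))  # {'geneA': ['p1', 'p2'], 'geneB': ['p4']}

# Under the equal-variance assumption, a 3-probe set has roughly
# sqrt(100 / 3) times the standard error of a 100-probe set.
print(round(relative_se(3) / relative_se(100), 2))  # 5.77
```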

You could ignore the fact that you have multiple probesets that may or 
may not be measuring the same thing, and assume independence (which, of 
course, isn't even true when you have no redundant probesets).

No real satisfying alternatives, IMO, so you have to pick your poison.
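The median collapse asked about in the quoted message amounts to grouping probesets by RefSeq ID and taking the per-sample median within each group; in R this kind of grouping is commonly done with base functions such as aggregate() or tapply(). The sketch below shows the same logic in plain Python. The function name collapse_by_median and the expression values are hypothetical, and the approach inherits the assumption discussed above that all probesets in a group measure the same transcript.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def collapse_by_median(expr: Dict[str, List[float]],
                       probe_to_refseq: Dict[str, str]) -> Dict[str, List[float]]:
    """Collapse probesets sharing a RefSeq ID to per-sample medians.

    expr maps probeset ID -> expression values, one per sample (same
    sample order for every probeset). Unannotated probesets are skipped.
    """
    groups: Dict[str, List[List[float]]] = defaultdict(list)
    for probe, values in expr.items():
        refseq = probe_to_refseq.get(probe)
        if refseq is not None:
            groups[refseq].append(values)
    # zip(*rows) walks the samples column by column within each group
    return {refseq: [median(col) for col in zip(*rows)]
            for refseq, rows in groups.items()}


# Probeset IDs from the question; two samples of made-up log2 values.
probe_to_refseq = {
    "11715100_at": "NM_003534",
    "11715101_s_at": "NM_003534",
    "11715102_x_at": "NM_003534",
}
expr = {
    "11715100_at": [7.0, 8.0],
    "11715101_s_at": [9.0, 6.0],
    "11715102_x_at": [5.0, 7.5],
}
print(collapse_by_median(expr, probe_to_refseq))  # {'NM_003534': [7.0, 7.5]}
```

Swapping median for statistics.mean gives the mean-based variant; either way the collapsed value hides whether the _x_ or _s_ probesets disagree with the _at one.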

Best,

Jim


> For example:
>    Probe Set ID     RefSeq Transcript ID
>    11715100_at      NM_003534
>    11715101_s_at    NM_003534
>    11715102_x_at    NM_003534
> Should I take the median/mean of the expression intensities? Or select the
> highest? And what would be the procedure in R to do it? I mean, how do I tell
> R to return the median of the expression values when there is more than one
> probeset for a single RefSeq ID?
>
> I hope you can help me, Andreas
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826