[BioC] finding and averaging replicate gene records

Wed Mar 16 14:48:04 CET 2005

On Mar 16, 2005, at 8:31 AM, Tomas Radivoyevitch wrote:

> Agreeing with Sean here, in my last experience where I had to reduce 
> each gene to a single metric, using Affy data I found that taking the 
> probe set with the maximum average value across all chips in the 
> dataset worked well [e.g. in two group situations the resulting 
> choices tended to be probe sets with smaller (if not the smallest) P 
> values].

This may work well with Affy, where lower values are perhaps less 
"stable" than higher values, but I'm not sure it would work in every 
situation.  For example, on other platforms, the spot with the maximum 
average intensity may signify scanner saturation.  Moving to ratios, 
choosing the probe with the highest ratio may signify lack of 
expression in the reference sample, and choosing the one with the 
lowest ratio may signify saturation in the reference sample; in neither 
case would the value be "believable", and another probe for the same 
gene might point that out.

I see Tomas's point, but if one does go ahead and summarize probes into 
genes, care must be taken to choose an appropriate summary measure, and 
it should be noted that such summaries might bias which genes are found 
(and, more importantly, which are validated or not).
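
To make the choice of summary measure concrete, here is a minimal 
sketch in plain Python (not Bioconductor code, and not anyone's actual 
pipeline).  The function name, the "max_mean" and "median" labels, and 
the probe and gene identifiers are all hypothetical, and the numbers 
are made up for illustration only.  It contrasts picking the probe with 
the maximum average across chips against taking a per-chip median over 
all probes for the gene.

```python
from statistics import mean, median


def summarize_by_gene(probe_values, probe_to_gene, method="max_mean"):
    """Collapse probe-level data to one vector of values per gene.

    probe_values:  dict mapping probe id -> list of values, one per chip
    probe_to_gene: dict mapping probe id -> gene id
    method:        "max_mean" keeps the single probe with the highest
                   average across chips; "median" takes the per-chip
                   median over all probes of the gene.
    """
    # Group probe ids by the gene they map to.
    probes_by_gene = {}
    for probe, gene in probe_to_gene.items():
        probes_by_gene.setdefault(gene, []).append(probe)

    summary = {}
    for gene, probes in probes_by_gene.items():
        if method == "max_mean":
            best = max(probes, key=lambda p: mean(probe_values[p]))
            summary[gene] = list(probe_values[best])
        elif method == "median":
            # zip(*rows) walks the chips column by column.
            columns = zip(*(probe_values[p] for p in probes))
            summary[gene] = [median(col) for col in columns]
        else:
            raise ValueError("unknown method: " + method)
    return summary


# Made-up example: two probes for geneA, one for geneB, three chips.
probe_values = {
    "p1": [100.0, 120.0, 110.0],
    "p2": [900.0, 950.0, 1000.0],
    "p3": [50.0, 60.0, 70.0],
}
probe_to_gene = {"p1": "geneA", "p2": "geneA", "p3": "geneB"}

by_max = summarize_by_gene(probe_values, probe_to_gene, "max_mean")
by_median = summarize_by_gene(probe_values, probe_to_gene, "median")
```

With these made-up numbers the two methods give quite different 
profiles for geneA: "max_mean" keeps p2 alone, which is exactly the 
probe one would worry about if its values sat near a scanner's 
ceiling, while "median" lets p1 pull the summary down.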

Sean