[BioC] Treatment of Duplicate spots

Thu Feb 10 08:11:12 CET 2005

Hi Peter,

>> The information from the duplicate spots can be summarised using 
>> lmFit() with the appropriate arguments.  The approach taken in limma 
>> is to assume that the duplicate spots are correlated by being on the 
>> same array, a fixed distance apart (the function 
>> duplicateCorrelation() is used to estimate this correlation).  An 
>> alternative approach would be to average the duplicate log-ratios 
>> prior to fitting the linear model.
>>
>>> For the case of duplicate spotting, what is the significance of 
>>> merging the raw channels separately prior to creating MA values with 
>>> loess normalization, and then between-chip scaling?
>>
>>
>> I'm not sure what you mean here.  There are usually two channels per 
>> array for two-colour microarrays.  Do you mean create 4 channels per 
>> array, one for each duplicate set in each channel?  I'm not sure that 
>> this would be helpful.
>
>
>
> Actually, my bad.  I meant merging the duplicate spots WITHIN each raw 
> channel separately PRIOR to calculating the log-ratios (M-values). 
> The duplicate spots on our arrays correlate extremely well, to the 
> point where I think spotting probes twice is wasteful (IMHO the 
> duplicate spots would need to be randomly distributed for duplicate 
> spotting to be meaningful, but the spotting technology is not capable 
> of doing this).
>
> I like the idea of using quantile normalization between chips; assuming 
> n spots for m genes, that will be fine. However, when there are 
> duplicate spots for each probe, each duplicate spot is adjusted 
> independently, and when I compared the M-values with the raw R and G 
> channel duplicates, the correlation between the duplicate M-values was 
> quite poor. I expect this is because quantile normalization handles 
> each duplicate spot separately.
>
> So my question is: do I gain or lose by merging the raw duplicate 
> values within the R and G channels separately prior to calculating the 
> M-values? I am not enough of a statistician to say whether or not this 
> is acceptable.
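
For readers unfamiliar with quantile normalization, here is a minimal sketch of the generic algorithm in plain Python. It is not limma's normalizeBetweenArrays code. Each array's values are ranked, and each value is replaced by the mean, across arrays, of the values holding that rank. The helper name `quantile_normalize` is hypothetical, and the sketch assumes equal numbers of spots per array and no ties. Because duplicate spots are just separate rows, each duplicate receives its own rank and therefore its own adjusted value, which matches the behaviour Peter describes.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length columns (one column per array).

    Hypothetical helper for illustration only; assumes no tied values within a column.
    """
    n = len(arrays[0])
    if any(len(col) != n for col in arrays):
        raise ValueError("all arrays must have the same number of spots")

    # Reference distribution: the mean of the k-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in arrays]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]

    normalized = []
    for col in arrays:
        # Spot indices ordered from lowest to highest value on this array.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            # Each spot takes the reference value for its rank.
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization every array has exactly the same set of values, only in a different order. This is why the method needs at least two spots to do anything useful.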

I'm not aware of any careful study that assesses whether it is better to 
'merge' (I assume you mean average?) the raw R and G intensities from 
duplicate spots or keep them separate (this might be a research question 
for you).
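
The two options can be made concrete. Here is a minimal sketch, assuming M = log2(R/G) with no background correction and duplicates combined by simple arithmetic means. Both helper names are hypothetical. Averaging the raw R and G intensities first gives log2(mean R / mean G). Averaging the duplicate M-values instead gives the log of the geometric-mean ratio. In general the two give different answers.

```python
from math import log2
from statistics import mean


def m_from_averaged_channels(red, green):
    """Average duplicate raw intensities within each channel, then take the log-ratio."""
    return log2(mean(red) / mean(green))


def m_from_averaged_log_ratios(red, green):
    """Compute one log-ratio per duplicate spot, then average those M-values."""
    return mean(log2(r / g) for r, g in zip(red, green))


# Made-up duplicate-spot intensities for a single probe (not real data).
red = [1000.0, 4000.0]
green = [500.0, 1000.0]
```

With these made-up numbers, averaging the M-values gives 1.5, while averaging the channels first gives about 1.74. Either way only one value per probe remains.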

Obviously you won't be able to make use of the method I described to you 
in the previous email, where the duplicate correlation is used in the 
linear model, since merging leaves only one value per probe.  This 
approach has been studied and can offer improvements over averaging if 
you are assessing differential expression.  For the reference, see:

Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array 
replicate spots for assessing differential expression in microarray 
experiments. Bioinformatics 21, to appear. (available from 
http://www.statsci.org/smyth/pubs/dupcor.pdf)

Best wishes,

Matt Ritchie

>>> How many spots in a chip would be required to run quantile 
>>> normalization vs scale normalization when using normalizeBetweenArrays?
>>
>>
>> The lower limit for quantile normalization is 2 spots, and for scale 
>> normalization it's 1 spot.  Normalization is probably not such a big 
>> deal with so few spots though ;)
>
>
> Yes, if I only had 2 good spots I would generally be unhappy with the 
> microarray. But it seems that I need to use scale normalization for 
> small chips, like 300 spots, and quantile normalization for large 
> arrays like 19k, because with so many points, scale normalization may 
> force more genes into the tails of the distribution of M-values, as 
> the box-plots show.

> Thanks for the help
>
> Peter
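
Scale normalization only adjusts the spread of each array's M-values, whereas quantile normalization forces the whole distribution to match. That difference is why scale normalization can run on as little as one spot. Here is a minimal sketch of one common form of scale normalization, assuming the M-values are already centred by within-array normalization. Each array is divided by its median absolute M-value and rescaled to the geometric mean of those spreads. This is a generic illustration with a hypothetical helper name, not limma's implementation.

```python
from math import exp, log
from statistics import median


def scale_normalize(arrays):
    """Scale each array's M-values so that all arrays share the same median absolute value.

    Hypothetical helper for illustration; the common target spread is the
    geometric mean of the per-array median absolute M-values.
    """
    spreads = [median(abs(m) for m in col) for col in arrays]
    if any(s == 0 for s in spreads):
        raise ValueError("an array has zero spread; cannot scale it")
    # Geometric mean of the spreads keeps the overall scale comparable to the input.
    target = exp(sum(log(s) for s in spreads) / len(spreads))
    # Only the spread changes; the ordering and sign of M-values within an array are kept.
    return [[m * target / s for m in col] for col, s in zip(arrays, spreads)]
```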
