[BioC] ...another question about using weights on microarray analysis

Jenny Drnevich drnevich at illinois.edu
Thu Feb 19 16:06:10 CET 2009


Hi Matt,

While it's good to know that array weights *can* be fit as long as 
you can fit the linear model, I was wondering if they *should* be fit 
if you only have 3 replicates. Back when I did genetics and we could 
get 10-20 replicates per group, I routinely used a robust fitting of 
the linear model, in order to minimize the effects of outliers. 
However, when there are only a few replicates, how can you tell what 
is an outlier and what is normal variation? I may be wrong, but it 
seems like your method weights arrays based on how well they fit your 
specified model, so of course it will improve the model fit. But is 
this a good idea with only a few replicates? I tried it once, and one 
of the three replicates was weighted very high, one was lowish, and 
one weighted close to zero. Therefore, the estimate for the 
coefficient of that group was almost completely due to one 
replicate, which is why I decided not to use array weights. Am I off base here?
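
To put rough numbers on that concern, here is a minimal base-R sketch. The weights and log-ratios are made up for illustration (they are not from my data), and it assumes the group coefficient in a weighted fit is simply the weighted mean of the replicates.

```r
# Hypothetical array weights for three replicates in one group
# (illustrative values only, not from a real experiment)
w <- c(rep1 = 2.5, rep2 = 0.45, rep3 = 0.05)

# Hypothetical log-ratios for one gene on those three arrays
y <- c(rep1 = 1.2, rep2 = 0.4, rep3 = -0.3)

# In weighted least squares the group coefficient is the weighted mean,
# so each replicate contributes in proportion to w / sum(w)
share <- w / sum(w)

weighted_est   <- sum(w * y) / sum(w)
unweighted_est <- mean(y)

# Kish's effective sample size: how many equally weighted
# replicates the weighted group is roughly "worth"
n_eff <- sum(w)^2 / sum(w^2)

print(round(share, 3))
print(c(weighted = weighted_est, unweighted = unweighted_est, n_eff = n_eff))
```

With weights like these, the first replicate carries about 83% of the estimate and the effective number of replicates is well under two, which is the situation I was worried about.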

Thanks,
Jenny

At 10:12 PM 2/18/2009, Matt Ritchie wrote:
>Dear Jenny and Erika,
>
>Regarding the question on array weights, so long as you have enough 
>arrays to fit the linear model to the means, you will be able to 
>estimate array variance factors using arrayWeights(). The example 
>given on the arrayWeights help page uses the methodology on a set of 
>6 arrays with 3 replicates per group.
>
>And yes, spot and array weights can be combined in the analysis by 
>multiplying them together. Make sure that what you are multiplying 
>are matrices of the same dimension though - the output of 
>arrayWeights is a vector, so you will want to run asMatrixWeights() 
>on this vector before multiplying with the spot weights.
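
For anyone following along, the dimension issue Matt mentions can be sketched in base R without limma. The weight values below are hypothetical; the expansion by row is what, as I understand it, asMatrixWeights(array.w, dim(spot.w)) is meant to do for a vector of array weights.

```r
# Hypothetical spot weights: 4 spots (rows) x 3 arrays (columns), 0 = flagged
spot.w <- matrix(c(1, 1, 0,
                   1, 0, 1,
                   1, 1, 1,
                   0, 1, 1), nrow = 4, byrow = TRUE)

# Hypothetical array weights, one per array (column)
array.w <- c(1.5, 1.0, 0.5)

# Expand the vector so every row holds the per-array weights
array.w.mat <- matrix(array.w, nrow = nrow(spot.w), ncol = ncol(spot.w),
                      byrow = TRUE)

# Element-wise product of two matrices of the same dimension
combined <- spot.w * array.w.mat

# Pitfall: multiplying by the bare vector recycles it down the columns,
# so the array weights are applied to the wrong cells, without any warning
naive <- spot.w * array.w

print(combined)
print(naive)
```

The two results differ, which is why the vector should be turned into a matrix before multiplying.
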
>
>Best wishes,
>
>Matt
>
>>Hi Erika,
>>
>>Filtering spots on each array individually has been addressed 
>>several times on the list, and the general consensus is to only do 
>>it in very rare circumstances, such as when you have manually 
>>flagged spots that are scratches, dust spots, etc., where the 
>>reported value has ABSOLUTELY NO RELATIONSHIP to whatever the real 
>>value might have been. Spots with low SNR, auto-flagged by GenePix 
>>as "missing", or saturated spots all have values that are 
>>approximations of what the real value is, even if they aren't as 
>>precise because they are outside the measurement abilities of the 
>>scan. As I tell my students - zeros are REAL data points - would 
>>you throw them out in other scientific measurements? NO. It is fine 
>>to throw out a spot that fails to meet your criteria on ALL arrays, 
>>like the control spots.
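
As a small illustration of that last point, here is a base-R sketch that drops only the spots failing on every array. The 0/1 weight matrix is hypothetical, standing in for the weights a wt.fun would produce.

```r
# Hypothetical 0/1 spot weights: 5 spots (rows) x 3 arrays (columns)
w <- matrix(c(1, 1, 1,
              0, 1, 1,
              0, 0, 0,   # fails on every array (e.g. a control spot)
              1, 0, 1,
              0, 0, 0),  # fails on every array
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("spot", 1:5), paste0("array", 1:3)))

# A spot is dropped only if it has zero weight on ALL arrays
fails_everywhere <- rowSums(w > 0) == 0

# Spots that fail on some arrays but not others are kept
w_kept <- w[!fails_everywhere, , drop = FALSE]

print(names(which(fails_everywhere)))
print(w_kept)
```
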
>>
>>I'm not sure about the array quality weights... the example uses 10 
>>replicates per group, which is probably a fine number to use to 
>>determine which arrays aren't as much alike as the others, but I'm 
>>not sure if it's good to use when you only have 3 replicates. 
>>Anyone care to comment on this?
>>
>>Cheers,
>>Jenny
>>
>>At 05:49 AM 2/17/2009, Erika Melissari wrote:
>>>Hello all,
>>>
>>>I have found conflicting opinions in Bioconductor emails 
>>>regarding the use of quality weights in microarray analysis, and I 
>>>would like to understand clearly what to do before starting 
>>>the statistical analysis of my latest experiment.
>>>I use LIMMA to perform statistical analysis of microarray experiments.
>>>Usually, I assign a weight to all the spots of my experiment by 
>>>using in read.maimages() this wt.fun:
>>>
>>>function(x, threshold=3){
>>>
>>># exclude spots with SNR < threshold on both channels
>>>snrok <- !(x[,"SNR 635"] < threshold & x[,"SNR 532"] < threshold)
>>>
>>># include only genes and not control spots (I use Agilent microarrays)
>>>spotok <- (x[,"ControlType"] == "false")
>>>
>>># exclude spots flagged "bad" by GenePix Pro 6 (negative flag values)
>>>flagok <- (x[,"Flags"] >= 0)
>>>
>>># exclude spots with more than 10% saturated pixels in either channel
>>>satok <- !(x[,"F635 % Sat."] > 10 | x[,"F532 % Sat."] > 10)
>>>
>>># weight is 1 only if the spot passes all four checks, otherwise 0
>>>spot <- (snrok & spotok & flagok & satok)
>>>as.numeric(spot)
>>>}
>>>
>>>In my opinion it is right to exclude saturated spots (because their 
>>>intensity values are not reliable). Is that wrong?
>>>I have doubts about excluding spots with low SNR, because in my 
>>>latest experiment I would have to exclude about 60% of the 45000 
>>>spots for low SNR, and I am worried about the robustness of a 
>>>statistical analysis evaluated on only 40% of the genes.
>>>Should I exclude these spots?
>>>Before or after normalization?
>>>Should I normalize all the spots and then, on the normalized 
>>>values, apply the SNR quality filter to exclude normalized spots 
>>>with low SNR from the subsequent statistical analysis?
>>>I would like to use arrayWeights() from LIMMA and combine spot 
>>>quality weights and array quality weights. Is it right to multiply 
>>>the spot weight matrix by the array quality vector?
>>>
>>>Thank you very much for any help with this complicated question.
>>>
>>>Erika
>>
>>Jenny Drnevich, Ph.D.
>>
>>Functional Genomics Bioinformatics Specialist
>>W.M. Keck Center for Comparative and Functional Genomics
>>Roy J. Carver Biotechnology Center
>>University of Illinois, Urbana-Champaign
>>
>>330 ERML
>>1201 W. Gregory Dr.
>>Urbana, IL 61801
>>USA
>>
>>ph: 217-244-7355
>>fax: 217-265-5066
>>e-mail: drnevich at illinois.edu


