[BioC] Your thoughts on Limma Array Weights?

Matt Ritchie Matt.Ritchie at cancer.org.uk
Fri Jun 27 01:53:34 CEST 2008


Dear Paul,

I have noticed cases where the results are 'better' (i.e. you get more
extreme moderated t-statistics or log-odds) if you remove suspect arrays.
In one recent example I recall, the experimenter eventually discovered that
the genotype of a sample hybridised to one of their arrays was not what they
originally thought.  This meant that the linear model they were fitting was
not right.  Although the weight assigned to this array was small, removing
it from the analysis altogether still produced better results.  The array
weights method cannot correct for these kinds of gross errors.

I usually take a try it and see approach in my own analyses, similar to what
you have done (i.e. run the analysis with equal weights, with array weights,
or after removing any suspect arrays altogether, then look at the results of
each to see which gives the most extreme statistics).

Best wishes,

Matt
 
> I use limma quite a bit but have not really been using arrayWeights
> much, until recently.
> I like it a lot but have  found, in some cases, that it appears  better
> just ditch the very poorly performing arrays..and then I proceed without
> weighing . 
> 
> What are peoples real world experience with arrayWeights, are you using
> it routinely ?
> 
> For example my typical usage... time series with biological triplicates
> 
>> design
>    t0hr t6hr t24hr t24p6hr
> 1     1    0     0       0
> 2     1    0     0       0
> 3     1    0     0       0
> 4     0    1     0       0
> 5     0    1     0       0
> 6     0    1     0       0
> 7     0    0     1       0
> 8     0    0     1       0
> 9     0    0     1       0
> 10    0    0     0       1
> 11    0    0     0       1
> 12    0    0     0       1
> 
> arrayw<-arrayWeights(selDataMatrix,design=design)
>> arrayw
>         1         2         3         4         5         6         7
> 8         9 
> 1.6473168 1.2716081 1.5170375 1.0310794 1.1010048 1.2787543 0.8198722
> 0.7162097 2.3992850
>        10        11        12
> 0.1744961 1.3821469 0.6379648  ## note array 10:  which was a outlier in
> hierarchical clustering (though was still more similar to arrays its
> biological replicates than any other arrays (based on genes where
> sd/mean> 0.1).. 
> 
> fit <- lmFit(selDataMatrix, design,weights=arrayw)
> fit <- lmFit(selDataMatrix, design)
> 
> cont.matrix <- makeContrasts(
> tchange6hr="t6hr-t0hr" ,
> tchange24hr="t24hr-t0hr" ,
> tchange24p6hr="t24p6hr-t0hr" ,
> diff0to6="t6hr-t0hr" ,
> diff6to24="t24hr-t6hr" ,
> diff24to24p6="t24p6hr-t24hr" ,
> levels=design)
> 
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> 
> ** Get
>> sum(topTable(fit2,coef=1,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2927
>> sum(topTable(fit2,coef=2,adjust="fdr",number=6000)[,"B"]>1)
> [1] 5263
>> sum(topTable(fit2,coef=3,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2083
>> sum(topTable(fit2,coef=4,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2927
>> sum(topTable(fit2,coef=5,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2931
>> sum(topTable(fit2,coef=6,adjust="fdr",number=5000)[,"B"]>1)
> [1] 3810
> 
> ####################### AS APPOSED TO THE TYPICAL:
> 
> fit <- lmFit(selDataMatrix, design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> 
> ** Get
>> sum(topTable(fit2,coef=1,adjust="fdr",number=5000)[,"B"]>1)
> [1] 1725
>> sum(topTable(fit2,coef=2,adjust="fdr",number=6000)[,"B"]>1)
> [1] 3438
>> sum(topTable(fit2,coef=3,adjust="fdr",number=5000)[,"B"]>1)
> [1] 1512
>> sum(topTable(fit2,coef=4,adjust="fdr",number=5000)[,"B"]>1)
> [1] 1725
>> sum(topTable(fit2,coef=5,adjust="fdr",number=5000)[,"B"]>1)
> [1] 1605
>> sum(topTable(fit2,coef=6,adjust="fdr",number=5000)[,"B"]>1)
> [1] 2610
> 
> Is more differential expression  better .. always... I guess so unless
> there are more false positives right?  I am slightly worried that in
> using a linear model to access array quality and produce weights , that
> this will then naturally bias a method such as limma that then  using a
> linear model, again, to determine differential expression. After trying
> a few different permutations (use weights, remove "worst" arrays and
> redo without weights) that this is not a big concern but would welcome
> some feedback from others and insight into how they are using this
> function .
> 
> Thanks
> Paul



More information about the Bioconductor mailing list