[BioC] Limma, blocking, robust methods, and paired samples question

Gordon K Smyth smyth at wehi.EDU.AU
Thu Nov 28 04:26:40 CET 2013


Also see ?arrayWeights.  This is appropriate for dealing with outlier 
samples.

In terms of specifically robust methods, limma has three approaches:

1. lmFit(...,method="robust") deals with outliers at the observation 
level.
2. arrayWeights deal with outliers at the sample level.
3. eBayes(robust=TRUE) deals with outliers at the gene level.

In our own publications, we have used (2) most, then (3) and (1) only 
occasionally.

Gordon


On Thu, 28 Nov 2013, Gordon K Smyth wrote:

> Dear Gustavo,
>
> Your email raises a lot of different issues, so I will try to answer a few 
> things in turn.
>
> First, you appear to have paired samples, so you should analyse the data as a 
> paired design.  There is no further information to be extracted by using 
> random effects for a paired design.
>
> You have tried to combine a random effect with method="robust" but limma does 
> not support this.  If you use method="robust", then the correlation will be 
> ignored and treated as zero.
>
> You say that the consensus correlation is 1, but your output actually shows 
> it to be 0.72, and as mentioned above it has been treated as zero in the 
> analysis.
>
> I suggest that you stick to the usual paired analysis.  As an alternative to 
> lmFit(...,method="robust"), you could also try 
> eBayes(robust=TRUE,trend=TRUE).  I also suggest that you make plots of your 
> data to see what is going on rather than just trying out different analyses 
> to see what they do.  For example
>
>  plotSA(fit)
>
> is likely to be useful.
>
> Best wishes
> Gordon
>
>
>> Date: Tue, 26 Nov 2013 18:47:53 +0100
>> From: Gustavo Fernandez Bayon <gbayon at gmail.com>
>> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
>> Subject: [BioC] Limma, blocking, robust methods, and paired samples
>> 	question
>> 
>> Hello everybody.
>> 
>> I am experiencing some problems with a methylation microarray experiment I
>> am trying to analyze and, although I am able to get results to some extent,
>> I am suspicious about them and would like to know if I am doing things
>> correctly.
>> 
>> We have 12 samples. Each sample is built on DNA coming from a single cell.
>> The samples can be of two types: FLO- or FLO+. We have two samples, one
>> FLO- and one FLO+, for every type of primary tumor we are working on. So
>> the experiment could be described as follows:
>>
>>                   Tumor_Name Sample_Group
>>                  <character>  <character>
>> 9296931014_R01C01  185_8-6-11         FLO-
>> 9296931014_R02C01  185_8-6-11         FLO+
>> 9296931014_R03C01   185_SCD1S         FLO-
>> 9296931014_R04C01   185_SCD1S         FLO+
>> 9296931014_R05C01   185_SCD2S         FLO-
>> ...                       ...          ...
>> 9296931014_R02C02    185_4/12         FLO+
>> 9296931014_R03C02  A6L_8-1-10         FLO-
>> 9296931014_R04C02  A6L_8-1-10         FLO+
>> 9296931014_R05C02 A6L_23-2-10         FLO-
>> 9296931014_R06C02 A6L_23-2-10         FLO+
>> 
>> We want to find the differentially methylated probes between FLO- and FLO+.
>> If we try to fit a simple model (~ Sample_Group), we are not able to get
>> any useful result. I thought that this could be caused by the among-tumor
>> differences, so I decided to model the experiment as a paired samples test,
>> and did the following:
>> 
>> design <- model.matrix(~ Sample_Group + Tumor_Name, data=pdata)
>> fit <- lmFit(mvalues, design)
>> efit <- eBayes(fit)
>> topTable(efit, coef=2)
>> 
>> Unfortunately, no probe had an adjusted p-value under our significance
>> level. So I kept searching for methods to overcome this.
>> 
>> Lately, we have been playing around with robust methods, because we think
>> they could suit well to microarray problems, where outliers (specially in
>> those probes with SNPs nearby) could account for much variability. So I
>> decided to do the following:
>> 
>> design <- model.matrix(~ Sample_Group + Tumor_Name, data=pdata)
>> fit <- lmFit(mvalues, design, method='robust')
>> efit <- eBayes(fit)
>> topTable(efit, coef=2)
>>
>>                     logFC   AveExpr         t      P.Value
>> adj.P.Val        B
>> cg03143457       1.1528608 -2.521458  20.10818 3.293139e-08 0.01415556
>> 7.525073
>> ch.4.162336241R  1.1927542 -3.956192  16.62275 1.491918e-07 0.03076082
>> 6.622935
>> cg23320987      -0.7800961  3.997052 -14.76658 3.791254e-07 0.03076082
>> 6.201241
>> ...
>> 
>> In this case, I am able to get more than 9000 significant probes. This
>> could suffice, but I started reading discussions in this list about limma
>> being able to model random effects, and decided to see what could happen if
>> I modeled the origin tumor as a random effect:
>> 
>> design <- model.matrix(~ Sample_Group, data=pdata)
>> cor <- duplicateCorrelation(mvalues, design, block=pdata$Tumor_Name)
>> 
>>> cor$consensus.correlation
>> [1] 0.7208955
>> 
>> fit <- lmFit(mvalues, design, block=pdata$Tumor_Name,
>> correlation=cor$consensus.correlation, method='robust')
>> 
>> There were 50 or more warnings (use warnings() to see the first 50)
>>> warnings()
>> Warning messages:
>> 1: In rlm.default(x = X, y = y, weights = w, ...) :
>>  'rlm' failed to converge in 20 steps
>> 2: In rlm.default(x = X, y = y, weights = w, ...) :
>> ...
>> 
>> efit <- eBayes(fit)
>> topTable(efit, coef=2)
>> 
>>> topTable(efit, coef=2)
>>                   logFC   AveExpr         t        P.Value
>> adj.P.Val         B
>> cg13065299     0.6832614  4.468798  6.702811 0.000023155532 0.9999998
>> -3.510208
>> cg04211927     0.7980711  3.845420  6.906811 0.000017357590 0.9999998
>> -3.525569
>> ch.22.436090R  0.5894315 -3.633235  6.305634 0.000041228561 0.9999998
>> -3.526679
>> ...
>> 
>> I was expecting a different number of probes, accounting for the increased
>> power (as I have read) of a mixed model in this case, but I actually got 0
>> significant probes. Another strange thing is that consensus correlation
>> equals 1.
>> 
>> Is it possible to treat this experiment as a paired-sample test? I am
>> always doubtful when thinking about experimental designs.
>> 
>> Have I done the call to duplicateCorrelation() in a correct way? I am not
>> sure if I have to include the blocking variable in the design matrix or 
>> not.
>> 
>> Is it possible to fit a mixed model using the robust method? Maybe I am
>> trying to do too many things at once.
>> 
>> Could I trust the ~9000 probes I got from the simple, paired design?
>> 
>> Thank you very much. As always, any hint or help would be much appreciated.
>> 
>> Regards,
>> Gustavo
>> 
>> PS: My sessionInfo:
>>
>> 	[[alternative HTML version deleted]]
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list