[BioC] using limma with no replicates

Gordon Smyth smyth at wehi.edu.au
Sun Apr 2 01:07:55 CEST 2006


Dear Pedro,

The strategy you are proposing is to ignore experimental factors 
which you think will have relatively small effects, so as to generate 
some degrees of freedom for error. This is an ok strategy, long used 
in statistics, as long as you understand clearly what you are testing 
for. If you do this, limma will try to find genes which have 
differential expression which stands out relative to the effects you 
have ignored.
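
As a rough sketch of where the error degrees of freedom come from, the following base R snippet rebuilds the design from the quoted message below, in which two of the eight arrays share one group label. The variable names `group` and `df.residual` are illustrative only.

```r
# Eight arrays; arrays 3 and 4 are not true replicates, but they are given
# the same group label so that they share a single coefficient.
group <- factor(c(1, 2, 3, 3, 5, 6, 7, 8))
design <- model.matrix(~ 0 + group)
colnames(design) <- c("WT", "upa", "g1", "f5", "f6", "f7", "f8")

# Residual degrees of freedom = number of arrays - number of coefficients.
df.residual <- nrow(design) - qr(design)$rank
df.residual
```

With eight arrays and seven coefficients, a single residual degree of freedom is left, and it comes entirely from the difference between the two arrays grouped under g1.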

Power is not the issue here. This approach is actually conservative, 
in that the residual variability will be larger than if you had true 
replicate arrays, hence you will find fewer DE genes than you might otherwise.
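
A toy illustration of why the approach tends to be conservative, using made-up log-ratios for one gene on the two arrays that are treated as replicates. The numbers and the names `noise` and `ignored.effect` are assumptions chosen for illustration, not data from this experiment.

```r
# Hypothetical log-ratios for one gene on the two grouped arrays.
noise <- c(0.10, -0.10)          # assumed variation between true replicates
ignored.effect <- c(0, 0.6)      # assumed effect of the factor being ignored

# Residual variance if the arrays were true replicates.
var.true.reps <- var(noise)

# Residual variance when the ignored factor also contributes to the difference.
var.pseudo.reps <- var(noise + ignored.effect)

# A larger residual variance gives smaller t-statistics for the same fold change.
var.pseudo.reps > var.true.reps
```

For a single gene the ignored effect can occasionally cancel the noise, so the inflation holds on average across genes rather than for every gene.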

Best wishes
Gordon

>Date: Fri, 31 Mar 2006 12:48:20 +0200
>From: Pedro López Romero <plopez at cnic.es>
>Subject: [BioC] using limma with no replicates
>To: <bioconductor at stat.math.ethz.ch>
>
>Dear list,
>
>I have been given some data to analyze. Unfortunately they only gave 1
>replicate per experimental condition, so I do not expect to draw meaningful
>information from it. Anyway, I would like to use limma, since I expect
>that it could be more powerful than mere inspection of the log2 fold
>change.
>
>Although I do not have "true biological replicates", I think that I can
>group (in the design matrix) some arrays as if they were replicates,
>according to the correlations that I expect from the experimental conditions
>and how the data have been generated. For example, I can group 2 arrays that
>belong to the same strain, although they have been treated a bit differently,
>or I can group 2 arrays that belong to the same strain and treatment but
>differ in the age of the mouse. These "grouped data" are not going to be part
>of the contrast. My intention (and I do not know if it is right) is to group
>some correlated data to have some degrees of freedom available so that the
>variance can be estimated, and then to make contrasts between
>2 other non-replicated arrays. I think that this would be somehow more
>powerful than inspecting the log2 fold change, since the information is
>better handled through the empirical Bayes method that limma implements, but
>I would feel better if someone backed me up, because I am not sure whether
>this is a good idea.
>
>
>Some piece of my code:
>
>design= model.matrix(~ -1 + factor(c(1,2,3,3,5,6,7,8)))
>colnames(design) =c("WT","upa","g1","f5","f6","f7","f8")
>
>Here g1 groups two arrays of the same strain (different from the other
>strains) and the same age of mouse, but with slightly different
>pharmacological treatments, and I will compare f5 vs f6 (these are the same
>strain, different from g1, and the same age, but the treatments are different).
>
>CM= makeContrasts(f5-f6,levels=design)
>
>
>Doing this, the M values that I observe in the top list are quite large in
>absolute value (> 6), but the differences are not significant. I think that
>this is due to the absence of replication in a very noisy system.
>
>      ID          M                  A                 t                  P.Value  B
>23620 mCG147262   -9.0828928978708   7.04453315872284  -20.6287756557693           -0.823196144084987
>19275 mCG1047122  -6.22956426050092  .91829704792039   -15.5769614644597  1        -0.940793980765775
>
>If I use genefilter to filter out some genes, some genes do appear
>significantly DE, though. Could this be explained simply by saying that
>FDR-like techniques become more sensitive as fewer comparisons are made?
>
>      ID          M                  A                 t                  P.Value              B
>263   mCG142389   -7.97481171094547  .73475871266083   -5.3168578969303   0.00832939443377308  6.57330274986848
>6756  BC027122    -7.40473059624002  .77564203692944   -4.93678117706839  0.0313305586976585   4.89829085664067
>
>
>I would appreciate any comment or suggestion very much.
>Thank you.
>
>plr.-


