[BioC] using limma with no replicates

Mon Apr 3 13:27:03 CEST 2006

Thanks for the reply,

Yes, this is basically the issue: I want to group some arrays that I expect
to be experimentally correlated, so that some df are available for
estimating the variance. My main concern was that I could be violating
some limma model assumptions, leading to parameter estimates that make no
sense at all. I still have two small doubts that you could probably clarify
for me.

I am doing the grouping in two ways:

1) I group, for instance, arrays 1-2-3 and 4-5-6 and compare these two groups
to look for DE. Here, I think that the systematic experimental effects are
confounded in the two groups, and the problem would arise if there is some
interaction effect between the expression and some of the experimental
conditions.

2) I group arrays 1-2, and compare array 3 vs 4 and array 5 vs 6. Note that
in this case I am grouping some arrays (to have some df) but I am comparing
single, unreplicated arrays. I think that in this way I can obtain an
estimate of the variance that will improve the analysis compared with using
the arrays alone (log2 fold change approach). Am I right?
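
As a rough illustration of the two groupings just described, the base R
sketch below builds a design matrix for each and counts the residual degrees
of freedom left for estimating the variance. The six-array layout and the
labels (A, B, g12, a3, ...) are hypothetical and chosen only for this
example; they are not taken from the actual experiment.

```r
# Hypothetical six-array layout; all labels are illustrative only.

# Grouping 1: arrays 1-2-3 vs arrays 4-5-6.
grouping1 <- factor(c("A", "A", "A", "B", "B", "B"))
design1 <- model.matrix(~ 0 + grouping1)

# Grouping 2: arrays 1-2 pooled, arrays 3 to 6 each on their own level.
grouping2 <- factor(c("g12", "g12", "a3", "a4", "a5", "a6"),
                    levels = c("g12", "a3", "a4", "a5", "a6"))
design2 <- model.matrix(~ 0 + grouping2)

# Residual df = number of arrays minus the rank of the design matrix.
residual_df <- function(design) nrow(design) - qr(design)$rank

df1 <- residual_df(design1)
df2 <- residual_df(design2)
print(c(grouping1 = df1, grouping2 = df2))
```

Under these assumptions the first grouping leaves 4 residual df and the
second leaves only 1, so in the second case the variance estimate rests
almost entirely on the pair of arrays 1-2.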

In fact, I am observing large M values, but nothing seems to be DE after
multiple testing correction. I think this is because the estimated error
variance is quite large, so that only extreme DE genes can be detected.
Is that right?
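
One way to see why few residual df make even large M values hard to call
significant is to look at how the critical value of an ordinary t-statistic
grows as the df shrink. The sketch below uses only base R; the gene count of
20000 and the df values are assumptions for illustration. limma's moderated
t-statistic gains extra df from the empirical Bayes prior, so its thresholds
would be less extreme than these.

```r
# Assumed values for illustration only (not taken from the experiment).
n_genes <- 20000          # hypothetical number of probes tested
res_df  <- c(1, 4, 10)    # residual degrees of freedom to compare

# Two-sided 5% critical value of |t| for a single test.
crit_single <- qt(0.05 / 2, df = res_df, lower.tail = FALSE)

# Two-sided critical value of |t| at a Bonferroni-style per-gene
# threshold of 0.05 / n_genes.
crit_multi <- qt((0.05 / n_genes) / 2, df = res_df, lower.tail = FALSE)

print(data.frame(res_df, crit_single, crit_multi))
```

The required |t| rises steeply as the df fall, and more so once the
threshold is tightened for many tests, which fits with only extreme genes
surviving the correction.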

Thanks a lot.

Pedro.

-----Original Message-----
From: Gordon Smyth [mailto:smyth at wehi.edu.au]
Sent: Sunday, 2 April 2006 1:08
To: Pedro López Romero
CC: bioconductor at stat.math.ethz.ch
Subject: [BioC] using limma with no replicates

Dear Pedro,

The strategy you are proposing is to ignore experimental factors
which you think will have relatively small effects, so as to generate
some degrees of freedom for error. This is an ok strategy, long used
in statistics, as long as you understand clearly what you are testing
for. If you do this, limma will try to find genes which have
differential expression which stands out relative to the effects you
have ignored.

Power is not the issue here. This approach is actually conservative,
in that the residual variability will be larger than if you had true
replicate arrays, hence you will find fewer DE genes than you might
otherwise.
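
A small simulation can make this concrete. The sketch below, in base R,
treats two sets of three arrays as replicates even though each array carries
its own ignored effect, and compares the resulting residual variance with
the pure noise variance. All sizes and standard deviations are assumed
values chosen for illustration, not estimates from any real data set.

```r
set.seed(1)

# Assumed settings for illustration only.
n_genes    <- 2000
noise_sd   <- 0.2   # pure array-to-array noise
ignored_sd <- 0.5   # size of the experimental effects being ignored

# Two groups of three arrays treated as if they were replicates.
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ 0 + group)

# Log-ratios with no true group difference: noise plus ignored effects.
noise   <- matrix(rnorm(n_genes * 6, sd = noise_sd), nrow = n_genes)
ignored <- matrix(rnorm(n_genes * 6, sd = ignored_sd), nrow = n_genes)
y <- noise + ignored

# Fit the group-means model to every gene at once (arrays in rows).
fit    <- lm.fit(design, t(y))
res_df <- nrow(design) - fit$rank

# Per-gene residual variance, then its average over genes.
s2      <- colSums(fit$residuals^2) / res_df
mean_s2 <- mean(s2)

print(c(noise_variance = noise_sd^2, mean_residual_variance = mean_s2))
```

With these settings the average residual variance comes out well above the
noise variance, so the t-statistics shrink accordingly and fewer genes are
called DE.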

Best wishes
Gordon

>Date: Fri, 31 Mar 2006 12:48:20 +0200
>From: Pedro López Romero <plopez at cnic.es>
>Subject: [BioC] using limma with no replicates
>To: <bioconductor at stat.math.ethz.ch>
>
>Dear list,
>
>I have been given some data to analyze. Unfortunately, there is only 1
>replicate per experimental condition, so I do not expect to draw much
>meaningful information from it. Anyway, I would like to use limma, since I
>expect that it could be more powerful than mere inspection of the log2 fold
>change.
>
>Although I do not have "true biological replicates", I think that I can
>group (in the design matrix) some arrays as if they were replicates,
>according to the correlations that I expect from the experimental
>conditions and how the data have been generated. For example, I can group
>2 arrays that belong to the same strain, although they have been treated
>slightly differently, or I can group 2 arrays that belong to the same
>strain and treatment but differ in the age of the mouse. These "grouped
>data" are not going to be part of the contrast. My intention (and I do not
>know if it is right) is to group some correlated data to have some degrees
>of freedom available for estimating the variance, and then to make
>contrasts between 2 other non-replicated arrays. I think that this would be
>somewhat more powerful than inspecting the log2 fold change, since the
>information is better handled through the empirical Bayes approach that
>limma implements, but I would feel better if someone backed me up, because
>I am not sure whether this is a good idea.
>
>
>Here is part of my code:
>
>design = model.matrix(~ -1 + factor(c(1,2,3,3,5,6,7,8)))  # arrays 3 and 4 share a level (g1)
>colnames(design) = c("WT","upa","g1","f5","f6","f7","f8")  # 7 levels for 8 arrays
>
>Here g1 groups two arrays from the same strain (different from the other
>strains) and the same mouse age, but with slightly different
>pharmacological treatments. I will compare f5 vs f6 (these are from the
>same strain, which is different from that of g1, and are the same age, but
>the treatments are different).
>
>CM= makeContrasts(f5-f6,levels=design)
>
>
>Doing this, the M values that I observe at the top of the list are quite
>large (|M| > 6), but the differences are not significant. I think that this
>is due to the absence of replication in a very noisy system.
>
>        ID          M                  A                 t                  P.Value  B
>23620   mCG147262   -9.0828928978708   7.04453315872284  -20.6287756557693           -0.823196144084987
>19275   mCG1047122  -6.22956426050092  .91829704792039   -15.5769614644597           -0.940793980765775
>
>If I first use genefilter to filter out some genes, some genes do appear
>significantly DE. Could this be explained simply by FDR-like techniques
>becoming more sensitive as fewer comparisons are made?
>
>        ID          M                  A                 t                  P.Value              B
>263     mCG142389   -7.97481171094547  .73475871266083   -5.3168578969303   0.00832939443377308  6.57330274986848
>6756    BC027122    -7.40473059624002  .77564203692944   -4.93678117706839  0.0313305586976585   4.89829085664067
>
>
>I would very much appreciate any comment or suggestion.
>Thank you.
>
>plr.-
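
On the filtering question in the quoted message: with the
Benjamini-Hochberg adjustment, the adjusted p-value of the top gene scales
with the number of tests, so the same raw p-value can cross a 0.05 cutoff
once fewer genes are tested. The base R sketch below shows this with made-up
p-values; the gene counts and the raw p-value of 1e-5 are assumptions. Note
that the FDR guarantee generally relies on the filter not using the same
information as the test statistic.

```r
# Made-up p-values for illustration: one small p-value among many
# unremarkable ones. None of these numbers come from the experiment.
top_p <- 1e-5

adjusted_top <- function(n_tests) {
  # The remaining genes get evenly spread, unremarkable p-values.
  others <- seq(0.01, 1, length.out = n_tests - 1)
  p <- c(top_p, others)
  p.adjust(p, method = "BH")[1]   # BH-adjusted p-value of the top gene
}

adj_all      <- adjusted_top(20000)  # all probes tested
adj_filtered <- adjusted_top(500)    # after a hypothetical filter
print(c(all = adj_all, filtered = adj_filtered))
```

Here the same raw p-value gives an adjusted value of about 0.2 among 20000
tests and about 0.005 among 500, so it passes a 0.05 cutoff only in the
filtered set.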