[BioC] Analyzing multiple tissues

Tue Jun 7 15:09:17 CEST 2005

Dear Naomi, Gordon and Uri,

If I might try to bring together Naomi's comments with those of Gordon and
see if I have followed this correctly:

Uri's original design is:

Cardiac1, Cardiac2, Cardiac3
Skeletal1, Skeletal2
MSC

That is, a 3x2x1 design: six chips in total, with three, two and one
chip(s) per sample class.

Naomi commented that with no replication (i.e. the single MSC chip) one
cannot judge biological variation and the best thing to do is a simple
fold-change:  "...there is no statistically valid means of analyzing your
data that improves on an arbitrary choice of 'fold difference', such as
2-fold difference." {Naomi}

Then Gordon replied:

>> Out of curiosity, what is limma doing here and how should one interpret
>> these t stats/p-values (if indeed one should!)?  Are they any use over
>> simple M values?
>
> Yes, they are almost always better than simply using fold changes.  Using
> M-values alone would make no use of replication while the t-statistics
> make use of whatever replication is available.  Put very simply, some
> replication is better than none.
>
> You seem to be concerned in your mock experiment that one of the states
> has no replication.  The limma analysis estimates the variance for each
> gene from the replicates available for states 1 and 2 and applies that
> estimate to state 3 as well.  This analysis is perfectly valid provided
> that the variability of the expression values is similar in state 3 to
> that in states 1 and 2.
>
> Even when the variability is different in state 3, the limma analysis
> still gives a better ranking than fold change, even for comparisons
> involving state 3, in most cases.  The basic assumption is that, across
> genes, the variance in state 3 is positively associated with the variance
> in states 1 and 2.  This is a very weak assumption which is almost always
> true in practice, as genewise differences in variability tend to dominate
> state-wise differences.

If I follow Gordon correctly, his argument is that in an experimental design
like this you can make an estimate for the variance of a probeset based on
its behaviour in the other samples (with some opportunity for discussion as
to how valid an assumption this is!).   This results in a situation where
not all fold changes are equal, and this will actually work better than a
simple FC estimate for ranking the genes for further exploration.

In other words, in a 3x2x1 design such as this you could get two probesets
that had identical M values (calculated between the triplicate and single
chips) *but* limma would rank the probeset with the higher variability
among its replicate chips lower down the list (seen as a smaller t
statistic and a larger p-value).
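
As a rough illustration of that point, here is a minimal Python sketch.
It is only a sketch under stated assumptions: it computes an ordinary
pooled-variance t-statistic, not limma's moderated t (which additionally
shrinks each gene's variance estimate towards a common value), and the
function name and all expression values are made up for the example.

```python
import math

def pooled_t(groups, a, b):
    """Ordinary t-statistic for (mean of group a) - (mean of group b),
    using a residual variance pooled over all groups.
    Hypothetical helper for illustration; not part of limma."""
    means = {name: sum(v) / len(v) for name, v in groups.items()}
    # Residual sum of squares around each group mean. A group with a
    # single chip (MSC) contributes nothing, so the variance estimate
    # comes entirely from the replicated groups.
    rss = sum((x - means[name]) ** 2
              for name, v in groups.items() for x in v)
    # Residual degrees of freedom: 6 chips - 3 group means = 3
    df = sum(len(v) for v in groups.values()) - len(groups)
    s2 = rss / df
    se = math.sqrt(s2 * (1 / len(groups[a]) + 1 / len(groups[b])))
    return (means[a] - means[b]) / se

# Made-up log2 expression values for two probesets. Both have the same
# Cardiac-vs-MSC difference (M = 2), but different replicate scatter.
quiet = {"Cardiac": [8.9, 9.0, 9.1], "Skeletal": [8.4, 8.6], "MSC": [7.0]}
noisy = {"Cardiac": [8.0, 9.0, 10.0], "Skeletal": [7.5, 9.5], "MSC": [7.0]}

t_quiet = pooled_t(quiet, "Cardiac", "MSC")
t_noisy = pooled_t(noisy, "Cardiac", "MSC")
print(t_quiet, t_noisy)
```

The two probesets have the same fold change against the single MSC chip,
yet the one with the tighter replicates gets a t-statistic ten times
larger, so it would sit much higher in a ranked list.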

So Uri could use limma to study this 3x2x1 design and be able to extract
potentially differentially regulated genes between the (single) MSC sample
and either/both of the other two sample classes using the limma p-values
returned, and this would be a more powerful approach than simple
fold-changes - yes?

This is a very interesting point for those of us in core facilities having
to help users who insist - for reasons of finances, scarce samples, or the
fact that the experiments are of a preliminary grant-generating nature - on
doing small-scale experiments where some samples have no replication at all.
Me telling them to go away and come back with 15-fold replication isn't
particularly helpful(!), and instead suggestions as to how to wring the
maximum information from such narrow datasets are what they need.

Thanks everyone,

David

Professor David Kipling
Department of Pathology
School of Medicine
Cardiff University
Heath Park
Cardiff CF14 4XN

Tel:    +44 29 2074 4847
Fax:    +44 29 2074 4276
Email:  KiplingD at cardiff.ac.uk