[BioC] Analyzing multiple tissues

Gordon Smyth smyth at wehi.edu.au
Wed Jun 8 04:46:53 CEST 2005


>David Kipling KiplingD at cardiff.ac.uk
>Tue Jun 7 15:09:17 CEST 2005
>
>Dear Naomi, Gordon and Uri,
>
>If I might try to bring together Naomi's comments with those of Gordon and
>see if I have followed this correctly:
>
>Uri's original design is:
>
>Cardiac1, Cardiac2, Cardiac3
>Skeletal1, Skeletal2
>MSC
>
>That is, a 3x2x1 6-chip experiment.
>
>Naomi commented that with no replication (i.e. the single MSC chip) one
>cannot judge biological variation and the best thing to do is a simple
>fold-change:  "...there is no statistically valid means of analyzing your
>data that improves on an arbitrary choice of 'fold difference', such as
>2-fold difference." {Naomi}

There isn't any conflict between Naomi's comments and my own. Naomi
actually referred to "biological replication" rather than to replication per
se. She was reacting to Uri's original post, which made it very unclear
whether there is any biological replication in his experiment at all, i.e.,
it may be that Cardiac1, Cardiac2 etc. are not in fact biological
replicates. Replication is a subtle business, and Uri would need to
describe his process and population in much more detail than he has done for
more to be said. I may be wrong, but I doubt that Naomi was especially
concerned about the single MSC chip.

On the other hand, my comments were addressed at your mock experiment and 
were made on the basis that all replication for states 1 and 2 is true 
biological replication.

>Then Gordon replied:
>
> >> Out of curiosity, what is limma doing here and how should one interpret
> >> these t stats/p-values (if indeed one should!)?  Are they any use over
> >> simple M values?
> >
> > Yes, they are almost always better than simply using fold changes.  Using
> > M-values alone would make no use of replication while the t-statistics
> > make use of whatever replication is available.  Put very simply, some
> > replication is better than none.
> >
> > You seem to be concerned in your mock experiment that one of the states
> > has no replication.  The limma analysis estimates the variance for each
> > gene from the replicates available for states 1 and 2 and applies that
> > estimate to state 3 as well.  This analysis is perfectly valid provided
> > that the variability of the expression values is similar in state 3 to
> > that in states 1 and 2.
> >
> > Even when the variability is different in state 3, the limma analysis
> > still gives a better ranking than fold change, even for comparisons
> > involving state 3, in most cases.  The basic assumption is that, across
> > genes, the variance in state 3 is positively associated with the variance
> > in states 1 and 2.  This is a very weak assumption which is almost always
> > true in practice, as genewise differences in variability tend to dominate
> > state-wise differences.

All my comments below are made on the basis that all replication for states 
1 and 2 is biological replication.

>If I follow Gordon correctly, his argument is that in an experimental design
>like this you can make an estimate for the variance of a probeset based on
>its behaviour in the other samples (with some opportunity for discussion as
>to how valid an assumption this is!).   This results in a situation where
>not all fold changes are equal, and this will actually work better than a
>simple FC estimate for ranking the genes for further exploration.

Yes.

>In other words, in the 3x2x1 design such as this you could get two probesets
>that had identical M values (calculated between the triplicate and single
>chips) *but* limma would rank the probeset with the higher overall
>variability across the six chips lower down the list (seen as a different
>p-value/t statistic).

Exactly.
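
To make this concrete, here is a minimal Python sketch of the idea, not of
limma itself. It computes an ordinary pooled-variance t-statistic for the
MSC vs Cardiac comparison, whereas limma additionally moderates the genewise
variances. The expression values and the names probe_quiet, probe_noisy,
pooled_variance and t_stat are all made up for illustration.

```python
import math
from statistics import mean


def pooled_variance(groups):
    """Residual variance pooled over all groups.

    A group with a single chip contributes nothing to the sum of squares
    and nothing to the degrees of freedom.
    """
    ss = 0.0
    df = 0
    for values in groups.values():
        m = mean(values)
        ss += sum((v - m) ** 2 for v in values)
        df += len(values) - 1
    return ss / df, df


def t_stat(groups, a, b):
    """Ordinary t-statistic for group a minus group b using the pooled variance."""
    s2, df = pooled_variance(groups)
    diff = mean(groups[a]) - mean(groups[b])
    se = math.sqrt(s2 * (1 / len(groups[a]) + 1 / len(groups[b])))
    return diff, diff / se, df


# Hypothetical log2 expression values for two probesets in a 3+2+1 design.
probe_quiet = {"Cardiac": [8.0, 8.1, 7.9], "Skeletal": [8.5, 8.6], "MSC": [10.0]}
probe_noisy = {"Cardiac": [8.0, 9.0, 7.0], "Skeletal": [8.0, 9.1], "MSC": [10.0]}

m_quiet, t_quiet, df_quiet = t_stat(probe_quiet, "MSC", "Cardiac")
m_noisy, t_noisy, df_noisy = t_stat(probe_noisy, "MSC", "Cardiac")

print(f"quiet probeset: M = {m_quiet:.2f}, t = {t_quiet:.2f}, df = {df_quiet}")
print(f"noisy probeset: M = {m_noisy:.2f}, t = {t_noisy:.2f}, df = {df_noisy}")
```

Both probesets have the same M-value for MSC vs Cardiac, but the noisy one
gets a much smaller t-statistic and so falls lower in the ranking. The
variance comes entirely from the Cardiac and Skeletal replicates and is
simply applied to the MSC chip as well.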

>So Uri could use limma to study this 3x2x1 design and be able to extract
>potentially differentially regulated genes between the (single) MSC sample
>and either/both of the other two sample classes using the limma p-values
>returned, and this would be a more powerful approach than simple
>fold-changes - yes?

His design is actually 3+2+1 rather than 3x2x1. If his Cardiac and Skeletal 
samples are biological replicates, then yes. If not, see Naomi's comments.
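
As a rough sketch of what "3+2+1" means for the linear model, the Python
below builds a one-column-per-group design matrix for the six chips and
counts the residual degrees of freedom. The variable names are hypothetical,
and this only mimics the kind of group-means design one would normally set
up in R.

```python
# One label per chip, in the order given in Uri's design.
samples = ["Cardiac", "Cardiac", "Cardiac", "Skeletal", "Skeletal", "MSC"]

# Groups in order of first appearance.
groups = sorted(set(samples), key=samples.index)

# Design matrix: one row per chip, one indicator column per group.
design = [[1 if s == g else 0 for g in groups] for s in samples]

# Chips per group, and the degrees of freedom left for estimating variance.
replicates = {g: samples.count(g) for g in groups}
residual_df = len(samples) - len(groups)

for label, row in zip(samples, design):
    print(f"{label:<9}", row)
print("replicates:", replicates)
print("residual df:", residual_df)
```

Six chips minus three group means leaves 3 residual degrees of freedom, two
from Cardiac and one from Skeletal. The MSC chip contributes none, which is
why its variance has to be borrowed from the other two groups.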

>This is a very interesting point for those of us in core facilities having
>to help users who insist - for reasons of finances, scarce samples, or the
>fact that the experiments are of a preliminary grant-generating nature - on
>doing small-scale experiments where some samples have no replication at all.
>Me telling them to go away and come back with 15-fold replication isn't
>particularly helpful(!), and instead suggestions as to how to wring the
>maximum information from such narrow datasets are what they need.

Making the best use of small-scale experiments is the primary purpose of 
the limma software.

In general, you can still do an analysis with only 1 chip for one of the 
groups, unless you have a strong reason to think that the variability of 
expression will be quite different in that group to the others. Generally 
speaking, the process will work best when the different groups (e.g., 
tissue types) are as similar as possible.

Gordon

>Thanks everyone,
>
>David
>
>
>
>
>Professor David Kipling
>Department of Pathology
>School of Medicine
>Cardiff University
>Heath Park
>Cardiff CF14 4XN
>
>Tel:    +44 29 2074 4847
>Fax:    +44 29 2074 4276
>Email:  KiplingD at cardiff.ac.uk


