[BioC] Harsh results using limma!

David K Pritchard dpritch at u.washington.edu
Fri Aug 13 21:31:22 CEST 2004


I think Mick's experience points out a fundamental problem with current statistical analysis of microarray data. If his data were .2, .2, .2 (dye flips) -.2, -.2, -.2, then Limma would flag this gene as highly differentially expressed. In contrast, when he sees 6.29, 5.54, 0.2 (dye flips) -5.27, -4.61, -0.2, Limma did not mark it as differentially expressed.
     As a biologist, I would argue that the evidence for differential expression is much stronger in the second case. Yet with modified t-statistic approaches and the limited number of replicates common in current array experiments, I see array experiments "missing" these very interesting high-variance genes all the time.
    Current analytical techniques put a high premium on consistency of results and a lower premium on the strength of differential expression, which is the parameter biologists would argue is the most important.
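A minimal sketch of this consistency-versus-magnitude trade-off, written in Python rather than R. It sign-adjusts Mick's Gene1 log ratios (quoted further down) for the dye-swaps. It then compares an ordinary one-sample t-statistic with a moderated t of the form limma uses, in which each gene's sample variance is shrunk toward a prior variance s0² carrying d0 degrees of freedom.

Several parts of the sketch are assumptions for illustration:
- In limma the prior is estimated from all genes by `eBayes()`; the values `D0 = 4` and `S0_SQ = 0.01` below are purely hypothetical.
- The helper names are illustrative.
- Treating all six arrays as independent ignores the correlation within each dye-swap pair.

```python
import math
from statistics import mean, variance

# Normalised log ratios for Gene1 as posted in the thread (Exp1, Exp1ds, ...).
GENE1 = [-5.27, 6.29, -4.61, 5.54, -0.2, 0.2]
# David's hypothetical gene in the same layout: every ratio is +/-0.2.
CONSTANT_02 = [-0.2, 0.2, -0.2, 0.2, -0.2, 0.2]
# +1 for the original labelling, -1 for a dye-swap array.
DYE = [1, -1, 1, -1, 1, -1]

# HYPOTHETICAL prior: limma estimates these from all genes; these are made up.
D0 = 4.0      # prior degrees of freedom
S0_SQ = 0.01  # prior variance (a log-ratio SD of 0.1)


def sign_adjust(log_ratios, dye):
    """Flip dye-swapped arrays so every value estimates the same contrast."""
    return [x * d for x, d in zip(log_ratios, dye)]


def per_animal(values):
    """Average each (ExpN, ExpNds) pair, leaving one value per animal."""
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]


def ordinary_t(values):
    """One-sample t = mean / (s / sqrt(n)); infinite when s is exactly 0."""
    s_sq = variance(values)
    if s_sq == 0:
        return math.copysign(math.inf, mean(values))
    return mean(values) / math.sqrt(s_sq / len(values))


def moderated_t(values, d0, s0_sq):
    """Moderated t: the gene's variance is shrunk toward s0_sq with weight d0."""
    n = len(values)
    d = n - 1
    s_tilde_sq = (d0 * s0_sq + d * variance(values)) / (d0 + d)
    return mean(values) / math.sqrt(s_tilde_sq / n), d0 + d


if __name__ == "__main__":
    for name, raw in (("Gene1", GENE1), ("constant 0.2", CONSTANT_02)):
        adj = sign_adjust(raw, DYE)
        t_mod, df = moderated_t(adj, D0, S0_SQ)
        print(f"{name}: mean {mean(adj):.3f}, ordinary t {ordinary_t(adj):.2f}, "
              f"moderated t {t_mod:.2f} on {df:g} df, "
              f"per-animal t {ordinary_t(per_animal(adj)):.2f}")
```

Under these made-up priors, Gene1 scores about -3.3 with an ordinary t and about -4.4 moderated. So moderation is kinder than an ordinary t-test, as Gordon says. The constant 0.2 gene has an infinite ordinary t (zero variance) and a moderated t of about -7.35, so it ranks above Gene1. A larger `S0_SQ` (0.05, for example) reverses that order, which is why the ranking in a real limma fit depends on the prior estimated from the whole array. Averaging each dye-swap pair first leaves only three biological values and an ordinary t of about -2.1 on 2 df.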
     There are a variety of biological reasons why high-variance genes should exist, and personally I think these genes are likely to be the biologically interesting ones that we should be looking for on microarrays.
     I understand why Limma does what it does, and it is a fantastically useful program. However, I would suggest to the statisticians reading this message that it would be very useful to start developing analytical techniques that could better detect high-variance genes.
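One way to act on this follows Tony Rossini's suggestion quoted below: a sign-based statistic that also uses magnitude, with its null distribution obtained by resampling. The sketch below is an exact Fisher-style one-sample sign-flip randomization test on the sign-adjusted Gene1 log ratios. It is not a limma feature. It assumes that, with no differential expression, each sign-adjusted value is symmetric about zero. The function name is hypothetical.

```python
from itertools import product
from statistics import mean


def sign_flip_pvalue(values):
    """Exact two-sided sign-flip randomization p-value for |mean|.

    Assumes that, with no differential expression, each sign-adjusted log
    ratio is symmetric about zero, so all 2**n sign patterns are equally
    likely. Using |mean| rather than a count of signs lets magnitude matter.
    """
    observed = abs(mean(values))
    magnitudes = [abs(v) for v in values]
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(values)):
        stat = abs(mean([s * m for s, m in zip(signs, magnitudes)]))
        total += 1
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return hits / total


if __name__ == "__main__":
    # Gene1 after flipping the dye-swap arrays (values from the thread).
    arrays = [-5.27, -6.29, -4.61, -5.54, -0.2, -0.2]
    animals = [(arrays[i] + arrays[i + 1]) / 2 for i in range(0, 6, 2)]
    print("six arrays   :", sign_flip_pvalue(arrays))
    print("three animals:", sign_flip_pvalue(animals))
    print("constant 0.2 :", sign_flip_pvalue([-0.2] * 6))
```

The test also exposes the limit imposed by few replicates. With n values, the smallest attainable two-sided p-value is 2/2^n. Three animals therefore can never give less than 0.25, and six arrays give 0.03125 for both Gene1 and the constant 0.2 gene. The six arrays are not independent, so that second figure is optimistic.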

David Pritchard








On Fri, 13 Aug 2004, A.J. Rossini wrote:

> 
> The OTHER explanation could be technician error or inadvertent
> cross-hybridization due to processing; the catch is that you've also
> got a batch effect, it seems.
> 
> The more critical issue is that you've got to use statistics that
> describe what you want.  Obviously, those that standardize location
> (mean/median) by some variability (std error/interquartile range) are
> going to give you p-values (non-significant) which reflect that
> standardization.
> 
> If you want a "consistency" / magnitude statistic (say, a sign test
> augmented in some manner with the magnitude), at this point you'd have
> to be creative.  But having been creative, you still could get a
> distribution via simulation or resampling to work from to obtain
> p-values.
> 
> The only problem will be trying to convince reviewers (or folks
> playing devil's advocate) that your "statistic" is reasonable for
> differential expression.
> 
> best,
> -tony
> 
> 
> 
> 
> "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk> writes:
> 
> > Hi Gordon
> >
> > Yes you're right.  I didn't really mean to compare limma to a t-test.
> > It's just that the results are very consistent within technical
> > replicates (the dye-swaps), just not consistent between biological
> > replicates.  But this is the situation we expect - technical replicates
> > highly correlated and biological replicates much less so.  Clearly
> > differences of 0.2 could be noise, but my dye-swaps BOTH came up with
> > 0.2.  If I had ten replicate dye-swaps, all with 0.2 as the log(ratio)
> > would we still call this noise?   Given that the other replicate
> > experiments were also highly reproducible, I can't help but think this
> > gene is differentially expressed.
> >
> > I know why limma and t-test disregard this gene, I just still think it
> > is a little harsh and that I am "throwing the baby out with the
> > bathwater", as it were.  
> >
> > Mick
> >
> > -----Original Message-----
> > From: Gordon Smyth [mailto:smyth at wehi.edu.au] 
> > Sent: 13 August 2004 12:56
> > To: michael watson (IAH-C)
> > Cc: bioconductor at stat.math.ethz.ch
> > Subject: Re: [BioC] Harsh results using limma!
> >
> >
> > At 09:14 PM 13/08/2004, michael watson (IAH-C) wrote:
> >>Hi
> >>
> >>Firstly, I think limma is excellent and use it a lot, but some recent 
> >>results are a bit, erm, disappointing and I wondered if someone could 
> >>explain them.
> >>
> >>Basic set up was a double dye-swap experiment (4 arrays) involving 
> >>different animals, one infected with one type of bacterium and the 
> >>other a different bacterium, compared to one another directly.  I used 
> >>limma to analyse this and got a list of genes differentially regulated 
> >>- great!
> >>
> >>THEN another replicate experiment was performed (so now I have 6 
> >>arrays, 3 dye-swaps), and I re-did the analysis and my set of genes was
> >>completely different - but that's fine, we can put that down to 
> >>biological variation.  We know limma likes genes which show consistent 
> >>results across arrays, and when I looked at my data, I found that the 
> >>genes in my original list were not consistent across all six arrays.  
> >>So I am reasonably happy about this.
> >>
> >>My question comes from looking at the top gene from my old list in the 
> >>context of all six arrays.  Here are the normalised log ratios across 
> >>all six arrays (ds indicates the dye-swap):
> >>
> >>Gene1
> >>Exp1     -5.27
> >>Exp1ds    6.29
> >>Exp2     -4.61
> >>Exp2ds    5.54
> >>Exp3     -0.2
> >>Exp3ds    0.2
> >
> > Changes of +-0.2 are tiny and look like pure noise. So, you can have a
> > gene for which only 2/3 of your mice show a difference. Statistical
> > methods based on means and standard deviations will always judge this
> > situation harshly. If you try an ordinary t-test rather than the limma
> > method, you'll find that this gene would be judged much more harshly
> > again.
> >
> > Gordon
> >
> >>Not surprisingly, limma put this as the top gene when looking at the 
> >>first four arrays.  However, when looking across all six arrays, limma 
> >>places it at 230 in the list with a p-value of 0.11 (previously the 
> >>p-value was 0.0004).
> >>
> >>So finally we get to my point/question - does this gene really 
> >>"deserve" a p-value of 0.11 (i.e. not significant)?  In every case the 
> >>dye-flips are the correct way round, it is only the magnitude of the 
> >>log(ratio) which differs - and as we are talking about BIOLOGICAL 
> >>variation here, don't we expect the magnitude to change?  If we are 
> >>taking into account biological variation, surely we can't realistically
> >>expect consistent ratios across all replicate experiments??  Isn't limma
> >>being a little harsh here?  After all, the average log ratio is -3.7 (taking into 
> >>account the dye-flips) - and to me, experiment 3's results still 
> >>support the idea of the gene being differentially expressed, and are 
> >>even consistent within that biological replicate.
> >>
> >>Clearly I am looking at this data from a biologist's point of view and 
> >>not a statistician's.  But we are studying biology, not statistics, and 
> >>I can't help feeling I am missing out on something important here if I 
> >>disregard this gene as not significantly differentially expressed (NB 
> >>this is just the first example, there are many others).
> >>
> >>I should also add that there appears to be nothing strange about the arrays 
> >>for Experiment 3 - the distribution of log(ratio) for those arrays is 
> >>pretty much the same as the other four, so this is not an array effect;
> >>it is an effect due to natural biological variation.
> >>
> >>Comments, questions, criticisms all welcome :-)
> >>
> >>Mick
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
> 
> -- 
> Anthony Rossini			    Research Associate Professor
> rossini at u.washington.edu            http://www.analytics.washington.edu/ 
> Biomedical and Health Informatics   University of Washington
> Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
> UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
> FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
> 
> 
>


