[BioC] ttest or fold change

Crispin Miller CMiller at PICR.man.ac.uk
Tue Dec 16 12:38:38 MET 2003

I was wondering what people made of the following hypothetical
experiment. I think it raises some interesting issues about where
arrays sit in the context of a complete experiment at the bench:

The old hgu95 Affy chips had about 12k probesets on them, the u133a chips
have about 22k probesets and the new plus2 chips have about 56k
probesets. When I did my first experiment on u95 arrays I applied a
multiple testing correction and found one gene that was particularly
interesting, with a corrected p-value of 0.01. Then I repeated the
experiment on u133 arrays and found the same gene, but because there
were nearly twice as many probesets on the array, the chance of false
positives roughly doubled, so my corrected p-value went up. Now, on the
plus2 arrays, my gene has a corrected p-value of >0.05 and it's not
significant anymore. What troubles me is that the gene I chose to work
on six months ago with my collaborators now falls through the filter
I've set. I guess this means that I can't rule out that it is just a
false positive, and so I need to do some follow-up to confirm it by
other means. The trouble is that if I did that with a Northern, or
real-time PCR, or even a Southern, shouldn't I be applying multiple
testing corrections to these too, based on the other Northerns or
Southerns I could have run in parallel?
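To see why the corrected p-value climbs as the arrays grow, here is a minimal Python sketch. It assumes a Bonferroni correction (the message does not say which correction was used) and a hypothetical raw p-value of 1e-6 for the gene; the probeset counts are the rounded figures quoted above.

```python
def bonferroni_adjust(p_raw: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value: raw p-value times the number of tests, capped at 1."""
    return min(1.0, p_raw * n_tests)


# Hypothetical raw p-value for the gene of interest (an assumption, not from the post).
P_RAW = 1e-6

# Rounded probeset counts quoted in the message.
ARRAYS = {"hgu95": 12_000, "u133a": 22_000, "plus2": 56_000}

adjusted = {chip: bonferroni_adjust(P_RAW, m) for chip, m in ARRAYS.items()}

for chip, p_adj in adjusted.items():
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"{chip}: adjusted p = {p_adj:.3f} ({verdict} at 0.05)")
```

The raw evidence for the gene is identical in all three cases; only the number of tests it is corrected against changes, which is what pushes it past 0.05 on the largest array.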

-----Original Message-----
From: Stephen P. Baker [mailto:stephen.baker at umassmed.edu] 
Sent: 16 December 2003 11:21
To: michael watson (IAH-C); bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] ttest or fold change

Of course investigators want to avoid false negatives as well as false
positives, but you cannot guarantee the absence of either kind of error
except in the trivial cases: classifying every gene as negative
guarantees no false positives, and classifying every gene as positive
guarantees no false negatives. The best you can do is to quantify the
probabilities and then trade one off against the other, i.e.
decreasing the probability of one type of error increases the
probability of the other. However, there is an arbitrary but real
de facto standard of 5% for type I error, which limits how much the
tradeoff can be manipulated. New approaches such as mixture models
offer some promise of improvement, and use of the False Discovery Rate
can make a very big difference in the number of regulated genes
detected. I think this is underutilized.
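The difference between controlling the family-wise error rate and controlling the False Discovery Rate can be seen on a handful of p-values. This is a minimal Python sketch comparing Bonferroni with the Benjamini-Hochberg step-up procedure; the ten p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Indices rejected by Bonferroni (controls the family-wise error rate)."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def bh_reject(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure
    (controls the false discovery rate at level q)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank k with p_(k) <= k * q / m.
        if pvals[i] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for ten genes (illustrative only).
pvals = [0.001, 0.008, 0.012, 0.018, 0.04, 0.3, 0.5, 0.7, 0.8, 0.9]

print("Bonferroni calls:", bonferroni_reject(pvals))  # [0]
print("BH (FDR) calls:  ", bh_reject(pvals))          # [0, 1, 2, 3]
```

At the same nominal 0.05, Bonferroni keeps one gene while the FDR procedure keeps four. That is the "very big difference in the number of regulated genes detected", bought by accepting that a controlled fraction of the calls may be false.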

Fortunately the REAL answer is under the control of the investigator,
with the help of the statistician. That answer is power analysis: the
statistician can help the investigator calculate the number of
microarrays needed to provide a desired probability of detecting an
effect of a specified size (e.g. a given fold change). There is no
effect too subtle to detect with enough data.
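A power calculation of this kind can be sketched in a few lines. This is a minimal Python sketch, assuming a two-sided, two-sample comparison on log2 expression with a normal approximation (R's `power.t.test` does the t-based version, which gives slightly larger sample sizes for small n). The standard deviation of 0.5 on the log2 scale is a hypothetical planning value, of the kind one might estimate from a pilot study.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sample(n_per_group, fold_change, sd_log2, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test on log2 expression."""
    delta = abs(math.log2(fold_change))          # effect size on the log2 scale
    se = sd_log2 * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return Z.cdf(delta / se - z_crit)


def n_per_group(fold_change, sd_log2, alpha=0.05, power=0.8):
    """Arrays per group needed to reach the target power (normal approximation)."""
    delta = abs(math.log2(fold_change))
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) * sd_log2 / delta) ** 2)


# Hypothetical planning values: SD of log2 expression = 0.5.
print(n_per_group(2.0, 0.5))                        # 2-fold change, one test at 0.05
print(n_per_group(2.0, 0.5, alpha=0.05 / 22_000))   # Bonferroni over ~22k probesets
print(n_per_group(1.2, 0.5))                        # a subtler 1.2-fold change
```

Here `power` is one minus the type II error probability, so at a fixed number of arrays, lowering `alpha` in `power_two_sample` lowers the power: the tradeoff described earlier. Increasing the sample size is the one lever that improves both at once, and smaller fold changes simply require more arrays.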

Of course, there is no such thing as a free lunch, and microarrays are
still expensive (though getting cheaper). But compared with the cost of
any other technology, microarrays are incredibly inexpensive
considering the amount of data they produce. Imagine the cost in
materials and labor to do PCR on 10,000 or 20,000 genes!

The studies we are seeing are getting larger and larger. Funding
agencies are funding well-prepared proposals for large studies with many
microarrays (i.e. enough to detect meaningful effects) based on small
studies of a few microarrays. These small studies then serve as pilot
studies, and pilot studies do not need to be "definitive" to be useful.
They just may not be publishable on their own.

-.- -.. .---- .--. ..-.
Stephen P. Baker, MScPH , PhD(ABD)                      (508) 856-2625
Senior Biostatistician
(775) 254-4885 fax
Academic Computing Services
Lecturer in Biostatistics , Graduate School of Biomedical Sciences
University of Massachusetts Medical School
55 Lake Avenue North                          stephen.baker at umassmed.edu
Worcester, MA 01655  USA

  ----- Original Message ----- 
  From: michael watson (IAH-C) 
  To: Baker, Stephen ; bioconductor at stat.math.ethz.ch 
  Sent: Tuesday, December 16, 2003 4:46 AM
  Subject: RE: [BioC] ttest or fold change

  >This seems small but with a microarray with thousands of genes, this
  >easily produces a bunch of false positives. I looked at 10 chips from

  A truly excellent reply, and one which I will no doubt refer to
  frequently; I am still very much a novice statistician. However (and
  please correct me if I am wrong), I presume that some scientists are
  just as afraid of false negatives as of false positives. That is, if
  we are so conservative that we try to ENSURE there are NO false
  positives, we may throw away genes as not differentially expressed
  when in reality they are. It will be interesting to have a discussion
  on this: is it possible, using statistics, to guarantee both no false
  positives and no false negatives? If not, then surely the investigator
  must decide which is relevant to the study in question before going on
  to decide which stats to use.
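The question can be made concrete with a toy threshold sweep. This is a minimal Python sketch on hypothetical p-values where we pretend to know which genes are truly regulated (in a real experiment that truth is unknown); a gene is called regulated when its p-value is at or below the threshold.

```python
def error_counts(null_p, de_p, threshold):
    """Call a gene 'regulated' when p <= threshold; return (false positives, false negatives)."""
    false_pos = sum(p <= threshold for p in null_p)  # unregulated genes called regulated
    false_neg = sum(p > threshold for p in de_p)     # regulated genes that were missed
    return false_pos, false_neg


# Hypothetical p-values with known truth (illustrative only).
null_p = [0.03, 0.2, 0.45, 0.6, 0.9]   # genes that are NOT differentially expressed
de_p = [0.001, 0.01, 0.04, 0.08, 0.3]  # genes that ARE differentially expressed

for t in (0.0, 0.01, 0.05, 1.0):
    fp, fn = error_counts(null_p, de_p, t)
    print(f"threshold {t:<5}: {fp} false positives, {fn} false negatives")
```

Only the extreme thresholds drive one error count to zero, and each does so by maximizing the other: 0.0 gives no false positives but misses all five regulated genes, while 1.0 misses none but calls all five unregulated genes too.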

Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
