[BioC] Westfall and Young "maxT"

Wed Jul 9 09:34:41 MEST 2003

Hello,

> I've got a question regarding the Westfall and Young "maxT" procedure
> (implemented in Bioconductor package multtest, function mt.maxT).
>
> If one calculates a two sample T-statistic assuming unequal variances
> for the groups, then the resultant statistic is only approximately T
> and the degrees of freedom are a function of the sample sizes and
> variances.  So the situation is that the distributions of the T
> statistics calculated for different "genes" are in general *not*
> identical.  Obviously, if one has a moderately large sample size
> the reference distributions for the different "genes" are all
> approximately normal and the difference between distributions
> is not anything to worry about.  However, if one's sample sizes are
> smallish, then this could be a problem, correct?

You are right, the test statistics often have different distributions for
different genes.

> So my questions are:
>
> (1) is there anything that can be done to adjust for the differences
>     between the distributions of the genes (I'm guessing there isn't)?

You could use the step-down minP procedure, which first calculates
unadjusted p-values for each gene and then computes adjusted p-values based
on successive minima of these unadjusted p-values.
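
To make the mechanics concrete, here is a minimal sketch of the step-down
minP adjustment in plain Python. It is an illustration only, not the multtest
code: the function names (perm_pvalues, stepdown_minp) and the input layout
(a list of observed statistics plus a B x m table of permuted statistics) are
assumptions made for this example, and the unadjusted p-values are estimated
from the same permutations.

```python
def perm_pvalues(values, reference):
    """Unadjusted permutation p-value of each value against one gene's
    permutation distribution (two-sided, based on absolute values)."""
    B = len(reference)
    return [sum(1 for r in reference if abs(r) >= abs(v)) / B for v in values]


def stepdown_minp(obs, perm):
    """Step-down minP adjusted p-values.

    obs  : observed statistics, one per gene (length m)
    perm : B rows of permuted statistics, each of length m
    Both the argument names and this layout are hypothetical.
    """
    m, B = len(obs), len(perm)
    cols = [[row[j] for row in perm] for j in range(m)]

    # Unadjusted p-values for the observed data and for every permutation.
    raw_obs = [perm_pvalues([obs[j]], cols[j])[0] for j in range(m)]
    raw_perm = [perm_pvalues(cols[j], cols[j]) for j in range(m)]

    # Order genes from most to least significant unadjusted p-value.
    order = sorted(range(m), key=lambda j: raw_obs[j])

    counts = [0] * m
    for b in range(B):
        running_min = 1.0
        # Successive minima, starting from the least significant gene.
        for k in range(m - 1, -1, -1):
            j = order[k]
            running_min = min(running_min, raw_perm[j][b])
            if running_min <= raw_obs[j]:
                counts[k] += 1

    # Enforce monotonicity of the adjusted p-values along the ordering.
    adjusted = [0.0] * m
    largest_so_far = 0.0
    for k in range(m):
        largest_so_far = max(largest_so_far, counts[k] / B)
        adjusted[order[k]] = largest_so_far
    return adjusted


# Toy input: 2 genes, 4 permutations (made-up numbers).
obs = [3.0, 1.0]
perm = [[0.5, 0.2], [1.5, 2.0], [3.5, 0.1], [0.3, 1.2]]
print(stepdown_minp(obs, perm))  # [0.5, 0.5]
```

Because every statistic is first converted to a p-value within its own
gene's permutation distribution, genes whose statistics have different null
distributions are compared on a common scale.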

> (2) if there is, does the function mt.maxT() in package multtest implement
>     such an adjustment

The step-down minP procedure is implemented in the mt.minP function of
multtest.

> (3) if there is not such an adjustment, is it still reasonable to apply this
>     procedure to smallish samples and, if yes, is there any *real* justification
>     for doing so.

The maxT procedure still provides control of the FWER when the test
statistics have different distributions. The main issues in choosing
between the maxT and minP procedures are: balance, power, and
computational feasibility.
By balance I mean that the maxT procedure may give different weights to
different hypotheses, while the minP procedure puts the different
hypotheses on the same footing by the p-value transformation.
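In terms of computation, the maxT procedure is simpler, since it works
directly with the test statistics and does not need unadjusted p-values for
every permutation. Some of these issues are discussed in greater detail in
two recent papers, which you can download from my website.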

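Y. Ge, S. Dudoit, and T. P. Speed (2003). Resampling-based multiple testing
for microarray data analysis. TEST 12(1): 1-77.
S. Dudoit, J. P. Shaffer, and J. C. Boldrick (2003). Multiple hypothesis
testing in microarray experiments. Statistical Science 18(1): 71-103.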
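
For the maxT side of the comparison, a minimal sketch of the step-down maxT
adjustment in plain Python follows. It is not the multtest implementation:
the function name stepdown_maxt and the input layout (observed statistics
plus a B x m table of permuted statistics) are assumptions made for this
example.

```python
def stepdown_maxt(obs, perm):
    """Step-down maxT adjusted p-values (two-sided, absolute statistics).

    obs  : observed statistics, one per gene (length m)
    perm : B rows of permuted statistics, each of length m
    Both the argument names and this layout are hypothetical.
    """
    m, B = len(obs), len(perm)

    # Order genes from largest to smallest observed |statistic|.
    order = sorted(range(m), key=lambda j: -abs(obs[j]))

    counts = [0] * m
    for row in perm:
        running_max = 0.0
        # Successive maxima, starting from the least significant gene.
        for k in range(m - 1, -1, -1):
            j = order[k]
            running_max = max(running_max, abs(row[j]))
            if running_max >= abs(obs[j]):
                counts[k] += 1

    # Enforce monotonicity of the adjusted p-values along the ordering.
    adjusted = [0.0] * m
    largest_so_far = 0.0
    for k in range(m):
        largest_so_far = max(largest_so_far, counts[k] / B)
        adjusted[order[k]] = largest_so_far
    return adjusted


# Toy input: 2 genes, 4 permutations (made-up numbers).
obs = [3.0, 1.0]
perm = [[0.5, 0.2], [1.5, 2.0], [3.5, 0.1], [0.3, 1.2]]
print(stepdown_maxt(obs, perm))  # [0.25, 0.5]
```

Here the statistics are compared on their raw scale, so a gene whose
statistic has a heavier-tailed null distribution contributes more to the
successive maxima. That is the balance issue mentioned above.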

I hope this helps.

Best regards,
Sandrine

-------------------------------------------------------------------------------
Sandrine Dudoit, Ph.D.               E-mail: sandrine at stat.berkeley.edu
Assistant Professor                  Tel: (510) 643-1108
Division of Biostatistics            Fax: (510) 643-5163
School of Public Health              http://www.stat.berkeley.edu/~sandrine
University of California, Berkeley
140 Earl Warren Hall, #7360
Berkeley, CA 94720-7360