[BioC] multiple testing with 54000 genes

Dipl.-Ing. Johannes Rainer johannes.rainer at tugraz.at
Fri Feb 18 08:17:04 CET 2005


jim, thank you for your excellent explanation! i will check the limma
package. otherwise i will reduce the number of genes i include in the
further analysis by some cutoff level (as we are interested in genes
that show differential expression between the 0 and 6 hours samples
in most patients, i will restrict to those that have, for example, an M
value bigger than 0.5 in more than two patients).
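
A minimal sketch of such a cutoff, written in Python purely for illustration (it is not the code used in this analysis). The function name `passes_cutoff`, the toy M values, and the reading of "M value bigger than 0.5" as the absolute log-ratio |M| are all assumptions:

```python
# Hypothetical sketch: keep genes whose |M| (6 h vs 0 h log-ratio)
# exceeds a cutoff in more than `min_patients` patients.

def passes_cutoff(m_values, cutoff=0.5, min_patients=2):
    """Return True if more than `min_patients` values have |M| > cutoff."""
    n_above = sum(1 for m in m_values if abs(m) > cutoff)
    return n_above > min_patients

# Toy data, made up for illustration: per-patient M values for each gene.
toy_m = {
    "gene_a": [0.9, 0.7, 0.6, 0.1, -0.2, 0.0],
    "gene_b": [0.6, 0.1, 0.0, -0.1, 0.2, 0.3],
    "gene_c": [-0.8, -0.9, -0.7, -0.6, 0.1, 0.0],
}

# Genes that pass the filter and would go on to testing.
kept = [gene for gene, m in toy_m.items() if passes_cutoff(m)]
print(kept)
```

With the toy values, gene_a and gene_c are kept (three and four patients above the cutoff) and gene_b is dropped (one patient).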

thanks, jo

Quoting James MacDonald <jmacdon at med.umich.edu>:

> OK, a bit of background. The idea behind a t-test is quite simple; if
> you take two random samples from the same Normal distribution and
> compare them using the t-test, the t-statistic you generate will follow
> a t distribution. This means that we know what to expect from a t-test
> if there really isn't a difference between the two samples, so if we get
> a t-statistic that is much larger than we expect by chance, we can
> assume that there is a difference in the means of the two populations we
> are comparing. This is called a parametric test because we are using the
> two parameters of the Normal distribution (the mean and variance) to
> compare two sets of data that we are assuming come from a Normal
> distribution.
>
> There are some assumptions we are making here. The main assumption is
> that the two samples come from Normal distributions, and the only
> possible difference between the groups is the mean (we assume that the
> variance is the same). There have been some modifications proposed over
> the years to account for different variances, and it has been shown that
> you don't really need Normally distributed data; as long as the data
> are 'hump' shaped you should be OK. However, if the underlying
> distribution of the data is seriously non-Normal, then the t-test starts
> to fail.
>
> In this case, the t-test fails because we are no longer using the
> correct null distribution, so we have to use non-parametric methods. One
> such method is the Wilcoxon rank sum (or Mann-Whitney) test, where you
> use the rank of the data rather than the values themselves. This test
> has its own set of assumptions that are actually fairly strict. We can
> also attempt to figure out what the null distribution should look like
> for our data using permutation methods. The problem with permutation
> methods is that the smallest attainable p-value is 1/(number of
> permutations). In your case, there are only 2^12 = 4096 possible
> permutations (sign flips of the paired differences, actually), so the
> smallest p-value will be about 0.00024.
>
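
The arithmetic behind this limit can be checked with a short sketch (Python, standard library only). It assumes the paired design yields 2^12 equally weighted sign-flip permutations, that Benjamini-Hochberg is applied across all 54675 tests, and an FDR threshold of 0.05 (the threshold is not stated in the thread). The variable names are illustrative:

```python
import math

n_pairs = 12       # patients, each with a 0 h and a 6 h sample
n_tests = 54675    # number of tests, as given in the thread
alpha = 0.05       # assumed FDR threshold (not stated in the thread)

n_perm = 2 ** n_pairs    # number of sign-flip permutations
p_min = 1 / n_perm       # smallest attainable permutation p-value

# Benjamini-Hochberg rejects the j smallest p-values if p_(j) <= j * alpha / n_tests.
# No p-value can be below p_min, so the smallest rank j at which a rejection
# is possible at all is the first j with j * alpha / n_tests >= p_min.
k_needed = math.ceil(p_min * n_tests / alpha)

print(n_perm, p_min, k_needed)
```

Under these assumptions `k_needed` comes out as 267: the procedure can only declare either no genes or at least 267 genes significant, which is one way to see why nothing survives the adjustment when few genes sit at the permutation floor.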
> So, long story short, you will probably get better results using the
> limma package, which uses a parametric null distribution.
>
> HTH,
>
> Jim
>
>
>
> >>> "Dipl.-Ing. Johannes Rainer" <johannes.rainer at tugraz.at> 02/17/05 12:03PM >>>
>
> exactly! the smallest p value i get is exactly 0.00024 (to be more
> precise 0.0002442 :) ). i thought 12 samples should be enough to
> calculate p values using permutation. what do you mean by parametric
> null? something like t-test? i'm sorry for my questions, but i am not a
> statistician (not yet...)
>
> thanks!
>
>
> Quoting "James W. MacDonald" <jmacdon at med.umich.edu>:
>
>> Dipl.-Ing. Johannes Rainer wrote:
>>> hi,
>>>
>>> i wanted to ask if someone has experience in multiple testing with a
>>> large number of genes.
>>>
>>> i have in total 24 Affymetrix chips (hgu133plus2), 12 patients, for
>>> every patient a 0 hours and a 6 hours after treatment sample. i
>>> calculated p values using permutation (mt.maxT function with
>>> test="pairt") and corrected for multiple testing using the
>>> Benjamini-Hochberg method. the problem is that with that large number
>>> of tests (54675 genes and therefore 54675 tests), after adjusting the
>>> p values no gene shows a "significant" difference.
>>>
>>> i will now reduce the number of genes to test to get to some results.
>>> has anyone experienced similar problems?
>>
>> You probably don't have enough samples to use a permuted null
>> distribution. I believe the smallest p-value you can get with a
>> permuted null is going to be ~0.00024, which may not be small enough
>> to survive a multiplicity correction with that many genes. I would
>> imagine you would get better results if you used a parametric null
>> (e.g., using the limma package).
>>
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> thanks, jo
>>
>> --
>> James W. MacDonald
>> Affymetrix and cDNA Microarray Core
>> University of Michigan Cancer Center
>> 1500 E. Medical Center Drive
>> 7410 CCGC
>> Ann Arbor MI 48109
>> 734-647-5623
>>