[R] multiple hypothesis testing

Neil Shephard nshephard at gmail.com
Tue Mar 17 12:47:20 CET 2009




Vijaykumar Muley wrote:
> 
> Dear all,
> 
> I am Vijaykumar Muley, working as a senior research fellow. By training I am
> a computational biologist without a strong knowledge of statistics. I have
> done some analysis, which is explained as follows.
> 
> I have 10340 (X) binary vector profiles of the same length (N=845), which I
> will call "gene profiles". For example:
> 
>     v1  v2  v3  v4 ... vN
> a   1   0   1   0      1
> b   0   0   1   0      0
> c   1   0   1   1      1
> d   0   1   1   1      1
> e   0   0   1   1      1
> .   .   .   .   ...    .
> .   .   .   .   ...    .
> .   .   .   .   ...    .
> up to 10340
> 
> 
> Then I have some other binary profiles of the same length (N=845), which I
> will call "expression profiles":
>     v1  v2  v3  v4.....vN
> f1  1   0    1   0      1
> f2  0   0    1   0      0
> f3  1   0    1   1      1
> 
> 
> Now I am comparing profile f1 with all X profiles using the hypergeometric
> distribution function. What I get is the p-value, i.e. the probability that
> the similarity between profile f1 and each of the X (10340) profiles arises
> by random chance alone.
> 
> for example,
> 
> #pair   p-value
> 
> f1,a    1e-20
> f1,b    0.01
> .
> .
> up to
> f1,10340 0.05
> 
> I am doing the same thing with f2 and f3.
> 
> If we arrange this output in a more readable format, it looks like:
> 
>       f1       f2    f3
> a   1e-20    0.01  0.10
> b   0.01     1e-9  0.02
> c   1e-3     0.1   0.30
> d   0.03     0.07  1e-5
> e   1e-1     0.01  1e-9
> .   .        .     .
> .   .        .     .
> .   .        .     .
> up to 10340
> 
> 
> I hope everyone understood what type of output I am getting.
> 
> Now I want to perform a multiple hypothesis comparison (p-value adjustment)
> on this data, so that I get the statistically significant associations
> between the various "expression profiles" and "gene profiles" at a specific
> alpha level.
> 
> The most conservative method for p-value adjustment is Bonferroni, and there
> are many others that are less conservative. I don't care which method I use,
> but the problem is this: which parameter should I use to correct or adjust
> the p-values?
> 
> So, in the case of the Bonferroni correction, should I multiply each p-value
> by 10340 or, as I have compared 3 expression profiles against 10340 gene
> profiles, should I multiply each p-value by 3*10340?
> 
> I am asking this for understanding. What I want to do is this:
> 
> From the above gene p-value table, I want to calculate the percentage of
> false positives at each p-value cutoff from 0.0001 to 0.05, so that I can
> use a good cutoff as the significance level (alpha) to exclude the gene
> profiles which are only weakly associated with all expression profiles.
> (If I am correct, to do this I need to use other p-value correction methods,
> either simulation-based, resampling, or Benjamini and Hochberg (B&H).)
> 
> Please can anyone advise me about p-value adjustment or p-value
> correction? I mean, statistically or technically, which number should I
> consider for the correction, 10340 or 3 * 10340, as I have three features to
> associate with the same 10340 gene set? Or, if I am wrong, can anyone tell me
> the protocol I should follow to get a fair number of significant
> associations between genes and expression profiles?
> 
> I am using the package "multtest" for p-value adjustment, but I do not
> understand what to supply for the correction: should I give the p-values
> for each expression profile alone, or give it all the p-values, i.e. 3*10340?
> 
> I have gone through many tutorials and articles on multiple hypothesis
> testing, but really couldn't work out exactly what it is.
> 
> Please give me some clues. Some of you may be actively working on p-value
> adjustment / multiple hypothesis testing, so I hope for some suggestions.
> 
> I will be grateful for your kind help.
> 
> sincerely,
> 
> 

Please do NOT reply to a digest when posting to the list; you should start a
new thread (or at the very least delete the digest to which you are replying
from your email).

You may be interested in the False Discovery Rate (FDR) methods proposed by
Benjamini & Hochberg[1] and various related work/papers/software[2][3].
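
As a rough illustration only (not a recommendation for your particular
data), here is a minimal sketch using p.adjust() from base R. It assumes
that all 3 * 10340 comparisons are treated as one family of tests, which is
one common reading when the reported result is the full gene-by-profile
table; adjusting each column separately would instead control the error
rate within each expression profile only. The matrix 'pvals', the planted
small p-values and the helper adjust_all() are made up for the example and
stand in for your real table.

```r
# Simulated stand-in for the 10340 x 3 p-value table (NOT real data).
set.seed(1)
n_genes <- 10340
pvals <- matrix(runif(n_genes * 3), nrow = n_genes,
                dimnames = list(NULL, c("f1", "f2", "f3")))
# Plant a few strong associations so that something survives adjustment.
pvals[1:20, "f1"] <- 1e-12

# Hypothetical helper: pool all tests into one vector, adjust them
# together, then restore the original matrix shape.
adjust_all <- function(p, method) {
  matrix(p.adjust(as.vector(p), method = method),
         nrow = nrow(p), dimnames = dimnames(p))
}

# Bonferroni over the pooled family: each p is multiplied by 3 * 10340
# and capped at 1.
p_bonf <- adjust_all(pvals, "bonferroni")
# Benjamini-Hochberg adjusted p-values over the same pooled family.
p_bh <- adjust_all(pvals, "BH")

# Number of associations called significant at a range of FDR cutoffs.
cutoffs <- c(0.0001, 0.001, 0.01, 0.05)
n_sig <- sapply(cutoffs, function(a) sum(p_bh <= a))
names(n_sig) <- cutoffs
print(n_sig)
```

To adjust within each expression profile instead, something like
apply(pvals, 2, p.adjust, method = "BH") would do it; which family is
appropriate depends on the claim you want to make about the results.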

Neil 


[1] Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a
practical and powerful approach to multiple testing. J. R. Statist. Soc. B
57:289-300
[2] http://genomics.princeton.edu/storeylab/qvalue/
