[BioC] help with multiple testing

Wolfgang Huber whuber at embl.de
Mon Jun 25 20:10:58 CEST 2012


Dear Mike

I'd be surprised if this problem were cracked by a brute-force, purely
'statistical' approach. You could try to reduce the number of tests by
first grouping the genes into 'pathways' or functional modules. With a
lot of luck, the data may then just be large enough.
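Below is a minimal Python sketch of the grouping idea. It assumes two inputs that the message does not specify: per-tumor mutation calls, given as a mapping from tumor ID to the set of mutated genes, and a gene-to-module assignment. Both input formats and all names are hypothetical. A tumor counts as mutated in a module if any of that module's genes is mutated. The same pair-table construction then yields far fewer tests at module level.

```python
from itertools import combinations

def collapse_to_modules(mutations, gene_to_module):
    """Map each tumor's set of mutated genes to the set of modules it hits.

    mutations: dict tumor_id -> set of mutated gene names (hypothetical format)
    gene_to_module: dict gene name -> module name (hypothetical grouping)
    Genes without a module assignment are ignored.
    """
    return {
        tumor: {gene_to_module[g] for g in genes if g in gene_to_module}
        for tumor, genes in mutations.items()
    }

def pair_tables(profiles, units):
    """Return {(A, B): (TT, TF, FT, FF)} for every unordered pair of units.

    profiles: dict tumor_id -> set of units (genes or modules) hit in that tumor.
    """
    tables = {}
    for a, b in combinations(sorted(set(units)), 2):
        counts = {"TT": 0, "TF": 0, "FT": 0, "FF": 0}
        for hit in profiles.values():
            key = ("T" if a in hit else "F") + ("T" if b in hit else "F")
            counts[key] += 1
        tables[(a, b)] = (counts["TT"], counts["TF"], counts["FT"], counts["FF"])
    return tables

# Toy example with made-up data: 4 tumors, 4 genes, 2 modules.
mutations = {"t1": {"g1", "g2"}, "t2": {"g3"}, "t3": set(), "t4": {"g1", "g4"}}
gene_to_module = {"g1": "M1", "g2": "M1", "g3": "M2", "g4": "M2"}

gene_level = pair_tables(mutations, gene_to_module.keys())
module_level = pair_tables(collapse_to_modules(mutations, gene_to_module),
                           gene_to_module.values())
print(len(gene_level), "gene-pair tests vs", len(module_level), "module-pair test")
print(module_level)
```

With k modules there are k(k-1)/2 module pairs instead of 8,256 gene pairs. The "any gene mutated" rule is itself an assumption, and whether it fits depends on the biology.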

	Best wishes
	Wolfgang

Jun/25/12 1:15 PM, efthimiosm scripsit:
> Hi all,
>
> My name is Mike and I am a post-doctoral fellow in bioinformatics. I
> have a question about adjusting p-values for multiple testing and
> wonder if someone could give me some advice.
>
> I have 8,256 gene pairs (129 x 128 / 2), i.e. all possible
> combinations of 129 genes. For each pair A-B (A different from B)
> four values are recorded: the number of tumors found in both A and B
> (TT), the number found only in A (TF), the number found only in B
> (FT), and the number found in neither A nor B (FF). The data are thus
> in the form of 2x2 contingency tables, e.g.:
>
> Gene 1  Gene 2  TT  TF  FT  FF
> g1      g2       5   1   1  27
> g1      g3       4   1   1  28
> g2      g3       4   2   0  28
> ...
> ...
> ...
>
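The pair structure can be sketched in a few lines of Python. The gene names g1 ... g129 are hypothetical placeholders. The pairs are the 2-combinations of 129 genes, and every gene occurs in 128 of them. Any single pair shares a gene with 2 x 127 = 254 other pairs, which is one source of dependence between the tests.

```python
from collections import Counter
from itertools import combinations

genes = [f"g{i}" for i in range(1, 130)]   # 129 placeholder gene names
pairs = list(combinations(genes, 2))        # all unordered pairs A-B with A != B

# How often does each gene occur across all pairs?
appearances = Counter(g for pair in pairs for g in pair)

# How many other pairs share exactly one gene with the first pair (g1, g2)?
first = set(pairs[0])
overlapping = sum(1 for p in pairs[1:] if len(first & set(p)) == 1)

print(len(pairs), set(appearances.values()), overlapping)
```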
> Notice that each gene is paired with all the others and is thus
> represented 128 times in this list. I want to find which of the 8,256
> gene pairs (tests) show a significant association between rows (in A,
> not in A) and columns (in B, not in B), using Fisher's exact test or
> Barnard's test. Subsequently I have to adjust the p-values for
> multiple testing.
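To make the per-pair test concrete, here is a minimal, dependency-free Python sketch of the two-sided Fisher exact test for one row of the table, read as [[TT, TF], [FT, FF]]. Each example row sums to 34 tumors. Conditional on the margins, TT follows a hypergeometric distribution. The two-sided p-value adds up the probabilities of all tables with the same margins that are no more likely than the observed one. The sketch uses exact fractions to avoid floating-point ties. In practice the usual routes are library functions: fisher.test in R, or scipy.stats.fisher_exact and scipy.stats.barnard_exact in Python.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(tt, tf, ft, ff):
    """Two-sided Fisher exact p-value for the 2x2 table [[TT, TF], [FT, FF]].

    With n tumors, n_a = TT + TF mutated in A and n_b = TT + FT mutated in B,
    TT is hypergeometric under independence. Tables with the same margins whose
    probability does not exceed that of the observed table are summed.
    """
    n = tt + tf + ft + ff
    n_a, n_b = tt + tf, tt + ft
    lo, hi = max(0, n_a + n_b - n), min(n_a, n_b)
    # Unnormalised hypergeometric weights; they sum to comb(n, n_a).
    weight = {x: comb(n_b, x) * comb(n - n_b, n_a - x) for x in range(lo, hi + 1)}
    observed = weight[tt]
    tail = sum(w for w in weight.values() if w <= observed)
    return Fraction(tail, comb(n, n_a))

rows = [("g1", "g2", 5, 1, 1, 27), ("g1", "g3", 4, 1, 1, 28), ("g2", "g3", 4, 2, 0, 28)]
for a, b, *table in rows:
    p = fisher_exact_two_sided(*table)
    print(f"{a}-{b}: p = {float(p):.3g}")
```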
>
> At the (unadjusted) 5% level I find approximately 500 significant gene
> pairs but, naturally, all p-value adjustment procedures I tried (for
> independent tests: BH, q-value; for dependent tests: BY, adaptiveBH
> and BlaRoq from package "multtest") produce adjusted p-values > 0.3. I
> think the problem is that the highly dependent nature of the data (50%
> of the genes have very few mutations, which gives high p-values for all
> the pairs they form) dramatically affects the adjustment procedure.
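For reference, here is a minimal Python sketch of the BH and BY step-up adjustments. It is written to follow the definitions used by R's p.adjust with method = "BH" and method = "BY". The sketch also shows a rough count. For valid tests, at most about 0.05 x 8,256 = 413 truly null pairs are expected below 5% by chance, which is not far from the ~500 observed. Fisher's test on tables this small is discrete, so the real null count is likely lower than this bound. The toy p-values are made up.

```python
def bh_adjust(pvalues, dependent=False):
    """Benjamini-Hochberg (BH) adjusted p-values; Benjamini-Yekutieli (BY) if dependent=True.

    The i-th smallest p-value is scaled by m / i (and by c(m) = 1 + 1/2 + ... + 1/m
    for BY); a running minimum is taken from the largest p-value downwards and the
    result is capped at 1. Adjusted values are returned in the input order.
    """
    m = len(pvalues)
    c_m = sum(1.0 / k for k in range(1, m + 1)) if dependent else 1.0
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0                      # starting at 1 also caps values at 1
    for rank in range(m, 0, -1):           # rank = m, m-1, ..., 1
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted

# Upper bound on pairs expected below 5% by chance if every null were true
# (valid tests satisfy P(p <= alpha) <= alpha under the null).
m_tests, alpha = 8256, 0.05
expected_by_chance = alpha * m_tests       # about 413, versus ~500 observed

toy = [0.01, 0.04, 0.03, 0.005]            # made-up p-values for illustration
print(bh_adjust(toy))
print(bh_adjust(toy, dependent=True))
```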
>
> Is there a better way (method) to run the p-values adjustment?
>
> Would it be an appropriate solution to create multiple lists of gene
> pairs, in each of which every gene is represented only once, and then
> estimate q-values for each list (giving multiple q-values per pair)?
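Here is one way to build such lists, sketched in Python. It assumes that "each gene is represented only once" means each list contains only gene-disjoint pairs. The round-robin (circle) method splits all pairs of 129 genes into 129 lists of 64 pairs. One gene sits out of each list, and every pair appears in exactly one list. Repeating the construction with a different gene order presumably gives the "multiple q-values for each pair". The shuffle seed and the gene names are hypothetical. The sketch only builds the lists; it does not address whether per-list q-values are valid.

```python
import random

def round_robin_rounds(genes):
    """Split all unordered pairs of `genes` into lists of gene-disjoint pairs.

    Circle method: keep the first position fixed, rotate the others by one place
    per round, and pair position i with position n-1-i. With an odd number of
    genes a placeholder None is added; its partner sits out that round.
    """
    players = list(genes)
    if len(players) % 2:
        players.append(None)
    n = len(players)
    rounds = []
    for _ in range(n - 1):
        pairs = [(players[i], players[n - 1 - i]) for i in range(n // 2)]
        rounds.append([p for p in pairs if None not in p])
        players = [players[0], players[-1]] + players[1:-1]
    return rounds

genes = [f"g{i}" for i in range(1, 130)]   # 129 placeholder gene names
rounds = round_robin_rounds(genes)

# A different gene order gives a different partition (hypothetical repeat).
shuffled = genes[:]
random.Random(1).shuffle(shuffled)
other_rounds = round_robin_rounds(shuffled)

print(len(rounds), {len(r) for r in rounds})
```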
>
>
> Thank you,
> Mike
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


