[BioC] (stupid) question about wilcoxon test and finding interesting genes

Mon Feb 14 08:07:19 CET 2005

Yes, you are right: I applied a Wilcoxon test to the M values (which in
this case is the same as a paired Wilcoxon test on the log2 expression
values). I got a vector of p-values, one p-value per gene.
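
A minimal sketch of why the two tests coincide, using made-up numbers for a single gene in 13 hypothetical patients. It assumes the M value is log2(before) - log2(after). R's paired Wilcoxon test works on the differences x - y internally, so it gives the same p-value as a one-sample test on the M values.

```r
# Hypothetical log2 expression values for one gene in 13 patients (simulated)
set.seed(1)
log2_before <- rnorm(13, mean = 8)
log2_after  <- log2_before + rnorm(13, mean = 0.3, sd = 0.5)

# M value per patient: difference of the log2 values
M <- log2_before - log2_after

# One-sample signed-rank test on the M values (H0: centred at 0)
p_one_sample <- wilcox.test(M)$p.value

# Paired signed-rank test on the original log2 values
p_paired <- wilcox.test(log2_before, log2_after, paired = TRUE)$p.value

# Same differences, same statistic, same p-value
all.equal(p_one_sample, p_paired)
```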

The p-values were a little surprising to me, because some genes came out
significant although they did not differ much between the sample and the
control group. About 6000 genes have a p-value less than 0.05, so this
might be OK (I was a little too quick in saying that every gene is
significantly different :) ).

So the next step is to correct the p-values... I thought correcting
p-values is only necessary when I do multiple testing? Sorry for the
question, but I am more used to programming and working with databases
than to doing statistics...
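
Running one test per gene on 54000 genes is itself multiple testing: every gene is a separate test at the 0.05 level. The following is a sketch on simulated data in which no gene changes at all. It assumes 2000 hypothetical genes with M values drawn symmetrically around 0. Even so, roughly 5% of these genes come out "significant" at raw p < 0.05. Scaled up to 54000 genes, that is about 54000 x 0.05 = 2700 expected false positives. This suggests the ~6000 genes above are more than chance alone would give, but a raw p < 0.05 cannot tell which of them are real.

```r
# Simulated null data: 2000 hypothetical genes x 13 patients, no true change
set.seed(42)
n_genes <- 2000
n_patients <- 13
M <- matrix(rnorm(n_genes * n_patients), nrow = n_genes)

# One signed-rank test per gene (row), keeping only the p-value
p <- apply(M, 1, function(m) wilcox.test(m)$p.value)

# Fraction of unchanged genes with raw p < 0.05: close to 5%
# (slightly below, because the exact test with n = 13 is discrete)
mean(p < 0.05)

# Expected number of false positives among 54000 unchanged genes
54000 * 0.05   # 2700
```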

Thanks for all your answers, you have helped me very much!

Quoting Naomi Altman <naomi at stat.psu.edu>:

> If I understand what you did, you should have only 1 column of
> p-values, 1 per gene.  So I think your apply command did not work
> as you expected (although I think it should have).
>
> My understanding is that you have 2 arrays per patient and took the 
> 13 M values.  Applying a Wilcoxon test to each row should test that 
> the median difference is 0.
>
> Try doing the test on a couple of rows and then compare with the 
> output you obtained.
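
The check Naomi suggests can be sketched like this. Simulated data stand in for the real matrices, and only the names `untreated` and `treated` follow the original post. The helper `test_gene` is hypothetical. Passing `conf.int = TRUE` to `wilcox.test` also returns the Hodges-Lehmann estimate of the shift, labelled "(pseudo)median". Reading that estimate next to the p-value helps flag genes that are significant but barely change.

```r
# Hypothetical stand-in data: 5 genes x 13 patients
set.seed(7)
n_genes <- 5; n_patients <- 13
untreated <- matrix(rnorm(n_genes * n_patients, mean = 8), nrow = n_genes)
treated   <- untreated + matrix(rnorm(n_genes * n_patients, sd = 0.5),
                                nrow = n_genes)
treated[1, ] <- treated[1, ] - 1   # gene 1 drops by about 1 after treatment

M <- untreated - treated

# Hypothetical helper: p-value plus estimated shift for one gene
test_gene <- function(m) {
  wt <- wilcox.test(m, conf.int = TRUE)
  c(p.value = wt$p.value, shift = unname(wt$estimate))
}

# One row per gene, columns "p.value" and "shift"
res <- t(apply(M, 1, test_gene))
res

# Spot-check a row by hand against the table
stopifnot(isTRUE(all.equal(res[1, "p.value"], wilcox.test(M[1, ])$p.value)))
```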
>
> After you get 1 p-value per gene, you should apply a multiple
> comparisons adjustment.  Controlling the false discovery rate (FDR) is
> popular; q-values can be computed using the "qvalue" package in
> Bioconductor.
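
The following sketch uses base R's `p.adjust(method = "BH")` (Benjamini-Hochberg FDR) as a swapped-in stand-in for the qvalue package. qvalue additionally estimates the proportion of unchanged genes, so its q-values are typically somewhat smaller, but it requires a Bioconductor install. The data are simulated: 1800 hypothetical unchanged genes and 200 with a real shift.

```r
# Simulated data: 1800 unchanged genes and 200 changed genes, 13 patients
set.seed(3)
n_null <- 1800; n_true <- 200; n_patients <- 13
M <- rbind(
  matrix(rnorm(n_null * n_patients), ncol = n_patients),             # no change
  matrix(rnorm(n_true * n_patients, mean = 1.5), ncol = n_patients)  # real shift
)
is_true <- rep(c(FALSE, TRUE), c(n_null, n_true))

# One p-value per gene
p <- apply(M, 1, function(m) wilcox.test(m)$p.value)

# Benjamini-Hochberg adjusted p-values (controls the FDR)
p_bh <- p.adjust(p, method = "BH")

# Genes called at raw p < 0.05 versus FDR < 0.05
sum(p < 0.05)
sum(p_bh < 0.05)

# Observed share of unchanged genes among those called at FDR < 0.05
mean(!is_true[p_bh < 0.05])
```

Because adjusted p-values are never smaller than the raw ones, the FDR list is always a subset of the raw p < 0.05 list.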
>
> --Naomi
>
> At 09:44 AM 2/11/2005, Dipl.-Ing. Johannes Rainer wrote:
>> Hi,
>> I must apologize for my question, but I am not really good at
>> statistics...
>>
>> We have run Affymetrix GeneChips on samples from patients before
>> and after treatment. Until now I searched for genes that are
>> influenced by the treatment using M values, but I also wanted to
>> apply a statistical test to get some evidence that the genes I found
>> are significant.
>>
>> So I applied a paired Wilcoxon test to the expression values (one
>> test per gene). My sample size is 13 (13 chips with samples before
>> treatment and 13 after). I subtracted the values after treatment
>> from those before treatment:
>>
>> p.vals <- apply((untreated-treated),MARGIN=1,wilcox.test)
>>
>> where untreated is a matrix with 13 columns and 54000 rows (genes),
>> and treated likewise. According to the p-values I got, nearly every
>> gene is significant, even if the gene is not regulated.
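
In current R, `apply()` with `wilcox.test` as the function returns a list of "htest" objects, one per gene, rather than a vector of p-values, because each test result is itself a list. This sketch uses a small simulated matrix standing in for `untreated - treated`. It shows how to pull the p-values out of that list, or how to ask for them directly.

```r
# Hypothetical stand-in data: 4 genes x 13 patients
set.seed(11)
n_genes <- 4; n_patients <- 13
untreated <- matrix(rnorm(n_genes * n_patients, mean = 8), nrow = n_genes)
treated   <- untreated + matrix(rnorm(n_genes * n_patients), nrow = n_genes)

# The original call: one "htest" object per row, collected in a list
tests <- apply(untreated - treated, MARGIN = 1, wilcox.test)
class(tests[[1]])   # "htest"

# Extract the p-values from the list ...
p_from_list <- sapply(tests, function(h) h$p.value)

# ... or have each call return only its p-value
p_direct <- apply(untreated - treated, MARGIN = 1,
                  function(d) wilcox.test(d)$p.value)

all.equal(p_from_list, p_direct)
```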
>>
>> So my question: do I have to correct the p-values, or was I totally
>> wrong to assume I could find significant (and regulated) genes this
>> way?
>>
>> Thanks
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111