[BioC] Wilcoxon test [was loged data or not loged previous to use normalize.quantile]

Mon Apr 11 14:21:48 CEST 2005

>Not forgetting that the two-sample t-test performs fine under the same
>circumstances (large balanced samples), even for non-normal
>distributions and unequal variances.
>
>Regards
>Gordon

Does anyone by any chance have a few references for this point,
particularly for non-normal distributions? I've seen references to
Monte Carlo simulation studies that look at assumption violations, but
being at a biological institute it's difficult to get access to good
statistics texts. All internet searches just mention 'large' and
'balanced' samples. I would be especially interested in 'what if'
situations like the ones you gave for the Wilcoxon test.

I have group sizes between 0 and 30, generally unbalanced to some degree
(mean min/max = 15/25). I know these are not that large (if large at
all), but I'm looking to 'quantify' what problems I may get comparing
sample sizes of, say, 6, 15, 21, 25 and 29. In some cases there are also
non-normal distributions, skew and outliers to take into account.
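
To make the 'quantify' part concrete, this is the kind of small simulation I have in mind. It is only a rough sketch under made-up assumptions: both groups come from the same log-normal (skewed) distribution, so every rejection is a false positive, and type1_rate is a hypothetical helper name, not something from a package. Note that t.test() uses the Welch (unequal variance) version by default.

```r
# Estimate the type I error rate of the two-sample t-test by simulation.
# Assumption: both groups are drawn from the same skewed (log-normal)
# distribution, so the null hypothesis is true and any p < alpha is a
# false positive.
set.seed(1)

type1_rate <- function(n1, n2, n_sim = 2000, alpha = 0.05,
                       rdist = function(n) rlnorm(n)) {
  # t.test() defaults to var.equal = FALSE (Welch test)
  p <- replicate(n_sim, t.test(rdist(n1), rdist(n2))$p.value)
  mean(p < alpha)
}

sizes <- c(6, 15, 21, 25, 29)

# Compare the smallest group (n = 6) against each of the larger ones
rates <- sapply(sizes[-1], function(n2) type1_rate(sizes[1], n2))
names(rates) <- paste0("6_vs_", sizes[-1])
rates
```

Rates far from 0.05 would point to trouble for that combination of sizes. Swapping rdist for another generator, or t.test for wilcox.test, would give the other 'what if' cases.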

I'm wondering, if I have unbalanced group sizes (x larger than y), whether
it would reduce the problems of unequal variances to subsample the larger
group down to the size of the smaller one,
x1 <- sample(x, length(y))
then test (x1, y) for a number (10?) of repeats and take the maximum
p.value.
I guess anything with n < 10 would have to be discarded first.
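
Spelled out, the subsampling idea would look roughly like the sketch below. The function name max_p_subsample and the example data are made up for illustration; the 10 repeats and the n < 10 cutoff are just the guesses from above.

```r
# Hypothetical helper: repeatedly subsample the larger group down to the
# size of the smaller one, run a t-test each time, and keep the maximum
# p-value over the repeats.
max_p_subsample <- function(x, y, n_rep = 10, min_n = 10) {
  # Make sure x is the larger group
  if (length(x) < length(y)) {
    tmp <- x; x <- y; y <- tmp
  }
  # Groups that are too small are discarded (returned as NA)
  if (length(y) < min_n) return(NA_real_)
  p <- replicate(n_rep, t.test(sample(x, length(y)), y)$p.value)
  max(p)
}

set.seed(1)
x <- rnorm(25, mean = 0, sd = 2)  # made-up data, larger group
y <- rnorm(15, mean = 1, sd = 1)  # made-up data, smaller group
max_p_subsample(x, y)
```

Each subsample throws away part of the larger group, and taking the maximum p-value is conservative, so I would expect this to cost power compared with a single test on all the data.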

Looking at the data case by case is not possible with >500 compounds and
~20 groups.

Cheers for any info,
Matt