[R] Comparing samples with widely different uncertainties

Wed Aug 25 16:22:27 CEST 2010

On Aug 25, 2010, at 3:57 PM, Sandy Small wrote:

> Hi
> This is probably more of a statistics question than a specific R
> question, although I will be using R and need to know how to solve the
> problem in R.
> 
> I have several sets of data (ejection fraction measurements) taken in
> various ways from the same set of (~400) patients (so it is paired data).
> For each individual measurement I can make an estimate of the percentage
> uncertainty in the measurement.
> Generally the measurements in data set A are higher but they have a
> large uncertainty (~20%), while the measurements in data set B are lower
> but have a small uncertainty (~4%).
> I believe, from the physiology, that the true value is likely to be
> nearer the value of A than of B.
> I need to show that, despite the uncertainties in the measurements
> (which are not themselves normally distributed), there is (or is not) a
> difference between the two groups. (A straight Wilcoxon signed-rank
> test shows a difference, but it cannot take that uncertainty into account.)
> 
> Can anybody suggest what I should be looking at? Is there a language
> here that I don't know? How do I do it in R?
> Many thanks for your help
> Sandy

Hm, well...

I don't think the issue is entirely well-defined, but let me try and give you some pointers:

For bivariate normal data (X, Y), even if V(X) != V(Y) you still end up looking at X - Y if the question is whether the means are the same. X - Y is then normal with mean E(X) - E(Y), so "equal means" is the same as E(X - Y) = 0. It's sort of the only thing you _can_ do...
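Here is a minimal sketch of that point. It is written in Python rather than R and uses made-up simulated numbers, not Sandy's data. The paired analysis is just a one-sample t on D = X - Y, whose variance is V(X) + V(Y) - 2 Cov(X, Y). Unequal V(X) and V(Y) therefore only affect the spread of D. The helper name `paired_t` and the simulation settings are hypothetical. In R the same statistic comes from `t.test(x, y, paired = TRUE)`.

```python
import math
import random
import statistics


def paired_t(x, y):
    """Paired t statistic: a one-sample t on the differences d = x - y.

    Returns (mean difference, t statistic, degrees of freedom).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    mean_d = statistics.fmean(d)
    se = statistics.stdev(d) / math.sqrt(n)  # sample SD with n - 1 divisor
    return mean_d, mean_d / se, n - 1


# Hypothetical simulation: one "true" value per patient; method A reads
# higher with ~20% relative noise, method B lower with ~4% relative noise.
rng = random.Random(1)
truth = [rng.uniform(30, 70) for _ in range(400)]
a = [t * 1.05 * (1 + rng.gauss(0, 0.20)) for t in truth]
b = [t * 0.95 * (1 + rng.gauss(0, 0.04)) for t in truth]

mean_d, t_stat, df = paired_t(a, b)
print(f"mean(A - B) = {mean_d:.2f}, t = {t_stat:.2f}, df = {df}")
```

The sketch stops at the t statistic and its n - 1 degrees of freedom. The p-value would come from the t distribution, which R's `t.test` reports directly.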

For non-normal data, it is not clear what the null hypothesis really is. The signed-rank test assumes that X - Y has a symmetric distribution, which is dubious if X is not symmetric and its variation dominates that of Y. You could also do a sign test and see whether the differences have a median of zero. (Bear in mind that the median of a difference is not the same as the difference of the medians, but it could well suffice for your purpose.)
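Here is a sketch of the exact sign test mentioned above. It assumes the paired differences d = x - y have already been computed and uses only the Python standard library. The helper name `sign_test` is hypothetical. In R the same exact test is `binom.test(sum(d > 0), sum(d != 0))`. The test concerns the median of the differences, not the difference of the medians.

```python
import math


def sign_test(d):
    """Exact two-sided sign test of H0: median(d) == 0.

    Zero differences are dropped (the usual convention). Under H0 the number
    of positive differences among the n nonzero ones is Binomial(n, 1/2).
    Returns (n_positive, n_used, p_value).
    """
    nonzero = [v for v in d if v != 0]
    n = len(nonzero)
    if n == 0:
        raise ValueError("all differences are zero")
    k = sum(1 for v in nonzero if v > 0)
    # The null distribution is symmetric, so the two-sided p-value is twice
    # the smaller tail probability, capped at 1.
    tail = min(k, n - k)
    p_one = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return k, n, min(1.0, 2 * p_one)


print(sign_test([1.2, 0.4, -0.3, 2.1, 0.8, 0.0, 1.5, 0.9]))  # -> (6, 7, 0.125)
```

In the example the zero difference is dropped, leaving 6 positive differences out of 7. Because the test uses only the signs, it needs no symmetry assumption, which is the trade-off against the signed-rank test.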

I'd probably start off with a simple plot of Y vs X and look for fan-out effects indicating that the variance depends on the mean. If it does, perhaps take logs or square roots and see whether that makes the variances look more stable and perhaps improves normality. Then maybe just do a paired t-test.
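Here is a rough sketch of that last suggestion, again in Python, with hypothetical helper names and no plotting. `spread_by_level` is a crude numeric stand-in for eyeballing fan-out in a plot of Y vs X; that choice is an assumption made here, not part of the advice above. It ranks pairs by their average level and compares the SD of the differences in the lower and upper halves. `log_paired_t` then runs the paired t on log(x) - log(y). It needs strictly positive values, and its back-transformed mean is a ratio rather than a difference. In R the corresponding steps would be `plot(x, y)` and `t.test(log(x), log(y), paired = TRUE)`.

```python
import math
import statistics


def spread_by_level(x, y):
    """Crude numeric check for fan-out (spread growing with level).

    Sorts the pairs by their average level, splits them into a lower and an
    upper half, and returns the SD of the differences x - y in each half.
    A much larger SD in the upper half hints that variance grows with the mean.
    """
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("need two equal-length samples with at least 4 pairs")
    pairs = sorted(zip(x, y), key=lambda p: (p[0] + p[1]) / 2)
    half = len(pairs) // 2
    low = [a - b for a, b in pairs[:half]]
    high = [a - b for a, b in pairs[half:]]
    return statistics.stdev(low), statistics.stdev(high)


def log_paired_t(x, y):
    """Paired t statistic on log(x) - log(y); all values must be positive.

    exp(mean log difference) is the geometric-mean ratio x/y, so the null
    hypothesis becomes "the typical ratio is 1" rather than "difference is 0".
    Returns (ratio, t statistic, degrees of freedom).
    """
    if any(v <= 0 for v in list(x) + list(y)):
        raise ValueError("log scale needs strictly positive measurements")
    d = [math.log(a) - math.log(b) for a, b in zip(x, y)]
    n = len(d)
    mean_d = statistics.fmean(d)
    t = mean_d / (statistics.stdev(d) / math.sqrt(n))
    return math.exp(mean_d), t, n - 1


# Made-up example where the differences widen as the level rises.
x = [10, 20, 60, 80]
y = [9, 19, 50, 66]
print(spread_by_level(x, y))  # lower-half SD is 0.0, upper-half SD is larger
print(log_paired_t(x, y))
```

A square-root transform would follow the same pattern with `math.sqrt` in place of `math.log`. Its mean difference, however, has no simple ratio interpretation.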

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com