[R] Choice of statistical test (in R) of two apparently different distributions

Ted Harding Ted.Harding at wlandres.net
Thu May 9 10:35:08 CEST 2013


On 09-May-2013 01:42:07 Pascal Oettli wrote:
> On 05/09/2013 10:29 AM, Gundala Viswanath wrote:
>> I have the following list of data each has 10 samples.
>> The values indicate binding strength of a particular molecule.
>>
>> What I want to show is that 'x' is statistically different from
>> 'y', 'z' and 'w'. Which it does, if you look at x: it has
>> more values greater than zero (2.8, 1.00, 5.4, etc.) than the others.
>>
>> I tried a t-test, but all of them show an insignificant difference
>> with a high P-value.
>>
>> What's the appropriate test for that?
>>
>> Below is my code:
>>
>> x <- c(2.852672123, 0.076840264, 1.009542943, 0.430716968, 5.4016,
>>        0.084281843, 0.065654548, 0.971907344, 3.325405405, 0.606504718)
>> y <- c(0.122615039, 0.844203734, 0.002128992, 0.628740077, 0.87752229,
>>        0.888600425, 0.728667099, 0.000375047, 0.911153571, 0.553786408)
>> z <- c(0.766445916, 0.726801899, 0.389718652, 0.978733927, 0.405585807,
>>        0.408554832, 0.799010791, 0.737676439, 0.433279599, 0.947906524)
>> w <- c(0.000124984, 1.486637663, 0.979713013, 0.917105894, 0.660855127,
>>        0.338574774, 0.211689885, 0.434050179, 0.955522972, 0.014195184)
>>
>> t.test(x,y)
>> t.test(x,z)
>>
>> --END--
>>
>> G.V.
> 
> Hello,
> 
> 1) Why should 'x' be statistically different from the others?
> 2) 'y' looks to be bimodal. The mean is not an appropriate measurement 
> for this kind of distribution.
> 
> Regards,
> Pascal

Running the commands:

  plot(x,pch="+",col="red",ylim=c(0,6))
  points(y,pch="+",col="green")
  points(z,pch="+",col="blue")
  points(w,pch="+",col="black")
  lines(x,col="red")
  lines(y,col="green")
  lines(z,col="blue")
  lines(w,col="black")

indicates that y, z and w are similar to each other (with some
suggestion of a serial structure).

However, while part of x is also similar to y, z and w, it is
clear that 3 values of x are "outliers" (well above the range
of all other values, including those of x). [And I think Pascal
meant "x" when he wrote "'y' looks to be bimodal."]

And it may be of interest that these exceptional values of x
occur at x[1], x[5], x[9] (i.e. every 4th observation).
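
The positions of these exceptional values can also be picked out
numerically rather than by eye. The following is only a sketch: the
rule used to flag a value (anything in x larger than the largest value
in y, z and w) and the names upper_yzw and outlier_idx are my own
choices for illustration, not a standard outlier criterion.

```r
x <- c(2.852672123, 0.076840264, 1.009542943, 0.430716968, 5.4016,
       0.084281843, 0.065654548, 0.971907344, 3.325405405, 0.606504718)
y <- c(0.122615039, 0.844203734, 0.002128992, 0.628740077, 0.87752229,
       0.888600425, 0.728667099, 0.000375047, 0.911153571, 0.553786408)
z <- c(0.766445916, 0.726801899, 0.389718652, 0.978733927, 0.405585807,
       0.408554832, 0.799010791, 0.737676439, 0.433279599, 0.947906524)
w <- c(0.000124984, 1.486637663, 0.979713013, 0.917105894, 0.660855127,
       0.338574774, 0.211689885, 0.434050179, 0.955522972, 0.014195184)

# Largest value seen anywhere in the three "ordinary" groups
# (an ad hoc threshold, chosen only for this illustration)
upper_yzw <- max(y, z, w)

# Positions in x whose values lie above everything in y, z and w
outlier_idx <- which(x > upper_yzw)
outlier_idx

# Spacing between successive flagged positions
diff(outlier_idx)
```

With this threshold the flagged positions are 1, 5 and 9, so the
spacing between them is constant at 4.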

Taken together, these facts suggest that an examination of the
procedure giving rise to the data may be relevant. As one
example of the sort of thing to look for: were the 3 outlying
observations obtained by the same worker/laboratory/apparatus
as the others? (A similar question arises for x as opposed to
y, z, w, raising issues of reliability.) There are many similar
questions one could think of raising, but knowledge of the
background is essential for an appropriate choice!

I would agree with Pascal that a "routine" t-test is not
appropriate.

One thing that can be directly looked at statistically
is, taking as given that there are 3 outliers somewhere
in all 40 data, what is the probability that all three
occur in one of the 4 groups (x,y,z,w) of data?

This is 4 times the probability that they occur in a specific
group (say x). The chance of all 3 being in x is the number
of ways of choosing the remaining 7 out of the remaining 37,
divided by the number of ways of choosing any 10 out of 40,
i.e. (in R-speak)

  choose(37,7)/choose(40,10)
  # [1] 0.01214575

so the chance of all 3 being in some one of the 4 groups is

  4*choose(37,7)/choose(40,10)
  # [1] 0.048583

which, if you are addicted to P-values, is just significant
at the 5% (P <= 0.05) level. So this gives some indication
that the "x" group of data is not on the same footing as the
other ("y", "z", "w") groups. However, such a test does not
address any question of why such outliers should be there
in the first place; this needs to be addressed differently
(see above).

And one must not forget that the above "P-value" has been
obtained by a method which was prompted by looking at the data
in the first place.
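
As a cross-check on the counting argument, the same probability can be
obtained from the hypergeometric distribution and, approximately, by
simulation. This is only a sketch under the same assumption as before
(3 outliers placed at random among 40 slots split into 4 groups of 10);
the names p_one, p_any, group, n_sim and same_group, the seed and the
number of replications are arbitrary choices of mine.

```r
# Exact: probability that a specified group of 10 (drawn from 40 slots,
# 3 of which are "outliers") contains all 3 outliers
p_one <- dhyper(3, m = 3, n = 37, k = 10)
p_one

# The four "all in one group" events are mutually exclusive,
# so the probabilities simply add
p_any <- 4 * p_one
p_any

# Simulation: drop 3 outliers at random into the 40 slots and record
# whether all three land in the same group
set.seed(1)                      # arbitrary seed, for reproducibility
group <- rep(1:4, each = 10)     # group label of each of the 40 slots
n_sim <- 100000
same_group <- replicate(n_sim, {
  g <- sample(group, 3)          # group labels of the 3 chosen slots
  length(unique(g)) == 1         # TRUE if all three share one group
})
mean(same_group)                 # should be close to p_any
```

The dhyper() value agrees with choose(37,7)/choose(40,10), and the
simulated proportion should settle near 0.0486, subject to the usual
Monte Carlo noise. Neither check removes the caveat that the test was
suggested by the data.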

Hoping this helps,
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 09-May-2013  Time: 09:35:05
This message was sent by XFMail


