[R] Stats Question: Single data item versus Sample from Norm

Tue Apr 5 12:35:01 CEST 2005

On 05-Apr-05 Ross Clement wrote:
> Hi. I have a question that I have asked in other stat forums
> but do not yet have an answer for. I would like to know if
> there is some way in R or otherwise of performing the following
> hypothesis test.
> 
> I have a single data item x. The null hypothesis is that x
> was selected from a normal distribution N(mu,sigma). The
> alternate hypothesis is that x does not come from this
> distribution.
> 
> However, I do not know the values of mu and sigma. I have a
> sample of size N from which I can estimate mu and sigma.
> So, say that I have N(m,s,N), and x. I would like to say with
> some certainty (e.g. 95%) that I can, or can't reject the
> hypothesis that x came from N(mu,sigma). I would also like a
> power test to say how large N should be given the degree of
> accuracy I need when accepting or rejecting individual x
> values.
> 
> What is the name of the hypothesis test I need for this?
> Is it built into R, or are there packages I could use?

There is no name because there is no unique test.

The difficulty lies in your statement of the alternative
hypothesis: "that x does not come from this distribution."

This allows any distribution whatever to be a possible source
of your single observation x. Therefore, whatever the value
of x, you can reject the null hypothesis that it comes from
any N(mu,sigma^2) that is remotely compatible with your N data,
in favour of some distribution that happens to predict with
near-certainty that you will get that particular observation x.

On that basis, for instance, suppose you had m=1.1 and s=2.5
say. And suppose x=1.15 which is very close to m with a
difference which is much smaller than s. You are still
entitled to reject H0 on the basis that your alternative
allows you to postulate N(1.15,0.00000001) as the source
of the observation x.
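
One way to see this numerically is to compare how strongly each
distribution "predicts" the observed x, using the Normal density at x.
The lines below are only a sketch: the variable names are illustrative,
and the second argument of N(1.15,0.00000001) is assumed to be a
variance (following the N(mu,sigma^2) notation above), so the standard
deviation passed to dnorm() is its square root.

```r
# Values taken from the example in the text
m <- 1.1      # estimated mean
s <- 2.5      # estimated standard deviation
x <- 1.15     # the single observation

# Density of x under the null, N(m, s^2)
dens_null <- dnorm(x, mean = m, sd = s)

# Density of x under the contrived alternative N(1.15, 1e-8);
# ASSUMPTION: 1e-8 is a variance, hence sd = sqrt(1e-8)
dens_alt <- dnorm(x, mean = 1.15, sd = sqrt(1e-8))

# Ratio of the two densities: how much better the alternative "predicts" x
ratio <- dens_alt / dens_null
print(ratio)
```

The ratio is large even though x lies very close to m, and it can be
made as large as you like by shrinking the variance of the alternative
further.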

What you need to do is to make clear what feature of the
value of x, in relation to any given Normal distribution,
would constitute an indication that it was not sampled
from that distribution.

If (as I surmise) this is simply "distance from mu" [the
true mean of the Normal distribution], so that you are
basically testing whether x is an "outlier", then you
could use the simple fact that, under the null hypothesis,
the quantity

   ((x - m)(N/(N+1))^0.5)/s

has a t distribution with (N-1) degrees of freedom (here m
and s are the mean and standard deviation of your N values,
with s computed using the divisor N-1).

This, if you have to give it a name, would be a "t" test
since the t distribution is all it depends on.
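
In R this could be sketched along the following lines. The function
name single_obs_t_test and its arguments are hypothetical (it is not
a built-in or package function), the two-sided p-value reflects the
"distance from mu" reading of the alternative, and the data are
simulated purely for illustration. Note that sd() uses the divisor
N-1.

```r
# Hypothetical helper: tests one new observation x against a sample y,
# assuming both come from Normal distributions with the same variance.
single_obs_t_test <- function(x, y, conf = 0.95) {
  N <- length(y)
  m <- mean(y)
  s <- sd(y)                                  # divisor N-1
  t_stat <- (x - m) * sqrt(N / (N + 1)) / s   # the statistic given above
  p_value <- 2 * pt(-abs(t_stat), df = N - 1) # two-sided p-value
  list(statistic = t_stat,
       df        = N - 1,
       p.value   = p_value,
       reject    = p_value < 1 - conf)
}

# Illustration with simulated data (not real observations)
set.seed(1)
y <- rnorm(20, mean = 1.1, sd = 2.5)

res_near <- single_obs_t_test(1.15, y)   # x close to the sample mean
res_far  <- single_obs_t_test(100, y)    # x far from the sample mean

print(res_near$p.value)
print(res_far$reject)
```

Failing to reject here is equivalent to x falling inside the usual
two-sided prediction interval for one further observation,
m +/- t * s * (1 + 1/N)^0.5, where t is the appropriate quantile of
the t distribution on (N-1) degrees of freedom.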

Note, however, that this pre-supposes that the variance
of the distribution from which x was sampled is the
same as the variance of the distribution giving your N
values, and also that both distributions are Normal,
differing therefore only in their means. So this is a
tight restriction of your original universal class of
alternatives.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Apr-05                                       Time: 11:35:01
------------------------------ XFMail ------------------------------