[R] P values

Sun May 9 20:20:32 CEST 2010

Time to rescue Random Variables before they drown!

On 09-May-10 16:53:02, Bak Kuss wrote:
> Thank you for your replies.
> 
> As I said (wrote) before, 'I am no statistician'.
> But I think I  know what Random Variables are (not).
> 
> Random variables are not random, neither are they variable.
> [It sounds better in french: Une variable aléatoire n'est pas
> variable, et n'a rien d'aléatoire.]

If that is a quote from somewhere, it would be good to have the
reference! If not, and you made it up (even in nice French),
then please read on.

> See this definition from: 'Introduction to the mathematical
> and statistical foundations of econometrics',
> Herman J. Bierens, Cambridge University Press, 2004, page 21.
> 
> http://docs.google.com/View?id=dct7h449_8748tjc6g9

That definition spells it out in proper detail (though the
symbolism needs explaining).

(Omega, F, P) is a **probability space**: Omega is a set;
F is a family of "Borel subsets" of Omega (i.e. a family rich
enough to support a measure on Omega); P is a probability
function on F, i.e. for every Borel subset B in F, P(B) is
defined and obeys the laws of probability:
P(Omega) = 1, P(EmptySet) = 0, if B1, B2 are disjoint then
P(B1 union B2) = P(B1) + P(B2) etc.

NOW: A random variable X is a mapping from Omega into (say)
R (the real line) **which carries the probability structure
with it**: For every Borel set B' of R, the inverse image
of B', = the set {omega in Omega such that X(omega) is in B'},
is a Borel set B in F and so has a probability P(B).

Then (though not explicitly stated in that Definition 1.8)
the probability of B' is defined to be the probability of B:

  P(B', a Borel subset of R)
  = P(B, the Borel set in F which is the inverse image of B')

THAT is the definition of a Random Variable. It is a "variable"
in exactly the same sense that a real number "x" can be called
a variable ("let the variable x assume values between 0 and 1", say);
and it is "random" because the underlying omega in Omega is
random (because Omega has been furnished with the probability
distribution P defined on its Borel subsets). This mathematical
definition of randomness is, in effect, a mathematical model
of real randomness.
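As a minimal sketch of the definition above, assume a finite toy
case: Omega is the set of 36 outcomes of two fair dice, F is all
subsets of Omega, and X is the sum of the two faces. The names
omega, prob, X, preimage and prob_X are illustrative only and do
not come from Bierens. The probability of a set B' on the real
line is obtained purely by pulling B' back to Omega:

```python
from fractions import Fraction
from itertools import product

# Assumed toy space: Omega = all ordered outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P on subsets of Omega: each of the 36 outcomes has probability 1/36."""
    return Fraction(len(set(event)), len(omega))

def X(w):
    """Hypothetical random variable: the sum of the two dice."""
    return w[0] + w[1]

def preimage(b_prime):
    """B = {w in Omega : X(w) in B'}, the inverse image of B' under X."""
    return [w for w in omega if X(w) in b_prime]

def prob_X(b_prime):
    """Probability carried over to R by X: P(B') is defined as P(B)."""
    return prob(preimage(b_prime))

print(prob_X({7}))                 # 1/6
print(prob_X({2, 12}))             # 1/18
print(prob_X(set(range(2, 13))))   # 1
```

Nothing here is random in the everyday sense; X is a fixed
function. The probabilities on the values of X exist only because
P was already defined on Omega.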

> Simply put: A random variable is just a mapping (transformation)
> from a set to the real line.

Not "just a transformation". As should be visible in the above,
it is *more* than "just a transformation" -- it has to be able
to carry the probabilities from the underlying space with it.
And it carries them.

> If the mapping is limited to be from a set to the  [0,1] segment
> of the real line, one calls it a probability. But it is still not
> variable nor random. Just a 'simple transformation'.

See above. And NOTE that the "probability" is in the first place
attributed to the underlying space Omega on which the Random
Variable (mapping) X is defined.

> As far as Central Limit Theorems are concerned, are they not...
> well, far from reality.
> They belong to asymptotics. By definition 'asymptotics' do not
> belong to reality.  'As if...' kind of arguments they are.
> Are they not excuses for our 'misbehavior'? An alibi?
> Just like 'p-values'? They just _indicate_  that  _probably_
> we were wrong in having thought such and such...
> Without ever getting close to whatever the 'real reality' was,
> is, could be... probably!
> 
> bak

That is a whole other discussion! I would for now simply dispute
your assertion "By definition 'asymptotics' do not belong to reality".
"Asymptotic" results are mathematical limits of results for finite
sizes of things, as the size gets arbitrarily large (or small).
Admittedly, some numbers are so large that there is nothing that
big in the known Universe, and some so small that there is nothing
that small which is observable. Nevertheless, such mathematical
limits are approached with arbitrary closeness for sufficiently
large (or sufficiently small) values of the limiting variable.

They can therefore serve as **adequate (and mathematically
convenient) approximations** for real-life things; the question
(which is always implicit in using things like the Central Limit
Theorem) is whether the case we have in hand is "sufficiently
large" for the approximation to be adequate (for whatever purpose
we have in hand). And that *is* reality.
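The question of "sufficiently large" can be looked at numerically.
The following is a minimal sketch, assuming a fair-coin example
(the sum of n Bernoulli(1/2) trials) chosen purely for
illustration; max_cdf_gap and normal_cdf are hypothetical helper
names. It compares the exact Binomial(n, p) distribution function
with the Normal one that the Central Limit Theorem gives as its
limit, and reports the largest gap found at the points k = 0..n:

```python
import math

def normal_cdf(z):
    """Standard Normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(n, k, p):
    """Exact Binomial(n, p) probability of k, computed on the log scale
    to avoid overflow for large n. Requires 0 < p < 1."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def max_cdf_gap(n, p=0.5):
    """Largest gap, over k = 0..n, between the exact Binomial CDF at k
    and the Normal CDF at the standardised value of k."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    cdf = 0.0
    gap = 0.0
    for k in range(n + 1):
        cdf += binom_pmf(n, k, p)
        gap = max(gap, abs(cdf - normal_cdf((k - mean) / sd)))
    return gap

for n in (10, 100, 1000):
    print(n, max_cdf_gap(n))
```

The gap shrinks as n grows but is never zero for finite n. Whether
it is small enough depends on the purpose in hand, which is exactly
the judgment described above.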

Enough for now ...
Ted.

> That's a common misconception. A p-value expresses no more than the
> chance of obtaining the dataset you observe, given that your null
> hypothesis _and your assumptions_ are true. Essentially, a p-value is
> as "real" as your assumptions. In that way I can understand what Robert
> wants to say. But with large enough datasets, bootstrapping or
> permutation tests often give about the same p-value as the asymptotic
> approximation. *At that moment, the central limit theorem comes into
> play*
> 
> On Sat, May 8, 2010 at 9:38 PM, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
> 
>> On 08/05/2010 9:14 PM, Joris Meys wrote:
>>
>>> On Sat, May 8, 2010 at 7:02 PM, Bak Kuss <bakkuss at gmail.com> wrote:
>>>
>>>> Just wondering.
>>>>
>>>> The smaller the p-value, the closer to 'reality' (the more accurate)
>>>> the model is supposed to (not) be (?).
>>>>
>>>> How realistic is it to be that (un-) real?
>>>>
>>>
>>> That's a common misconception. A p-value expresses no more than the
>>> chance of obtaining the dataset you observe, given that your null
>>> hypothesis _and your assumptions_ are true.
>>>
>>
>>
>> I'd say it expresses even less than that.  A p-value is simply a
>> transformation of the test statistic to a standard scale.  In the
>> nicer situations, if the null hypothesis is true, it'll have a uniform
>> distribution on [0,1].  If H0 is false but the truth lies in the
>> direction of the alternative hypothesis, the p-value should have a
>> distribution that usually gives smaller values.  So an unusually small
>> value is a sign that H0 is false:  you don't see values like 1e-6 from
>> a U(0,1) distribution very often, but that could be a common outcome
>> under the alternative hypothesis.  (The not so nice situations make
>> things a bit more complicated, because the p-value might have a
>> discrete distribution, or a distribution that tends towards large
>> values, or the U(0,1) null distribution might be a limiting
>> approximation.)
>>
>> So to answer Bak, the answer is that yes, a well-designed statistic
>> will give p-values that tend to be smaller the further the true model
>> gets from the hypothesized one, i.e. smaller p-values are probably
>> associated with larger departures from the null.  But the p-value is
>> not a good way to estimate that distance.  Use a parameter estimate
>> instead.
>>
>> Duncan Murdoch
>>
>>> Essentially, a p-value is as "real" as your assumptions. In that way
>>> I can understand what Robert wants to say. But with large enough
>>> datasets, bootstrapping or permutation tests often give about the
>>> same p-value as the asymptotic approximation. At that moment, the
>>> central limit theorem comes into play, which says that when the
>>> sample size is big enough, the mean is -close to- normally
>>> distributed. In those cases, the test statistic also follows the
>>> proposed distribution and your p-value is closer to "reality". Mind
>>> you, the "sample size" for a specific statistic is not always merely
>>> the number of observations, especially in more advanced methods.
>>> Plus, violations of other assumptions, like independence of the
>>> observations, change the picture again.
>>>
>>> The point is : what is reality? As Duncan said, a small p-value
>>> indicates that your null hypothesis is not true. That's exactly what
>>> you look for, because that is the proof the relation in your dataset
>>> you're looking at, did not emerge merely by chance. You're not out to
>>> calculate the exact chance. Robert is right, reporting an exact
>>> p-value of 1.23 e-7 doesn't make sense at all. But the rejection of
>>> your null-hypothesis is as real as life.
>>>
>>> The trick is to test the correct null hypothesis, and that's where it
>>> most often goes wrong...
>>>
>>> Cheers
>>> Joris
>>>
>>>
>>>
>>>> bak
>>>>
>>>> p.s. I am no statistician
>>>>

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 09-May-10                                       Time: 19:20:28
------------------------------ XFMail ------------------------------