[R] P values

Fri May 7 19:08:23 CEST 2010

Please let me quote an eminently sensible person, who observed that ...

"p-values are dangerous, especially large, small, and in-between ones." 
- Frank E Harrell Jr., Prof. of Biostatistics and Department Chair,
Vanderbilt University

Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
561-352-9699
http://www.StatisticalEngineering.com

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Robert A LaBudde
Sent: Friday, May 07, 2010 12:29 PM
To: Duncan Murdoch
Cc: r-help at r-project.org; level
Subject: Re: [R] P values

At 07:10 AM 5/7/2010, Duncan Murdoch wrote:
>Robert A LaBudde wrote:
>>At 01:40 PM 5/6/2010, Joris Meys wrote:
>>
>>>On Thu, May 6, 2010 at 6:09 PM, Greg Snow <Greg.Snow at imail.org> wrote:
>>>
>>>
>>>>Because if you use the sample standard deviation then it is a t test not a
>>>>z test.
>>>>
>>>>
>>>I'm doubting that seriously...
>>>
>>>You calculate normalized Z-values by subtracting the sample mean and
>>>dividing by the sample sd. So Thomas is correct. It becomes a Z-test since
>>>you compare these normalized Z-values with the Z distribution, instead of
>>>the (more appropriate) T-distribution. The T-distribution is essentially a
>>>Z-distribution that is corrected for the finite sample size. In Asymptopia,
>>>the Z and T distributions are identical.
>>>
>>
>>And it is only in Utopia that any P-value less than 0.01 actually 
>>corresponds to reality.
>>
>>
>I'm not sure what you mean by this.  P-values are simply statistics 
>calculated from the data; why wouldn't they be real if they are small?
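To put rough numbers on the z-versus-t point quoted above, here is a minimal sketch in R. The statistic value of 3 and the degrees of freedom are illustrative assumptions, not values from the thread. It refers the same standardized statistic to both reference distributions and shows how far apart the two-sided p-values are at small sample sizes.

```r
# Two-sided p-value for the same standardized statistic, referred to the
# z distribution and to t distributions with various degrees of freedom.
stat <- 3                          # illustrative statistic (an assumption)
df   <- c(5, 10, 30, 100, 1000)    # illustrative degrees of freedom

p_z <- 2 * pnorm(stat, lower.tail = FALSE)
p_t <- 2 * pt(stat, df = df, lower.tail = FALSE)

# ratio > 1 means the z reference understates the p-value relative to t.
comparison <- data.frame(df = df, p_t = p_t, p_z = p_z, ratio = p_t / p_z)
print(comparison, digits = 3)
```

The t-based p-value is always the larger of the two, and the gap closes as the degrees of freedom grow, which is the "Asymptopia" remark in numeric form.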

Do you truly believe that an actual real-life distribution is accurately 
fit by a normal distribution at quantiles of 0.001, 0.0001 or beyond?

"The map is not the territory", and just because you can calculate 
something from a model doesn't mean it's true.

The real world is composed of mixture distributions, not pure ones.

The P-value may be real, but its reality is subordinate to the 
distributional assumption involved, which always fails at some level. 
I'm simply asserting that level is in the tails at probabilities of 
0.01 or less.
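One way to see this numerically is a small sketch in R, under loudly stated assumptions: the "real" distribution is taken to be a hypothetical contaminated normal, 95% N(0, 1) plus 5% N(0, 3^2). The mixing weight and the wider standard deviation are made up for illustration. The code compares the nominal normal tail area with the tail area the mixture actually puts beyond the same cutoff.

```r
# Hypothetical contaminated-normal model: 95% N(0, 1) plus 5% N(0, 3^2).
# The mixing weight and the wider sd are illustrative assumptions only.
eps     <- 0.05
sd_wide <- 3

# Upper-tail probability P(X > z) under the mixture.
tail_mixture <- function(z) {
  (1 - eps) * pnorm(z, lower.tail = FALSE) +
    eps * pnorm(z, sd = sd_wide, lower.tail = FALSE)
}

# Cutoffs at which the pure normal has the nominal upper-tail areas.
nominal <- c(0.05, 0.01, 0.001, 0.0001)
z       <- qnorm(nominal, lower.tail = FALSE)

result <- data.frame(
  nominal = nominal,
  z       = z,
  mixture = tail_mixture(z),
  ratio   = tail_mixture(z) / nominal   # how far off the nominal value is
)
print(result, digits = 3)
```

Under these assumptions the ratio of actual to nominal tail area grows as the nominal area shrinks: a modest discrepancy at 0.05 becomes a large one at 0.0001, even though the bulk of the two distributions looks much the same.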

Statisticians, even eminent ones such as yourself and lesser lights 
such as myself, frequently fail to keep this in mind. We accept such 
assumptions as "normality", "equal variances", etc., on an 
"eyeballometric" basis, without any quantitative understanding of 
what this means about limitations on inference, including P-values.

Inference in statistics is much cruder and more judgmental than we 
like to portray. We should at least be honest among ourselves about 
the degree to which our hand-waving assumptions work.

I remember at the O. J. Simpson trial, the DNA expert asserted that a 
match would occur only once in 7 billion people. I wondered at the 
time how you could evaluate such an assertion, given there were fewer 
than 7 billion people on earth at the time.

At a conference on optical disk memories, back when they were being 
developed, I heard a talk about validating disk specifications 
against production. One statement was that the company would also 
validate the "undetectable error rate" specification of 1 in 10^16 
bits. Amused, I asked how they planned to validate the 
"undetectable" error rate. The response was handwaving and "Just as 
we do everything else". The audience laughed, and the speaker didn't 
seem to know what the joke was.

In both these cases the values were calculable, but that didn't mean 
that they applied to reality.
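A back-of-the-envelope sketch in R shows what validating a rate like 1 in 10^16 would take. It assumes independent bit errors, zero errors observed, and a 95% confidence level. Those are illustrative assumptions, not anything stated in the talk.

```r
# Target error rate taken from the anecdote: 1 error per 10^16 bits.
p_target <- 1e-16
conf     <- 0.95   # illustrative confidence level (an assumption)

# With zero errors in n independent trials, the one-sided upper bound p_u
# solves (1 - p_u)^n = 1 - conf. Setting p_u = p_target and solving for n:
n_exact <- log(1 - conf) / log1p(-p_target)

# "Rule of three" approximation: n is roughly 3 / p at 95% confidence.
n_rule3 <- 3 / p_target

cat(sprintf("bits needed (exact):     %.3g\n", n_exact))
cat(sprintf("bits needed (rule of 3): %.3g\n", n_rule3))
```

Even in this best case of no observed errors, roughly three times 10^16 error-free bits would have to be read to support the claim, which is the gap between a value being calculable and being checkable.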

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
