[R] Categories or clusters for univariate data

Berton Gunter gunter.berton at gene.com
Tue Feb 22 17:41:07 CET 2005


> > bounds for each group.  My question is, is there a function in R that
> > can do the same thing for more complex and subtle groupings in
> > univariate data, and
> >
> >    ** provide a statistical basis for the result? **

No. Others have suggested useful ways to **generate** reasonable hypotheses
about "subtle groupings" in the data; however, by the nature and logic of
hypothesis testing, one cannot then use those same data to evaluate the
statistical "significance" of any groupings that one purports to have found.
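
To make the circularity concrete, here is a minimal simulation sketch (in
Python, standard library only; the helper name circular_split_p_value, the
median split, and the normal-approximation z-test are illustrative choices,
not anything proposed in this thread). Every sample is drawn from a single
homogeneous normal population, "groups" are then found by cutting at the
median, and the group means are tested on the very data that defined them.

```python
import random
import statistics
from statistics import NormalDist


def circular_split_p_value(x):
    """Split x at its own median, then z-test the two halves on the SAME data.

    Hypothetical helper for illustration only: the grouping is derived from
    the data and then "tested" on those data, which is the circular step.
    """
    cut = statistics.median(x)
    low = [v for v in x if v <= cut]
    high = [v for v in x if v > cut]
    se = (statistics.variance(low) / len(low)
          + statistics.variance(high) / len(high)) ** 0.5
    z = (statistics.mean(high) - statistics.mean(low)) / se
    # Two-sided p-value from a normal approximation.
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(1)          # fixed seed so the run is repeatable
n_sims, n_obs = 200, 100

# Every sample is homogeneous N(0, 1): there are no real groups to find.
p_values = [
    circular_split_p_value([rng.gauss(0, 1) for _ in range(n_obs)])
    for _ in range(n_sims)
]
rejection_rate = sum(p < 0.05 for p in p_values) / n_sims
print(f"Share of homogeneous samples declared 'significant': {rejection_rate:.2f}")
```

Although no groups exist, essentially every sample is declared
"significant" at the nominal 5% level, because the test is conditioned on
a split that was chosen to separate the values.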

One **possible** way of overcoming this dilemma is to randomly bifurcate the
data into training and test sets, do ALL model development on the training
set, and then evaluate statistical "significance" (once and only once) on
the test set. However, one may argue that even this inflates the Type I
error rate, as the random split likely preserves the same structures in both
halves and thus doesn't eliminate the large bias of testing models fit to
the random anomalies of the data set at hand.
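
The mechanics of such a split can be sketched as follows (Python, standard
library only). Everything specific here is an assumption made for
illustration: the simulated two-group data, the crude two_means_cut
grouping step, the hard-assignment two-normal mixture, and the choice of a
one-sided z-test on held-out log-density gains as the single test-set
evaluation. It is one way to set the procedure up, not a recommended
method, and the caveat about both halves sharing the same anomalies
applies to it in full.

```python
import math
import random
import statistics
from statistics import NormalDist


def two_means_cut(x, iters=50):
    """Crude 1-D 2-means; returns a cut point between the two centres.

    Assumes x has at least two distinct values.
    """
    lo, hi = min(x), max(x)
    for _ in range(iters):
        cut = (lo + hi) / 2
        left = [v for v in x if v <= cut]
        right = [v for v in x if v > cut]
        lo, hi = statistics.mean(left), statistics.mean(right)
    return (lo + hi) / 2


def fit_models(train):
    """Fit one normal and a hard-assignment two-normal mixture to train.

    Assumes each side of the cut holds at least two distinct values.
    """
    single = NormalDist.from_samples(train)
    cut = two_means_cut(train)
    left = [v for v in train if v <= cut]
    right = [v for v in train if v > cut]
    weight = len(left) / len(train)
    mixture = (weight, NormalDist.from_samples(left), NormalDist.from_samples(right))
    return single, mixture


def log_density_gain(x, single, mixture):
    """Log density of x under the two-group model minus that under one normal."""
    w, a, b = mixture
    return math.log(w * a.pdf(x) + (1 - w) * b.pdf(x)) - math.log(single.pdf(x))


# Simulated data with two real groups (assumed purely for illustration).
rng = random.Random(2005)
data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(6, 1) for _ in range(200)]

# Random split into training and test halves.
rng.shuffle(data)
half = len(data) // 2
train, test = data[:half], data[half:]

# ALL model development happens on the training half only.
single, mixture = fit_models(train)

# The test half is touched once: does the grouping predict held-out data
# better than a single normal does?
gains = [log_density_gain(v, single, mixture) for v in test]
z = statistics.mean(gains) / (statistics.stdev(gains) / math.sqrt(len(gains)))
p_value = 1 - NormalDist().cdf(z)   # one-sided, normal approximation
print(f"held-out z = {z:.2f}, one-sided p = {p_value:.3g}")
```

The point of the structure is that the cut point and both fitted models
are frozen before the test half is looked at, and the test half yields
exactly one p-value.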

As A.S.C. Ehrenberg argued many years ago -- and recent events on the U.S.
COX-2 regulatory stage have dramatized -- single sets of data cannot be
used as the basis for scientific knowledge; multiple sets of data generated
under different conditions and with different sources of exogenous variation
are required.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box



