ctest miscellanea

Peter Dalgaard BSA p.dalgaard@biostat.ku.dk
09 Jan 1998 19:07:47 +0100


Sorry for leaving this one in my mail box for so long, but - well, I
suppose you know what I mean.

(I'm shifting it over to r-devel, so I'll include all your original
text)  

Kurt Hornik <hornik@ci.tuwien.ac.at> writes:

> Well, ctest is not making progress as quickly as I wanted it ...
> Anyway, here are a few questions/remarks.
> 
> * I am still a bit confused about what binom.test() does.  For which
> test are the p-values computed?  In theory, ``the'' test to use would be
> the optimal unbiased one for one-parameter exponential families, which I
> think is not used ... also, this would be a randomized test, how is the
> p-value for such a test defined?  I should really appreciate someone
> helping me out on this.

I really don't think we should do randomized tests, except possibly as
an option. Different people getting different p-values for the same
data??

If one must do it, I conjecture that one could get a "p-value" by
looking at x + runif(1,-.5,.5) and linearly interpolating between - er,
draw a picture of the density of the modified x and think about it...
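One way to read that picture, as a minimal sketch only: the jittered
value y = x + u has a step-function density, so its upper tail
probability is piecewise linear in y, running from P(X >= x) at
u = -0.5 down to P(X > x) at u = +0.5. The function name
randomized.pvalue and the restriction to a one-sided (upper tail)
binomial test are assumptions made for illustration; this is not what
binom.test() does.

```r
# Hypothetical helper: one-sided (upper tail) "p-value" for a jittered
# binomial observation x + u, with u drawn from Uniform(-0.5, 0.5).
randomized.pvalue <- function(x, n, p, u = runif(1, -0.5, 0.5)) {
  stopifnot(u >= -0.5, u <= 0.5)
  upper <- pbinom(x, n, p, lower.tail = FALSE)  # P(X > x)
  # Linear interpolation across the step of height P(X = x)
  upper + (0.5 - u) * dbinom(x, n, p)
}

# Fixing u removes the randomness: u = 0 gives the "mid-p" value
randomized.pvalue(7, 10, 0.5, u = 0)
```

With u left at its default, repeated calls on the same data return
different values, which is exactly the objection raised above.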

> * I would still like to come up with rather general functions for
> 	location.test()
> 	scale.test()
> and perhaps some more ...  Some time ago, Peter and Tony Rossini and I
> had quite a vivid exchange of emails on this, but I seem to have lost
> our final findings (in case we ever got this far).
> 
> Anyway, any input on this would be great.  Remember, the basic idea was
> to have a unified approach to e.g. several non-parametric tests for a
> difference in location or scale (rather than having mood.test,
> ansaribradley.test, vanderwaerden.test, ...).  However, as PD pointed
> out, it would be a bad idea to use this general scheme for tests which
> don't really fit in there (such as the Wilcoxon tests).  In fact, I seem
> to remember that one issue was whether there are any tests which really
> are location or scale tests ...

I think the main point was that they're not *median* tests
(irrespective of what the SAS output says!). "Location test" is
probably OK. My basic worry was the risk of losing the simplicity of
having well known standard tests called simply t.test(),
wilcoxon.test(), in favour of a perhaps unnecessarily abstract
taxonomy. Of course there's always the possibility to do things like

spearman.test <- function(...) cor.test(..., method="spearman")

etc.

> * Speaking of the Wilcoxon tests, I still need to add exact computations
> for the small-sample cases.  Does anyone have code or algorithms for
> doing that?

Signed rank is trivial, you just generate the 2^k different sign
patterns and look at the distribution of the sums. Even in interpreted
code, this can be done for k up to 16 or so, at which point the
difference from the approximation is immaterial. The bit patterns are
simply all binary numbers between 0 and 2^k-1.
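The enumeration described above might be sketched as follows. The
names signrank.exact and signrank.pvalue are made up for illustration,
and the sketch assumes no ties and no zero differences, so that the
ranks are exactly 1, ..., k.

```r
# Hypothetical helper: exact null distribution of the signed rank
# statistic V (sum of the ranks with a positive sign), obtained by
# enumerating all 2^k sign patterns as the binary numbers 0 .. 2^k - 1.
signrank.exact <- function(k) {
  stopifnot(k >= 1, k <= 16)
  patterns <- 0:(2^k - 1)
  # Column j holds bit j-1 of each pattern; 1 means rank j is positive
  bits <- sapply(0:(k - 1), function(j) (patterns %/% 2^j) %% 2)
  bits <- matrix(bits, ncol = k)
  v <- as.vector(bits %*% (1:k))
  # Relative frequency of each possible sum 0 .. k(k+1)/2
  table(factor(v, levels = 0:(k * (k + 1) / 2))) / 2^k
}

# Hypothetical helper: exact upper tail probability P(V >= v)
signrank.pvalue <- function(v, k) {
  d <- signrank.exact(k)
  sum(d[as.integer(names(d)) >= v])
}

signrank.exact(3)
```

For k = 3 the eight patterns give the sums 0, 1, 2, 3, 3, 4, 5, 6, so
every value has probability 1/8 except 3, which has 2/8. The matrix of
bit patterns has 2^k rows, which is what limits this approach to k of
about 16.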

The two-sample case is a bit more unwieldy...

> 
> * I also mentioned some time ago that I'd like to make Fisher's test
> available for tables larger than 2 by 2.  There is an implementation
> (FEXACT) of the Mehta and Patel algorithm available via APSTAT (I
> think).  However, when I last used it (for an association analysis of a
> gene with 12 alleles) it could not deal with the ``large'' 12 by 2
> table.  (More precisely, it can deal with it after enlarging some size
> parameters in the sources and recompiling, but that's not the smart
> way of doing things ...)  Again, does anyone have a suggestion what to do
> here?  (FEXACT has a ``mixed'' method of dealing with larger tables, but
> it seems stupid to have an R function which may produce a message like
> ``no, I need more memory ... please try to change param XYZ and then
> recompile''.)

If one can precompute the size of the array, one can usually allocate
it in R and pass it as a parameter instead.

As you know, *my* main desires for ctest are to allow model formula
specifications for all of the common tests (for consistency), and in
the slightly longer run also to include trend tests and stratification.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
