[R] Code find exact distribution for runs test?

Fri Feb 12 00:08:20 CET 2010

OK, I think I have it worked out for both cases:

library(vcd)

druns <- function(x, n) { # x runs in n data points, not vectorized
						# based on sample median

	if( n%%2 ) stop('n must be even')

	if( x %% 2 ) { # odd x
		choose( n/2 - 1, n/2-(x+1)/2 ) * choose( n/2 - 1, n/2-(x-1)/2 )/
				choose(n,n/2) * 2
	} else {		# even x
		choose( n/2 - 1, n/2-x/2 ) * choose( n/2 - 1, n/2-x/2 )/
				choose(n,n/2) * 2
	}
}

## population median
out1 <- replicate( 100000, {x <- rnorm(10); length(rle(sign(x))$lengths) } )

rootogram( table(out1), dbinom(0:9, size=9, prob=0.5)*100000 )
chisq.test( table(out1), p=dbinom(0:9, size=9, prob=0.5) )

## sample median
out2 <- replicate( 100000, {x <- rnorm(10); x <- x - median(x); length(rle(sign(x))$lengths) } )

fit <- sapply( 2:10, druns, n=10 )

rootogram( table(out2), fit * 100000 )
chisq.test( table(out2), p=fit )

The tests only fail to prove me wrong, not a firm proof that I am correct.  But given the simulation size I am at least pretty close.  I can see why people worked out approximations before we had good computers, I would not want to calculate the above by hand.

Enjoy,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Greg Snow
> Sent: Thursday, February 11, 2010 12:19 PM
> To: Dale Steele; R-help at r-project.org
> Subject: Re: [R] Code find exact distribution for runs test?
> 
> I am not an expert in this area, but here are some thoughts that may
> get you started towards an answer.
> 
> First, there are 2 ways (possibly more) that can lead to the data for a
> runs test that lead to different theoretical distributions:
> 
> 1. We have a true or hypothesized value of the median that we
> subtracted from the data, therefore each value has 50% probability of
> being positive/negative and all are independent of each other (assuming
> being exactly equal to the median is impossible or discarded).
> 
> 2. We have subtracted the sample median from each sample value (and
> discarded any 0's) leaving us with exactly half positive and half
> negative and not having independence.
> 
> In case 1, the 1st observation will always start a run.  The second
> observation has a 50% chance of being the same sign (F) or different
> sign (S), with the probability being 0.5 for each new observation and
> them all being independent (under assumption of random selections from
> population with known/hypothesized median) and the number of runs
> equaling the number of S's, this looks like a binomial to me (with some
> '-1's inserted in appropriate places.
> 
> In case 2, this looks like a hypergeometric distribution, there would
> be n!/((n/2)!(n/2)!) possible permutations, just need to compute how
> many of those permutations result in x runs to get the probability.
> There is probably a way to think about this in terms of balls and urns,
> but I have not worked that out yet.
> 
> Hope this helps,
> 
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org] On Behalf Of Dale Steele
> > Sent: Wednesday, February 10, 2010 6:16 PM
> > To: R-help at r-project.org
> > Subject: [R] Code find exact distribution for runs test?
> >
> > I've been attempting to understand the one-sample run test for
> > randomness.  I've found run.test{tseries} and run.test{lawstat}.
> Both
> > use a large sample approximation for distribution of the total number
> > of runs in a sample of n1 observations of one type and n2
> observations
> > of another type.
> >
> > I've been unable to find R code to generate the exact distribution
> and
> > would like to see how this could be done (not homework).
> >
> > For example, given the data:
> >
> > dtemp <- c(12, 13, 12, 11, 5, 2, -1, 2, -1, 3, 2, -6, -7, -7, -12, -
> 9,
> > 6, 7, 10, 6, 1, 1, 3, 7, -2, -6, -6, -5, -2, -1)
> >
> > The Monte Carlo permutation approach seems to get me part way.
> >
> >
> > # calculate the number of runs in the data vector
> > nruns <- function(x) {
> >     signs <- sign(x)
> >     runs <- rle(signs)
> >     r <- length(runs$lengths)
> >     return(r)
> > }
> >
> > MC.runs <- function(x, nperm) {
> > RUNS <- numeric(nperm)
> > for (i in  1:nperm) {
> >     RUNS[i] <- nruns(sample(x))
> > }
> >     cdf <- cumsum(table(RUNS))/nperm
> >     return(list(RUNS=RUNS, cdf=cdf, nperm=nperm))
> > }
> >
> > MC.runs(dtemp, 100000)
> >
> > Thanks.  --Dale
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.