[R] create stratified splits

Ista Zahn istazahn at gmail.com
Wed Dec 19 23:45:41 CET 2012


Hi Martin,

Interesting question. This is not efficient, but I thought I would
post a brute-force method that might be good enough. Surely someone
will have a better approach, but here is a simple, inefficient (but
workable) way: keep drawing random splits until the group means and
sds are similar enough.

# create the vector to be split
r <- runif(100)

# write a function to split it, with various knobs and toggles
splitSimilar <- function(x, n, mean.tol=.1, sd.tol=.1, itr=500, verbose=FALSE) {
  M <- mean.tol+1
  SD <- sd.tol+1
  I <- 0
  # as long as the sd of the group means or the sd of the group sds is
  # greater than tolerance, and we have iterations left...
  while((M > mean.tol || SD > sd.tol) && I < itr) {
    I <- I + 1
    ## pick another split
    x1 <- data.frame(g = rep(letters[1:n], length(x)/n),
                     value = sample(x, length(x)))
    M <- sd(tapply(x1$value, x1$g, FUN=mean))
    SD <- sd(tapply(x1$value, x1$g, FUN=sd))
    if(verbose) {
      cat("M =", M, ", mean.tol =", mean.tol, ": SD =", SD,
          ", sd.tol =", sd.tol, "\n")
    }
  }
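  # don't try forever: give up if the last split is still out of tolerance
  # (checking M and SD, not I, so a success on the final try is kept)
  if(M > mean.tol || SD > sd.tol) {
    stop("failed to find split matching criteria: try increasing tolerance")
  } else {
    return(x1)
  }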
}

# now use our function to find a set of splits within our mean and sd
# tolerance.
tst <- splitSimilar(r, 10, mean.tol = 0.05, sd.tol = 0.1)

# adjust some of the dials and switches to suit...
tst <- splitSimilar(r, 10, mean.tol = 0.03, sd.tol = 0.05, itr=5000)
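
To see what the tolerances are being compared against, it can help to look at the per-group summaries of a single random split. The following is only a sketch: it builds one split the same way the loop body above does, and the names `n`, `grp.mean`, `grp.sd`, `summary.tab` and `spread`, as well as the fixed seed, are my own assumptions for illustration.

```r
set.seed(1)                      # assumption: fixed seed so the run is repeatable
r <- runif(100)
n <- 10                          # number of groups; length(r) must be a multiple of n

# one random assignment, built the same way as inside splitSimilar()
x1 <- data.frame(g = rep(letters[1:n], length(r) / n),
                 value = sample(r, length(r)))

# per-group mean and sd
grp.mean <- tapply(x1$value, x1$g, mean)
grp.sd   <- tapply(x1$value, x1$g, sd)
summary.tab <- data.frame(g    = names(grp.mean),
                          mean = as.vector(grp.mean),
                          sd   = as.vector(grp.sd))
print(summary.tab)

# the two numbers that get compared with mean.tol and sd.tol
spread <- c(M = sd(grp.mean), SD = sd(grp.sd))
print(spread)
```

Running the same summary on the `value` and `g` columns of `tst` shows how much tighter an accepted split is than a single random draw.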

Best,
Ista

On Wed, Dec 19, 2012 at 3:23 PM, Martin Batholdy
<batholdy at googlemail.com> wrote:
> Hi,
>
>
> I have a vector like:
>
> r <- runif(100)
>
> Now I would like to split r into 10 pieces (each with 10 elements) –
> but the 'pieces' should be roughly similar with regard to mean and sd.
>
> what is an efficient way to do this in R?
>
>
> thanks!



