[R] create stratified splits

David Winsemius dwinsemius at comcast.net
Thu Dec 20 00:38:08 CET 2012

On Dec 19, 2012, at 12:23 PM, Martin Batholdy wrote:

> Hi,
> I have a vector like:
> r <- runif(100)
> Now I would like to split r into 10 pieces (each with 10 elements) –
> but the 'pieces' should be roughly similar with regard to mean and sd.
> What is an efficient way to do this in R?

> m <- sort(runif(100))
> do.call(rbind, split(m, (1:100)%%10 )) 
         [,1]       [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]     [,10]
0 0.073246870 0.17794968 0.2923314 0.4314560 0.4774632 0.6035957 0.7122246 0.7671372 0.8759190 0.9994554
1 0.004766445 0.08639538 0.1922977 0.2976945 0.4327731 0.4966852 0.6094609 0.7124650 0.7771450 0.9009393
2 0.016612211 0.12028226 0.2052309 0.3336055 0.4349006 0.5161239 0.6204279 0.7149662 0.7830977 0.9022377
3 0.027497879 0.12147150 0.2061456 0.3427435 0.4381574 0.5179506 0.6252453 0.7244906 0.8065418 0.9055773
4 0.028392933 0.12856468 0.2086340 0.3482647 0.4420098 0.5308244 0.6348948 0.7271810 0.8202800 0.9072492
5 0.042657119 0.14656184 0.2251334 0.3487408 0.4484275 0.5423360 0.6480134 0.7298033 0.8298771 0.9297432
6 0.045639209 0.15821977 0.2372649 0.3816321 0.4561417 0.5481704 0.6758081 0.7309329 0.8355179 0.9427048
7 0.050771165 0.16489115 0.2625372 0.4225952 0.4701286 0.5512640 0.6765688 0.7508822 0.8510762 0.9444102
8 0.051595323 0.16541512 0.2713721 0.4235584 0.4724879 0.5652690 0.7066615 0.7512220 0.8625107 0.9610963
9 0.057932068 0.17766175 0.2834772 0.4284754 0.4725581 0.5782843 0.7084244 0.7533327 0.8668086 0.9961111
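The reply relies on m being sorted. (1:100) %% 10 cycles through the labels 1, 2, ..., 9, 0, so each of the ten groups receives exactly one value from every consecutive block of ten sorted values, and that keeps their means and sds close. Here is a minimal self-contained check of that property. The seed is arbitrary, and the names grp, blk and tab are illustrative only.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
m   <- sort(runif(100))
grp <- (1:100) %% 10              # group labels 1..9, 0, 1..9, 0, ...
blk <- (seq_along(m) - 1) %/% 10  # block of ten (0..9) each sorted value falls in

# Cross-tabulate group against block: every cell should be exactly 1,
# i.e. each group takes one value from every block of ten sorted values.
tab <- table(grp, blk)
print(all(tab == 1))
```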

> res <- do.call(rbind, split(m, (1:100)%%10 )) 

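The values within each row could then be shuffled with t(apply(res, 1, sample)). The t() is needed because apply() returns each row's result as a column.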
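The following is a minimal self-contained sketch of the row shuffle. It uses an arbitrary seed and checks that shuffling only reorders values inside each group, so every group's contents, and therefore its mean and sd, stay the same. The name shuffled is illustrative.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only
m   <- sort(runif(100))
res <- do.call(rbind, split(m, (1:100) %% 10))

# apply() over rows returns each row's result as a column,
# so transpose with t() to get back one group per row.
shuffled <- t(apply(res, 1, sample))

# Values move within a row but never between rows, so the
# per-group means are unchanged (up to floating-point rounding).
all.equal(rowMeans(shuffled), rowMeans(res))
```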

> apply(res, 1, mean)
        0         1         2         3         4         5         6         7         8         9 
0.5410779 0.4510622 0.4647485 0.4715821 0.4776296 0.4891294 0.5012032 0.5145125 0.5231188 0.5323066 
> apply(res, 1, sd)
        0         1         2         3         4         5         6         7         8         9 
0.3046305 0.3031683 0.2957381 0.2978136 0.2992292 0.2988865 0.2987615 0.2967925 0.3019649 0.3047879 
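The means above are not quite balanced. With the fixed labels from (1:100) %% 10, group 0 always receives the largest value in each block of ten and group 1 the smallest. That is why the means climb from 0.451 (group 1) to 0.541 (group 0). One possible variant, which is not part of the original reply, shuffles the labels independently inside every block so that no group is systematically favoured. It works on the unsorted vector r directly. The helper name stratified_groups() is hypothetical, and the sketch assumes length(x) is a multiple of k.

```r
# Hypothetical helper (not from the original reply): returns a group label
# in 1..k for every element of x. Sorted values are taken in consecutive
# blocks of k, and each block hands out the labels 1..k in random order.
stratified_groups <- function(x, k) {
  n <- length(x)
  stopifnot(n %% k == 0)          # assumes length(x) is a multiple of k
  o <- order(x)                   # o[i] = index of the i-th smallest value
  # One independent random permutation of 1..k per block of k sorted values
  labels <- unlist(lapply(seq_len(n / k), function(b) sample(k)))
  grp <- integer(n)
  grp[o] <- labels                # give the i-th smallest value labels[i]
  grp
}

set.seed(123)                     # arbitrary seed, for reproducibility only
r      <- runif(100)
grp    <- stratified_groups(r, 10)
pieces <- split(r, grp)           # list of 10 groups of 10 values each
sapply(pieces, mean)
sapply(pieces, sd)
```

Because each block's labels are drawn at random, every group has the same expected mean. The realised means still vary from seed to seed, whereas the fixed-label version in the reply is fully deterministic once m is sorted.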
David Winsemius
Alameda, CA, USA
