[R] Permutations and large data sets
chrisamiller at gmail.com
Wed Nov 12 23:47:25 CET 2008
I have 200 samples, with 1 million data points in each. Each data
point can have a value from zero to 10, and we can assume that they're
normally distributed. If I calculate a sum by drawing one random data
point from each sample and adding them, what value does that sum need
to be before I can say that it's higher than 95% of the other possible
sums (with reasonable probability)?
The brute-force way to do this is to calculate all possible sums, sort
them, then find the value 95% of the way through the list. Obviously,
this won't work, since the number of combinations is astronomical
(10^6 choices in each of 200 samples, so 10^1200 possible sums). So
what's the appropriate way to approximate this, using R?
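
For concreteness, here is a rough sketch of two approximations that might fit this problem: a Monte Carlo estimate (draw many random sums and take their empirical 95% quantile) and a normal approximation (the sum of independent draws has mean equal to the sum of the sample means and variance equal to the sum of the sample variances). The data below are simulated stand-ins, and the names x, n_points, n_draws and pop_var are made up for illustration; the layout (one column per sample) is an assumption, not something stated above.

```r
# Sketch only: small simulated data stand in for the real 200 x 1e6 values.
set.seed(42)
n_samples <- 200    # number of samples, as in the question
n_points  <- 1000   # assumption: reduced from 1e6 to keep the demo light

# Assumed layout: one column per sample, values clipped to the range [0, 10].
x <- matrix(pmin(pmax(rnorm(n_points * n_samples, mean = 5, sd = 1.5), 0), 10),
            nrow = n_points, ncol = n_samples)

# Monte Carlo: pick one random point from each sample, sum them, and repeat
# n_draws times. idx[i, j] is the row chosen from sample j in draw i.
n_draws <- 20000
idx  <- matrix(sample.int(n_points, n_draws * n_samples, replace = TRUE),
               nrow = n_draws, ncol = n_samples)
cols <- rep(seq_len(n_samples), each = n_draws)  # column index for each entry of idx
sums <- rowSums(matrix(x[cbind(as.vector(idx), cols)], nrow = n_draws))
mc_cutoff <- quantile(sums, 0.95, names = FALSE)

# Normal approximation: mean of the sum = sum of the sample means,
# variance of the sum = sum of the (population) sample variances.
pop_var <- function(v) mean((v - mean(v))^2)
clt_cutoff <- qnorm(0.95,
                    mean = sum(colMeans(x)),
                    sd   = sqrt(sum(apply(x, 2, pop_var))))

print(c(monte_carlo = mc_cutoff, normal_approx = clt_cutoff))
```

With 200 independent terms the two cutoffs should land close to each other, and the normal approximation only needs each sample's mean and variance. The full data stored as a 10^6 x 200 matrix of doubles would take roughly 1.6 GB, so the Monte Carlo route may need to read or index the samples in pieces.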