[R] Random Selection of numbers

Thu Jun 2 16:29:22 CEST 2011

On Thu, Jun 02, 2011 at 09:56:51AM +0200, ogbos okike wrote:
> Hello,
> I am attempting to randomly select samples of equal length from my dataset.
> My datasets are of equal length, each with rows numbered 1 to 16. Since they
> are of equal length, I can arrange them as a matrix or concatenate them into
> a 16n x 2 matrix, where n is the number of samples. I have reproduced a small
> part of the data below.
> 
> Now the problem is how to select as many samples of the same length (i.e.
> 16 rows) as possible from the two datasets below. If the first is taken as
> X1 and the second as X2, manually selecting from the 4th row of X1 to the
> 3rd row of X2 gives a sample of length 16, from the 5th row of X1 to the 4th
> row of X2 gives another sample of length 16, etc. This implies choosing any
> row from X1 and counting 15 rows down from that to get 16 rows. I can then
> concatenate these new samples to the original sample and sort them out to do
> my work.
> 
> Doing this random selection manually may not be practical when my dataset
> becomes larger. I would be obliged if anyone could suggest how I can do this
> in R.

Hello.

Let me put your data as an R command for simplicity.

  x <- c(703116, 243714, 297060, 307697, 296588, 255266, 297116, 
  291530, 239259, 239126, 212396, 202471, 227833, 212977, 207408, 
  228564, 230414, 15372, 19647, 29523, 26234, 34766, 16738, 25215, 
  20757, 31250, 27993, 24441, 19853, 20751, 7658, 5934)
  a <- cbind(rep(1:16, times=2), x)

If i is the starting index, is the following what you are asking for?

  i <- 4
  a[i:(i+15), ]

              x
   [1,]  4 307697
   [2,]  5 296588
   [3,]  6 255266
   [4,]  7 297116
   [5,]  8 291530
   [6,]  9 239259
   [7,] 10 239126
   [8,] 11 212396
   [9,] 12 202471
  [10,] 13 227833
  [11,] 14 212977
  [12,] 15 207408
  [13,] 16 228564
  [14,]  1 230414
  [15,]  2  15372
  [16,]  3  19647
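
The same extraction can be wrapped in a small function that also
checks the starting index. This is only a sketch: the name get_window
and the argument len are made up here for illustration, and the data
are repeated so that the code runs on its own.

```r
x <- c(703116, 243714, 297060, 307697, 296588, 255266, 297116,
291530, 239259, 239126, 212396, 202471, 227833, 212977, 207408,
228564, 230414, 15372, 19647, 29523, 26234, 34766, 16738, 25215,
20757, 31250, 27993, 24441, 19853, 20751, 7658, 5934)
a <- cbind(rep(1:16, times=2), x)

# Hypothetical helper: return len consecutive rows of m starting at row i.
get_window <- function(m, i, len = 16) {
  # The last valid start is nrow(m) - len + 1 (17 for the 32-row example),
  # otherwise the window would run past the end of the matrix.
  stopifnot(i >= 1, i + len - 1 <= nrow(m))
  m[i:(i + len - 1), , drop = FALSE]
}

w <- get_window(a, 4)   # same rows as a[4:19, ]
```

With the 32 rows above, a start larger than 17 stops with an error
instead of producing an index beyond the end of the matrix.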

The index i may be chosen at random, for example as

  i <- sample(1:16, 1)

This gives 16 different samples, or 17 with sample(1:17, 1),
if we may also start at the first row of the second dataset.
I am not sure whether you can also consider other types
of subsets to increase the number of different samples.
For example, the following selects 16 of the 32 rows at random
and keeps them in their original order:

  a[sort(sample(1:32, 16)), ]
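
Going back to contiguous samples, several of them can be drawn in
one go by sampling the starting indices together. This is a sketch
under assumptions: overlapping windows are acceptable, start 17 (the
whole second dataset) is allowed, and n_samples and the seed are
arbitrary values chosen for illustration.

```r
x <- c(703116, 243714, 297060, 307697, 296588, 255266, 297116,
291530, 239259, 239126, 212396, 202471, 227833, 212977, 207408,
228564, 230414, 15372, 19647, 29523, 26234, 34766, 16738, 25215,
20757, 31250, 27993, 24441, 19853, 20751, 7658, 5934)
a <- cbind(rep(1:16, times=2), x)

set.seed(1)                         # only to make the draw reproducible
n_samples <- 5                      # arbitrary number of windows (assumption)
starts <- sample(1:17, n_samples)   # distinct starting rows, no replacement

# One 16 x 2 matrix per starting row, collected in a list.
samples <- lapply(starts, function(i) a[i:(i + 15), ])
```

Each element of samples can then be combined with the original data,
for example with rbind.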

Hope this helps.

Petr Savicky.