[R] Random sampling while keeping distribution of nearest ne

Emmanuel Levy emmanuel.levy at gmail.com
Thu Aug 13 01:15:26 CEST 2009


Thanks for your suggestion Ted,

This would indeed work for the particular example I gave, but I am
looking for a general solution.

For example, if my values are V = c(2,4,5,6), then there would be two
possibilities (up to a shift): 2,4,5,6 or 4,5,6,8.
More generally, what I mean is that the matrix of distances between
pairs of values in V should be similar for the vector of random values.
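For small vectors, this criterion can be checked directly with stats::dist(), which, given a numeric vector, returns every |x_i - x_j| with i < j. Sorting the result lets two vectors be compared as multisets of distances. The helper name pair_dists is hypothetical. This check only works for small X: with X = 500,000 there would be about 1.25 x 10^11 pairs.

```r
# Hypothetical helper: sorted pairwise distances |x_i - x_j| (i < j).
# dist() treats a numeric vector as a one-column matrix.
pair_dists <- function(V) sort(as.vector(dist(V)))

pair_dists(c(2, 4, 5, 6))   # 1 1 2 2 3 4
pair_dists(c(4, 5, 6, 8))   # 1 1 2 2 3 4  (mirror image: same distances)
pair_dists(c(2, 3, 5, 6))   # 1 1 2 3 3 4  (same span, gaps reordered: differs)
```

The last line shows why only the two arrangements above qualify: reordering the gaps of 2,4,5,6 into 2,3,5,6 keeps the overall span but changes the distance distribution.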

Note that in practice, N is around 7,000,000 and X=length(V) may vary
between 20,000 and 500,000.
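At this scale, the full matrix of pairwise distances cannot be compared directly. One cheap option is to sort V, shuffle the gaps between neighbouring values, and place the shuffled pattern at a random offset inside 1..N. This only makes sense if it is acceptable to preserve the spacings between neighbouring values rather than every pairwise distance. It is a swapped-in heuristic that nobody in this thread proposed. It keeps the multiset of consecutive gaps exactly, but in general it changes the longer-range distances. The function name permute_gaps is hypothetical.

```r
# Hypothetical helper (not from the thread): shuffle the spacings of sorted V
# and place the shuffled pattern at a random position inside 1..N.
# Preserves the multiset of consecutive gaps, not all pairwise distances.
permute_gaps <- function(V, N) {
  V <- sort(V)
  gaps <- diff(V)                          # spacings between neighbouring values
  gaps <- gaps[sample.int(length(gaps))]   # same spacings in random order
  span <- sum(gaps)                        # max(V) - min(V), unchanged
  if (span >= N) stop("values in V span more than N positions")
  start <- sample.int(N - span, 1)         # first value, drawn from 1..(N - span)
  c(start, start + cumsum(gaps))           # last value is start + span <= N
}

# Usage at roughly the scale described above
set.seed(42)
N <- 7e6
V <- sort(sample.int(N, 20000))
Y <- permute_gaps(V, N)
range(Y)
```

The sketch runs in O(X log X) time because of the sort, so it stays fast even for X = 500,000.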

It would be great if you could point me to the name of this class of
problem, to a book, or to a package that could help me solve it.

Many thanks!

Emmanuel


PS: I apologize for sending a second post. The first one did not appear
in my "R-help" label, so I assumed it had not been sent.





2009/8/12 Ted Harding <Ted.Harding at manchester.ac.uk>:
> On 12-Aug-09 22:05:24, Emmanuel Levy wrote:
>> Dear All,
>> I cannot find a solution to the following problem, although I imagine
>> that it is a classic, hence my email.
>>
>> I have a vector V of X values lying between 1 and N.
>>
>> I would like to get random samples of X values, also lying between
>> 1 and N, but the important point is:
>> * I would like to keep the same distribution of distances between the X
>> values *
>>
>> For example, let's say N=10 and I have V = c(3,4,5,6);
>> then the random values could be 1,2,3,4 or 2,3,4,5 or 3,4,5,6 or
>> 4,5,6,7, etc.,
>> so that the distribution of distances (3 <-> 4, 3 <-> 5, 3 <-> 6, 4 <->
>> 5, 4 <-> 6, etc.) is kept constant.
>>
>> I couldn't find a package that helps me with this, but it looks like it
>> should be a classic problem, so there should be something!
>>
>> Many thanks in advance for any help or hint you could provide,
>> All the best,
>> Emmanuel
>
> If I've understood you right, you are basically putting a sequence
> with given spacings in a random position amongst the available
> positions. In your example, you would randomly choose between
> 1,2,3,4/2,3,4,5/3,4,5,6/4,5,6,7/5,6,7,8/6,7,8,9/7,8,9,10/
>
> Hence a result Y could be:
>
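>  A <- min(V)                   # smallest value in V
>  L <- max(V) - A + 1           # number of positions spanned by V
>  M <- (0:(N-L))                # all admissible shifts
>  Y <- 1 + (V-A) + sample(M,1)  # V translated to a random start in 1..(N-L+1)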
>
> I think this does it!
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 12-Aug-09                                       Time: 23:49:22
> ------------------------------ XFMail ------------------------------
>
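Ted's four lines can be wrapped into a reusable function and checked. The sketch below is illustrative: the name random_shift and the span check are assumptions added here. It draws the shift with sample.int(), which sidesteps sample()'s special treatment of a length-one numeric argument.

```r
# Hypothetical wrapper around the shift method quoted above:
# translate V as a block to a uniformly random position inside 1..N.
random_shift <- function(V, N) {
  A <- min(V)
  L <- max(V) - A + 1                 # number of positions spanned by V
  if (L > N) stop("values in V span more than N positions")
  k <- sample.int(N - L + 1, 1) - 1   # shift drawn from 0..(N - L)
  1 + (V - A) + k
}

set.seed(1)
V <- c(3, 4, 5, 6)
Y <- random_shift(V, 10)
Y
```

Only a translation is applied, so every pairwise distance is preserved exactly. That is why it covers the c(3,4,5,6) example but cannot produce mirror-image arrangements such as 4,5,6,8 for V = c(2,4,5,6).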




More information about the R-help mailing list