[R] how to select an element from a vector based on a probability

Rui Barradas ruipbarradas at sapo.pt
Thu Apr 10 22:34:59 CEST 2014


Hello,

Inline.

On 10-04-2014 21:04, Nordlund, Dan (DSHS/RDA) wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Simone Gabbriellini
>> Sent: Thursday, April 10, 2014 11:59 AM
>> To: Rui Barradas
>> Cc: r-help at r-project.org
>> Subject: Re: [R] how to select an element from a vector based on a
>> probability
>>
>> Hello, Rui,
>>
>> it does, indeed!
>>
>> thanks,
>> Simone
>>
>> 2014-04-10 20:55 GMT+02:00 Rui Barradas <ruipbarradas at sapo.pt>:
>>> Hello,
>>>
>>> Use ?sample.
>>>
>>> sample(x, 1, prob = x)
>>>
>
> Just be aware that, in using this method, the probability of selection of a particular value will also be a function of how frequent the value is.  For example,
>
> set.seed(7632)
> x <- c(2,2,6,2,1,1,1,3)
> table(sample(x, 10000, prob=x, replace=TRUE))
>
>     1    2    3    6
> 1664 3340 1696 3300
>
>
> The probability that a vector position with a value of 1 will be selected is 1/18 (in this particular example, where sum(x) is 18).  However, the probability that a value of 1 will be selected is 3/18 = 1/6, since there are three 1's.  The probability of selecting the position with a value of 3 is 3/18.  Since there is only one position with a value of 3, the probability of getting the value 1 on any given draw is equal to the probability of getting the value 3.

You're right, I didn't notice that. One way of avoiding that problem is
to sample from the distinct values, as in the following.

# Sample from the distinct values only, so that repeated values in x
# do not add up to a larger selection probability.
u <- unique(x)
sample(u, 1, prob = u)
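
As a rough check of the numbers discussed above, here is a minimal sketch that computes the exact per-value probabilities instead of estimating them by simulation. It assumes the goal is a selection probability proportional to each distinct value, regardless of how often that value occurs. The names p_position, p_value, w and p_check are only illustrative.

```r
# Example vector from the quoted message
x <- c(2, 2, 6, 2, 1, 1, 1, 3)

# Scheme 1: sample(x, 1, prob = x). Each position is weighted by its value,
# so a value's probability is the sum over all positions holding it.
p_position <- tapply(x / sum(x), x, sum)
print(p_position)   # 1: 3/18, 2: 6/18, 3: 3/18, 6: 6/18

# Scheme 2: weight each distinct value once.
u <- sort(unique(x))
p_value <- setNames(u / sum(u), u)
print(p_value)      # 1: 1/12, 2: 2/12, 3: 3/12, 6: 6/12

# Same distribution while still sampling from x itself:
# divide each position's weight by the number of times its value occurs.
w <- x / ave(x, x, FUN = length)
p_check <- tapply(w / sum(w), x, sum)
stopifnot(isTRUE(all.equal(as.vector(p_check), as.vector(p_value))))

# Draw one element under the frequency-corrected weights
set.seed(7632)
draw <- sample(x, 1, prob = w)
```

Under scheme 1 the values 1 and 3 are equally likely, as noted in the quoted message. Under scheme 2 the value 3 is three times as likely as the value 1.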

Rui Barradas

>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> On 10-04-2014 19:49, Simone Gabbriellini wrote:
>>>
>>>> Hello List,
>>>>
>>>> I have an array like:
>>>>
>>>> c(4, 3, 5, 4, 2, 2, 2, 4, 2, 6, 6, 7, 5, 5, 5, 10, 10, 11, 10,
>>>> 12, 10, 11, 9, 12, 10, 36, 35, 36, 36, 36, 35, 35, 36, 37, 35,
>>>> 35, 38, 35, 38, 36, 37, 36, 36, 37, 36, 35, 35, 36, 36, 35, 35,
>>>> 36, 35, 38, 35, 35, 35, 36, 35, 35, 35, 6, 5, 8, 6, 6, 7, 1,
>>>> 7, 7, 8, 9, 7, 8, 7, 7, 13, 13, 13, 14, 13, 13, 13, 14, 14, 15,
>>>> 15, 14, 13, 14, 39, 39, 39, 39, 39, 39, 41, 40, 39, 39, 39, 39,
>>>> 40, 39, 39, 41, 41, 40, 39, 40, 41, 40, 41, 40, 40, 40, 39, 41,
>>>> 39, 39, 39, 39, 40, 39, 39, 40, 40, 39, 39, 39, 1, 4, 3, 4)
>>>>
>>>> I would like to pick an element with a probability proportional to
>>>> the element value, so that higher values are picked more often
>>>> than small values (i.e., picking 38 should be more probable than
>>>> picking 3).
>>>>
>>>> Do you have any idea how to code such a rich-get-richer mechanism?
>>>>
>>>> Best regards,
>>>> Simone
>>>>
>>>
>>
>>
>>
>
> Dan
>
> Daniel J. Nordlund, PhD
> Research and Data Analysis Division
> Services & Enterprise Support Administration
> Washington State Department of Social and Health Services
>
>



