[R] Combining multiple probability weights for the sample() function.

Wed Jun 3 21:28:28 CEST 2015

On 6/3/2015 11:26 AM, Boris Steipe wrote:
> If letters 1 and 2 must be equal with p=0.5, and 1 and 3 must be equal with p=0.5, then letter 1 must be the same as either 2 or 3. Therefore:
>
> Choose a letter.
> Make a pair of (letter, (not letter)).
> Reverse the pair with p = 0.5
> Concatenate your letter and the pair.
>
>
> Is that what you need?
>
>
> B.
>
>
>
> On Jun 2, 2015, at 8:26 AM, Benjamin Ward (ENV) <B.Ward at uea.ac.uk> wrote:
>
>> Dear R-List,
>>
>> I have a set of possibilities I want to sample from:
>>
>> bases <- list(c('A', 'C'), c('A', 'G'), c('C', 'T'))
>> possibilities <- as.matrix(expand.grid(bases))
>>
>>> possibilities
>> Var1 Var2 Var3
>> [1,] "A"  "A"  "C"
>> [2,] "C"  "A"  "C"
>> [3,] "A"  "G"  "C"
>> [4,] "C"  "G"  "C"
>> [5,] "A"  "A"  "T"
>> [6,] "C"  "A"  "T"
>> [7,] "A"  "G"  "T"
>> [8,] "C"  "G"  "T"
>>
>> Suppose I want to randomly sample one of these rows. If I sample uniformly, it is 25% likely that my choice will have an identical first and second letter (e.g. [1,] "A"  "A"  "C"). It is also 25% likely that my choice will have an identical first and third letter (e.g. [4,] "C"  "G"  "C"). The second and third letters of my choice can never be identical.
>>
>> What I would like to do is sample one of the rows, under the constraint that the probability of drawing identical letters 1 and 2 should be 50% (0.5), and at the same time the probability of drawing identical letters 1 and 3 should also be 50%. I am unsure how to do this, but I know it involves coming up with a modified set of weights for the sample() function. My progress is below; any advice is much appreciated.
>>
>> Best Wishes,
>>
>> Ben Ward, UEA.
>>
>>
>> So I have used the following code to come up with a matrix that contains weights for each criterion:
>>
>> possibilities <- as.matrix(expand.grid(bases))
>> # 3 x 8 logical matrix: one row per criterion, one column per possibility
>> identities <- apply(possibilities, 1, function(x) c(x[1] == x[2], x[1] == x[3], x[2] == x[3]))
>> prob <- matrix(0, nrow = nrow(identities), ncol = ncol(identities))
>> # Split 0.5 evenly among the matching choices and 0.5 among the rest
>> consProb <- apply(identities, 1, function(x) 0.5 / sum(x))
>> polProb <- apply(identities, 1, function(x) 0.5 / sum(!x))
>> for (i in seq_len(nrow(identities))) {
>>   prob[i, identities[i, ]] <- consProb[i]
>>   prob[i, !identities[i, ]] <- polProb[i]
>> }
>> rownames(prob) <- c("1==2", "1==3", "2==3")
>> colnames(prob) <- apply(possibilities, 1, function(x) paste(x, collapse = ", "))
>>
>> This code gives the following matrix:
>>
>>         A, A, C    C, A, C    A, G, C    C, G, C    A, A, T    C, A, T    A, G, T    C, G, T
>> 1==2 0.25000000 0.08333333 0.08333333 0.08333333 0.25000000 0.08333333 0.08333333 0.08333333
>> 1==3 0.08333333 0.25000000 0.08333333 0.25000000 0.08333333 0.08333333 0.08333333 0.08333333
>> 2==3 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000
>>
>> Each column is one of the choices from 'possibilities', and each row gives a series of weights based on three different criteria:
>>
>> Row 1, that if it is possible from the choices for letter 1 == letter 2, that combined chance be 50%.
>> Row 2, that if it is possible from the choices for letter 1 == letter 3, that combined chance be 50%.
>> Row 3, that if it is possible from the choices for letter 2 == letter 3, that combined chance be 50%.
>>
>> So:
>>
>> If I used sample(x = 1:nrow(possibilities), size = 1, prob = prob[1,]) repeatedly, I expect about half the choices to contain identical letters 1 and 2.
>>
>> If I used sample(x = 1:nrow(possibilities), size = 1, prob = prob[2,]) repeatedly, I expect about half the choices to contain identical letters 1 and 3.
>>
>> If I used sample(x = 1:nrow(possibilities), size = 1, prob = prob[3,]) repeatedly, I would expect about half the choices to contain identical letters 2 and 3, except that in this case it is not possible.
>>
>> Note rows 1 and 2 each sum to 1 (row 3 sums to only 0.5, since no choice has identical letters 2 and 3).
>>
>> What I would like to do - if it is possible - is combine these three sets of weights into one set that, when used with
>> sample(x = 1:nrow(possibilities), size = 1, prob = MAGICPROB), will give me a list of choices where ~50% of them contain identical letters 1 and 2, AND ~50% of them contain identical letters 1 and 3, AND ~50% again contain identical letters 2 and 3 (except in this example, as it is not possible from the choices).
>>
>> Can multiple probability weightings be combined in such a manner?
>>
>>

Ben,

If I understand your requirements correctly, you can't do what you are
asking.  With only the eight possibilities that you list, getting
letters 1 and 2 to match 50% of the time requires selecting rows 1 and 5
with a combined probability of .5 (for example, .25 each).  Getting the
first and third letters to match 50% of the time likewise requires
selecting rows 2 and 4 with a combined probability of .5.  Those
probabilities sum to 1, so you can never select any of the other rows.
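
The argument above can be checked with a minimal R sketch. It assumes the
0.5 for each matching group is split evenly (.25 per row), which is only
one possible split; the names eq12, eq13 and w are illustrative and do not
come from the original code.

```r
# Rebuild the eight candidate rows from the original post.
bases <- list(c("A", "C"), c("A", "G"), c("C", "T"))
possibilities <- as.matrix(expand.grid(bases, stringsAsFactors = FALSE))

# Flag the rows that satisfy each identity.
eq12 <- possibilities[, 1] == possibilities[, 2]  # letters 1 and 2 identical
eq13 <- possibilities[, 1] == possibilities[, 3]  # letters 1 and 3 identical

# Assumed split: 0.5 shared evenly within each matching group.
w <- numeric(nrow(possibilities))
w[eq12] <- 0.5 / sum(eq12)
w[eq13] <- 0.5 / sum(eq13)

# Exact probability of each identity under these weights.
p12 <- sum(w[eq12])
p13 <- sum(w[eq13])

# Empirical check by repeated sampling.
set.seed(1)
draws <- sample(seq_len(nrow(possibilities)), size = 10000,
                replace = TRUE, prob = w)
freq12 <- mean(eq12[draws])
freq13 <- mean(eq13[draws])
```

Under these assumed weights every row outside the two matching groups has
weight 0, so it is never drawn, and no row has identical letters 2 and 3.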

Am I missing something?

Dan

-- 
Daniel Nordlund
Bothell, WA USA