[R] Combining multiple probability weights for the sample() function.

Wed Jun 3 03:25:26 CEST 2015

Hi Ben,
While Jean's answer looks correct, I think that there is something
amiss with your specification of the problem. You have eight
combinations in your "possibilities". So if you draw samples "x"
where:

If p(x = possibilities[1,] | possibilities[5,]) = 0.5 AND
 p(x = possibilities[2,] | possibilities[4,]) = 0.5

then:

p(x = possibilities[3,] | possibilities[6,] | possibilities[7,] |
possibilities[8,]) = 0

You state that you want the probability of drawing a triplet with
identical second and third bases to be zero, which is ensured by
having no such "possibilities" in the set to be sampled. This is the
constraint that forces the above, as the two conditions (identical
first and second, identical first and third) are disjunct and as the
sum of all probabilities cannot exceed one, four "possibilities" must
have probabilities of zero.

Jim

On Tue, Jun 2, 2015 at 11:57 PM, Adams, Jean <jvadams at usgs.gov> wrote:
> Ben,
>
> Perhaps I am missing something, but couldn't you simply reduce your
> possibilities to:
>
> possibilities[c(1, 5, 2, 4), ]
>      Var1 Var2 Var3
> [1,] "A"  "A"  "C"
> [2,] "A"  "A"  "T"
> [3,] "C"  "A"  "C"
> [4,] "C"  "G"  "C"
>
> If you sample from these four rows you will have a 50% chance that Var1 and
> Var2 are equal and a 50% chance that Var1 and Var3 are equal.
>
> Jean
>
>
> On Tue, Jun 2, 2015 at 7:26 AM, Benjamin Ward (ENV) <B.Ward at uea.ac.uk>
> wrote:
>
>> Dear R-List,
>>
>> I have a set of possibilities I want to sample from:
>>
>> bases <- list(c('A', 'C'), c('A', 'G'), c('C', 'T'))
>> possibilities <- as.matrix(expand.grid(bases))
>>
>> >possibilities
>> Var1 Var2 Var3
>> [1,] "A"  "A"  "C"
>> [2,] "C"  "A"  "C"
>> [3,] "A"  "G"  "C"
>> [4,] "C"  "G"  "C"
>> [5,] "A"  "A"  "T"
>> [6,] "C"  "A"  "T"
>> [7,] "A"  "G"  "T"
>> [8,] "C"  "G"  "T"
>>
>> If I want to randomly sample one of these rows. If I do this, I find that
>> it is 25% likely that my choice will have an identical first and last
>> letter (e.g. [1,] "A"  "A"  "C"). It is also 25% likely that my choice will
>> have an identical first and third letter (e.g. [4,] "C"  "G"  "C"). It is
>> not likely at all that the second and third letter of my choice could be
>> identical.
>>
>> What I would like to do, is sample one of the rows, but given the
>> constraint that the probability of drawing identical letters 1 and 2 should
>> be 50% or 0.5, and at the same time the probability of drawing identical
>> letters 1 and 3 should be 50%. I am unsure on how to do this, but I know it
>> involves coming up with a modified set of weights for the sample()
>> function. My progress is below, any advice is much appreciated.
>>
>> Best Wishes,
>>
>> Ben Ward, UEA.
>>
>>
>> So I have used the following code to come up with a matrix, which contains
>> weighting according to each criteria:
>>
>> possibilities <- as.matrix(expand.grid(bases))
>>   identities <- apply(possibilities, 1, function(x) c(x[1] == x[2], x[1]
>> == x[3], x[2] == x[3]))
>>   prob <- matrix(rep(0, length(identities)), ncol = ncol(identities))
>>   consProb <- apply(identities, 1, function(x){0.5 / length(which(x))})
>>   polProb <- apply(identities, 1, function(x){0.5 / length(which(!x))})
>>   for(i in 1:nrow(identities)){
>>     prob[i, which(identities[i,])] <- consProb[i]
>>     prob[i, which(!identities[i,])] <- polProb[i]
>>   }
>>   rownames(prob) <- c("1==2", "1==3", "2==3")
>>   colnames(prob) <- apply(possibilities, 1, function(x)paste(x, collapse =
>> ", "))
>>
>> This code gives the following matrix:
>>
>>                 A, A, C    C, A, C          A, G, C        C, G, C
>>  A, A, T         C, A, T       A, G, T       C, G, T
>> 1==2 0.25000000 0.08333333 0.08333333 0.08333333 0.25000000 0.08333333
>> 0.08333333 0.08333333
>> 1==3 0.08333333 0.25000000 0.08333333 0.25000000 0.08333333 0.08333333
>> 0.08333333 0.08333333
>> 2==3 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000
>> 0.06250000 0.06250000
>>
>> Each column is one of the choices from 'possibilities', and each row gives
>> a series of weights based on three different criteria:
>>
>> Row 1, that if it possible from the choices for letter 1 == letter 2, that
>> combined chance be 50%.
>> Row 2, that if it possible from the choices for letter 1 == letter 3, that
>> combined chance be 50%.
>> Row 3, that if it possible from the choices for letter 2 == letter 3, that
>> combined chance be 50%.
>>
>> So:
>>
>>  If I used sample(x = 1:now(possibilities), size = 1, prob = prob[1,])
>> repeatedly, I expect about half the choices to contain identical letters 1
>> and 2.
>>
>>  If I used sample(x = 1:now(possibilities), size = 1, prob = prob[2,])
>> repeatedly, I expect about half the choices to contain identical letters 1
>> and 3.
>>
>> If I used sample(x = 1:now(possibilities), size = 1, prob = prob[3,])
>> repeatedly, I expect about half the choices to contain identical letters 2
>> and 3. Except that in this case, since it is not possible.
>>
>> Note each row sums to 1.
>>
>> What I would like to do - if it is possible - is combine these three sets
>> of weights into one set, that when used with
>> sample(x = 1:nrow(possibilities, size = 1, prob = MAGICPROB) will give me
>> a list of choices, where ~50% of them contain identical letters 1 and 2,
>> AND ~50% of them contain identical letters 1 and 3, AND ~50% again contain
>> identical letters 2 and 3 (except in this example as it is not possible
>> from the choices).
>>
>> Can multiple probability weightings be combined in such a manner?
>>
>>
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.