[R] Small p from binomial probability function.

Fri Oct 11 03:11:29 CEST 2013

It is mysterious to me why the procedure proposed by Stefan Evert works.
It appears to work --- once you modify the call to binom.test() to have the
correct syntax.  In a sequence of 1000  trials with random values of N, x,
and p0, the answers from Evert's procedure agreed with the answer given
by uniroot() to within +/- 3.045e-05.

However your question was (in effect) how to solve the equation

     Pr(X <= x) = p0

for p, where X ~ Binom(N,p), with N and x known.  What this has to do with
confidence intervals for p is, to my mind at least, completely opaque.
In contrast it is obvious why the procedure using uniroot() works.

I would suggest that you stick with the uniroot() procedure in that it is
readily comprehensible.

     cheers,

     Rolf Turner

On 10/11/13 03:56, Benjamin Ward (ENV) wrote:
> Hi,
>
> Thank you for your answers, I'm not completely sure if it's to bino.test I need or the uniroot. Perhaps I should explain more the idea behind the code and the actual task I'm trying to do. The idea is to calculate a confidence interval as to the age of two DNA sequences which have diverged, where I know the number of mutations that happened in them, and I know the mutation rate.
>
> The binomial probability can be used since, mutations have a probability of occurring or being observed so many times in a sequence. This is dependent on the length of the DNA stretch (which equates to the number of trials since each base is a possibility of observing a mutation), the probability of a single mutation occurring which is p = t * u, since more time means a higher probability a mutation may have occurred.
>
> So my code, using pbinom, is supposed to calculate the probability that my DNA stretches contain the number of mutations observed P(X = k), given their size (trials) and the probability of a single mutation (p = t * u). However I'm interested in finding t: t is what is unknown, so the loop repeatedly evaluates the calculation, increasing t each time and checking P(X=k), when it is 0.05, 0.50 and 0.95, we record t.
>
> Ideally I'd like to rearrange this so I can get the probability of a single success (mutation) p, and then divide by the mutation rate to get my t. My supervisor gave my the loopy code but I imagine there is a way to plug in P(X=k) as 0.05 and 0.95 and get my upper and lower t estimates.
>
> According to the R built in docs:
>
> binom.test
> Description:
>
>       Performs an exact test of a simple null hypothesis about the
>       probability of success in a Bernoulli experiment.
>
> Perhaps this is the one I need rather than uniroot?
>
> Best,
> Ben.
>
>
> ________________________________________
> From: Stefan Evert [stefanML at collocations.de]
> Sent: 10 October 2013 09:37
> To: R-help Mailing List
> Cc: Benjamin Ward (ENV)
> Subject: Re: [R] Small p from binomial probability function.
>
> Sounds like you want a 95% binomial confidence interval:
>
>          binom.test(N, P)
>
> will compute this for you, and you can get the bounds directly with
>
>          binom.test(N, P)$conf.int
>
> Actually, binom.test computes a two-sided confidence interval, which corresponds roughly to 2.5 and 97.5 percentages in your approach. It doesn't give you the 50% point either, but I don't think that's a meaningful quantity with a two-sided test.
>
> Hope this helps,
> Stefan
>
>
> On 9 Oct 2013, at 15:53, Benjamin Ward (ENV) <B.Ward at uea.ac.uk> wrote:
>
>> I got given some code that uses the R function pbionom:
>>
>> p <- mut * t
>> sumprobs <- pbinom( N, B, p ) * 1000
>>
>> Which gives the output of a probability as a percentage like 5, 50, 95.
>>
>> What the code currently does is find me the values of t I need, by using the above two code lines in a loop, each iteration it increaces t by one and runs the two lines. When sumprobs equals 5, it records the value t, then again when sumprobs is equal to 50, and again when sumprobs is equal to 95 - giving me three t values. This is not an efficient way of doing this if t is large. Is it possible to rearrange pbinom so it gives me the small p (made of mut*t) as the result of plugging in the sumprobs instead, and is there an R function that already does this?
>>
>> Since pbinom is the binomial probability equation I suppose the question is - in more mathematical terminology - can I change this code so that instead of calculating the Probability of N successes given the number of trials and the probability of a single success, can I instead calculate the probability of a single success using the probability of N successes and number of trials, and the number of successes? Can R do this for me. So instead I plug in 5, 50, and 95, and then get the small p out?