[Rd] pbinom( ) function (PR#8700)

Wed Mar 22 17:39:03 CET 2006

On 3/22/2006 10:08 AM, Peter Dalgaard wrote:
> Duncan Murdoch <murdoch at stats.uwo.ca> writes:
> 
>> On 3/22/2006 3:52 AM, maechler at stat.math.ethz.ch wrote:
>> >>>>>> "cspark" == cspark  <cspark at clemson.edu>
>> >>>>>>     on Wed, 22 Mar 2006 05:52:13 +0100 (CET) writes:
>> > 
>> >     cspark> Full_Name: Chanseok Park Version: R 2.2.1 OS: RedHat
>> >     cspark> EL4 Submission from: (NULL) (130.127.112.89)
>> > 
>> > 
>> > 
>> >     cspark> pbinom(any negative value, size, prob) should be
>> >     cspark> zero.  But I got the following results.  I mean, if
>> >     cspark> a negative value is close to zero, then pbinom()
>> >     cspark> returns pbinom(0, size, prob). 
>> > 
>> >     >> pbinom( -2.220446e-22, 3,.1)
>> >     [1] 0.729
>> >     >> pbinom( -2.220446e-8, 3,.1)
>> >     [1] 0.729
>> >     >> pbinom( -2.220446e-7, 3,.1)
>> >     [1] 0
>> > 
>> > Yes, all the [dp]* functions which are discrete with mass on the
>> > integers only, do *round* their 'x' to integers.
>> > 
>> > I could well argue that the current behavior is *not* a bug,
>> > since we do treat "x close to integer" as integer, and hence 
>> >    pbinom(eps, size, prob)  with  eps "very close to 0" should give
>> >    pbinom(0,   size, prob)
>> > as it now does.
>> > 
>> > However, for aesthetic reasons, 
>> > I agree that we should test for "< 0" first (and give 0 then) and only
>> > round otherwise.  I'll change this for R-devel (i.e. R 2.3.0 in
>> > about a month).
>> > 
>> >     cspark> dbinom() also behaves similarly.
>> > 
>> > yes, similarly, but differently.
>> > I have changed it (for R-devel) as well, to behave the same as
>> > others d*() , e.g., dpois(), dnbinom() do.
>> 
>> Martin, your description makes it sound as though dbinom(0.3, size, 
>> prob) would give the same answer as dbinom(0, size, prob), whereas it 
>> actually gives 0 with a warning, as documented in ?dbinom.  The d* 
>> functions only round near-integers to integers, where it looks as though 
>> near means within 1E-7.  The p* functions round near integers to 
>> integers, and truncate others to the integer below.
> 
> Well, the p-functions are constant on the intervals between
> integers... 

Not quite:  they're constant on intervals (n - 1e-7, n+1 - 1e-7), for 
integers n.  Since Martin's change, this is not true for n=0.
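
A minimal sketch of how one might probe that claim, assuming the 1e-7
tolerance discussed above. The values of size and prob and the offsets
are arbitrary choices for illustration, and the printed results may
differ between R versions (before and after Martin's change):

```r
# Probe pbinom() just below and just above an integer.
size <- 3
prob <- 0.1
n <- 1    # an interior integer, away from the special case at 0

# Offsets chosen to fall inside and outside the assumed 1e-7 tolerance.
x <- n + c(-1e-6, -1e-8, 0, 1e-8, 0.5)
res <- pbinom(x, size, prob)
print(data.frame(x = format(x, digits = 10), pbinom = res))

# The same kind of offsets around 0, where the sign test may come first.
x0 <- c(-1e-6, -1e-8, 0, 1e-8)
res0 <- pbinom(x0, size, prob)
print(res0)
```

Comparing the rows for n - 1e-8 and n against the entries for -1e-8 and 0
shows whether zero is treated differently from the other integers.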

> (Or, did you refer to the lack of a warning? One point
> could be that c.d.f.s extend naturally to non-integers,
> whereas densities don't really extend, since they are defined with
> respect to counting measure on the integers.)

I wasn't complaining about the behaviour here, I was just clarifying 
Martin's description of it, when he said that "all the [dp]* functions 
which are discrete with mass on the integers only, do *round* their 'x' 
to integers".

>  
>> I suppose the reason for this behaviour is to protect against rounding 
>> error giving nonsense results; I'm not sure that's a great idea, but if 
>> we do it, should we really be handling 0 differently?
> 
> Most of these round-near-integer issues were spurred by real
> programming problems. It is somewhat hard to come up with a problem
> that leads you to generate a binomial variate value with "floating point
> noise", but I'm quite sure that we'll be reminded if we try to change
> it... (One potential issue is back-calculation to counts from relative
> frequencies).

Again, I wasn't suggesting we change the general +/- 1E-7 behaviour 
(though it should be documented to avoid bug reports like this one), but 
I'm worried about having zero as a special case.  This will break 
relations such as

  dbinom(x, n, 0.5) == dbinom(n-x, n, 0.5)

(in the case where x is n+epsilon or -epsilon, for small enough 
epsilon).  Is it really desirable to break the symmetry like this?
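
A short sketch of the check I have in mind, with n = 5 and eps = 1e-8
picked purely for illustration (eps is assumed to be inside the 1e-7
tolerance). No expected output is given for the perturbed case, since
it depends on which version of the rounding code is in use:

```r
# Symmetry of the binomial density at prob = 0.5 on exact integers.
n <- 5
x <- 0:n
sym_ok <- isTRUE(all.equal(dbinom(x, n, 0.5), dbinom(n - x, n, 0.5)))
print(sym_ok)

# Perturbed endpoints: x = -eps on one side, n - x = n + eps on the other.
eps <- 1e-8    # hypothetical offset, assumed smaller than the tolerance
lhs <- dbinom(-eps, n, 0.5)
rhs <- dbinom(n + eps, n, 0.5)
print(c(lhs = lhs, rhs = rhs))
```

If lhs and rhs differ, the special case at zero has broken the symmetry.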

Duncan Murdoch