[Rd] pbinom( ) function (PR#8700)

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Mar 22 16:08:41 CET 2006


Duncan Murdoch <murdoch at stats.uwo.ca> writes:

> On 3/22/2006 3:52 AM, maechler at stat.math.ethz.ch wrote:
> >>>>>> "cspark" == cspark  <cspark at clemson.edu>
> >>>>>>     on Wed, 22 Mar 2006 05:52:13 +0100 (CET) writes:
> > 
> >     cspark> Full_Name: Chanseok Park Version: R 2.2.1 OS: RedHat
> >     cspark> EL4 Submission from: (NULL) (130.127.112.89)
> > 
> > 
> > 
> >     cspark> pbinom(any negative value, size, prob) should be
> >     cspark> zero.  But I got the following results.  I mean, if
> >     cspark> a negative value is close to zero, then pbinom()
> >     cspark> calculates pbinom(0, size, prob).
> > 
> >     >> pbinom( -2.220446e-22, 3,.1)
> >     [1] 0.729
> >     >> pbinom( -2.220446e-8, 3,.1)
> >     [1] 0.729
> >     >> pbinom( -2.220446e-7, 3,.1)
> >     [1] 0
> > 
> > Yes, all the [dp]* functions for distributions that are discrete with
> > mass on the integers do *round* their 'x' to integers.
> > 
> > I could well argue that the current behavior is *not* a bug,
> > since we do treat "x close to an integer" as an integer, and hence
> >    pbinom(eps, size, prob)  with  eps "very close to 0" should give
> >    pbinom(0,   size, prob)
> > as it now does.
> > 
> > However, for aesthetic reasons,
> > I agree that we should test for "< 0" first (and give 0 then) and only
> > round otherwise.  I'll change this for R-devel (i.e. R 2.3.0 in
> > about a month).
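
In R-level terms, the proposed order of checks would amount to something
like the sketch below (untested, and pbinom_nonneg is only an illustrative
name; the actual change is of course in the underlying C code):

   pbinom_nonneg <- function(q, size, prob)
       ifelse(q < 0, 0, pbinom(q, size, prob))

   pbinom_nonneg(-2.220446e-22, 3, 0.1)  # 0, rather than pbinom(0, 3, 0.1)
   pbinom_nonneg( 2.220446e-22, 3, 0.1)  # still rounds: same as pbinom(0, 3, 0.1)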
> > 
> >     cspark> dbinom() also behaves similarly.
> > 
> > Yes, similarly, but differently.
> > I have changed it (for R-devel) as well, to behave the same way as the
> > other d*() functions, e.g., dpois() and dnbinom(), do.
> 
> Martin, your description makes it sound as though dbinom(0.3, size, 
> prob) would give the same answer as dbinom(0, size, prob), whereas it 
> actually gives 0 with a warning, as documented in ?dbinom.  The d* 
> functions only round near-integers to integers, where "near" appears to 
> mean within 1e-7.  The p* functions round near-integers to integers and 
> truncate others to the integer below.

Well, the p-functions are constant on the intervals between
integers... (Or did you refer to the lack of a warning? One point
could be that cumulative distribution functions extend naturally to
non-integers, whereas densities don't really extend, since they are
defined with respect to counting measure on the integers.)
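
To make that concrete (from memory, so take the exact behaviour with a
grain of salt):

   pbinom(2.5, 3, 0.1)       # constant on [2, 3): same as pbinom(2, 3, 0.1)
   dbinom(2.3, 3, 0.1)       # 0, with a "non-integer x" warning
   dbinom(2 + 1e-8, 3, 0.1)  # within the ~1e-7 tolerance, so treated as x = 2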
 
> I suppose the reason for this behaviour is to protect against rounding 
> error giving nonsense results; I'm not sure that's a great idea, but if 
> we do it, should we really be handling 0 differently?

Most of these round-near-integer issues were spurred by real
programming problems. It is somewhat hard to come up with a problem
that leads you to generate a binomial variate value with "floating-point
noise", but I'm quite sure that we'll be reminded if we try to change
it... (One potential issue is back-calculating counts from relative
frequencies).
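
For instance, something along these lines (a made-up scenario, just to
show the kind of noise I have in mind):

   n <- 10
   p.obs <- 0.1 + 0.2                # proportion assembled from other quantities
   k <- p.obs * n                    # "3 successes", but need not be exactly 3
   dbinom(k, size = n, prob = 0.25)  # works only because near-integer x is rounded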


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


