[Rd] [External] Re: rpois(9, 1e10)

Mon Jan 20 04:00:58 CET 2020

R uses the C 'int' type for its integer data and that is pretty much
universally 32 bit these days. In fact R wont' compile if it is not.
That means the range for integer data is the integers in [-2^31,
+2^31).

It would be good to allow for a larger integer range for R integer
objects, and several of us are thinking about how me might get there.
But it isn't easy to get right, so it may take some time. I doubt
anything can happen for R 4.0.0 this year, but 2021 may be possible.

I few notes inline below:

On Sun, 19 Jan 2020, Spencer Graves wrote:

> On my Mac:
>
>
> str(.Machine)
> ...
> $ integer.max          : int 2147483647
>  $ sizeof.long          : int 8
>  $ sizeof.longlong      : int 8
>  $ sizeof.longdouble    : int 16
>  $ sizeof.pointer       : int 8
>
>
>       On a Windows 10 machine I have, $ sizeof.long : int 4; otherwise
> the same as on my Mac.

One of many annoyances of Windows -- done for compatibility with
ancient Window apps.

>       Am I correct that $ sizeof.long = 4 means 4 bytes = 32 bits?
> log2(.Machine$integer.max) = 31.  Then 8 bytes is what used to be called
> double precision (2 words of 4 bytes each)?  And $ sizeof.longdouble =
> 16 = 4 words of 4 bytes each?

double precision is a floating point concept, not related to integers.

If you want to figure out whether you are running a 32 bit or 64 bit R
look at sizeof.pointer -- 4 means 32 bits, 8 64 bits.

Best,

luke

>
>
>       Spencer
>
>
> On 2020-01-19 15:41, Avraham Adler wrote:
>> Floor (maybe round) of non-negative numerics, though. Poisson should
>> never have anything after decimal.
>>
>> Still think it’s worth allowing long long for R64 bit, just for purity
>> sake.
>>
>> Avi
>>
>> On Sun, Jan 19, 2020 at 4:38 PM Spencer Graves
>> <spencer.graves using prodsyse.com <mailto:spencer.graves using prodsyse.com>> wrote:
>>
>>
>>
>>     On 2020-01-19 13:01, Avraham Adler wrote:
>>>     Crazy thought, but being that a sum of Poissons is Poisson in the
>>>     sum, can you break your “big” simulation into the sum of a few
>>>     smaller ones? Or is the order of magnitude difference just too great?
>>
>>
>>           I don't perceive that as feasible.  Once I found what was
>>     generating NAs, it was easy to code a function to return
>>     pseudo-random numbers using the standard normal approximation to
>>     the Poisson for those extreme cases.  [For a Poisson with mean =
>>     1e6, for example, the skewness (third standardized moment) is
>>     0.001.  At least for my purposes, that should be adequate.][1]
>>
>>
>>           What are the negative consequences of having rpois return
>>     numerics that are always nonnegative?
>>
>>
>>           Spencer
>>
>>
>>     [1]  In the code I reported before, I just changed the threshold
>>     of 1e6 to 0.5*.Machine$integer.max.  On my Mac,
>>     .Machine$integer.max = 2147483647 = 2^31 > 1e9. That still means
>>     that a Poisson distributed pseudo-random number just under that
>>     would have to be over 23000 standard deviations above the mean to
>>     exceed .Machine$integer.max.
>>
>>>
>>>     On Sun, Jan 19, 2020 at 1:58 PM Spencer Graves
>>>     <spencer.graves using prodsyse.com
>>>     <mailto:spencer.graves using prodsyse.com>> wrote:
>>>
>>>               This issue arose for me in simulations to estimate
>>>         confidence, prediction, and tolerance intervals from glm(.,
>>>         family=poisson) fits embedded in a BMA::bic.glm fit using a
>>>         simulate.bic.glm function I added to the development version
>>>         of Ecfun, available at "https://github.com/sbgraves237/Ecfun"
>>>         <https://github.com/sbgraves237/Ecfun>. This is part of a
>>>         vignette I'm developing, available at
>>>         "https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd"
>>>         <https://github.com/sbgraves237/Ecfun/blob/master/vignettes/time2nextNuclearWeaponState.Rmd>.
>>>         This includes a simulated mean of a mixture of Poissons that
>>>         exceeds 2e22.  It doesn't seem unreasonable to me to have
>>>         rpois output a numerics rather than integers when a number
>>>         simulated exceeds .Machine$integer.max.  And it does seem to
>>>         make less sense in such cases to return NAs.
>>>
>>>
>>>                Alternatively, might it make sense to add another
>>>         argument to rpois to give the user the choice?  E.g., an
>>>         argument "bigOutput" with (I hope) default = "numeric" and
>>>         "NA" as a second option.  Or NA is the default, so no code
>>>         that relied that feature of the current code would be broken
>>>         by the change.  If someone wanted to use arbitrary precision
>>>         arithmetic, they could write their own version of this
>>>         function with "arbitraryPrecision" as an optional value for
>>>         the "bigOutput" argument.
>>>
>>>
>>>               Comments?
>>>               Thanks,
>>>               Spencer Graves
>>>
>>>
>>>
>>>         On 2020-01-19 10:28, Avraham Adler wrote:
>>>>         Technically, lambda can always be numeric. It is the
>>>>         observations which must be integral.
>>>>
>>>>         Would hitting everything larger than maxint or maxlonglong
>>>>         with floor or round fundamentally change the distribution?
>>>>         Well, yes, but enough that it would matter over process risk?
>>>>
>>>>         Avi
>>>>
>>>>         On Sun, Jan 19, 2020 at 11:20 AM Benjamin Tyner
>>>>         <btyner using gmail.com <mailto:btyner using gmail.com>> wrote:
>>>>
>>>>             So imagine rpois is changed, such that the storage mode
>>>>             of its return
>>>>             value is sometimes integer and sometimes numeric. Then
>>>>             imagine the case
>>>>             where lambda is itself a realization of a random
>>>>             variable. Do we really
>>>>             want the storage mode to inherit that randomness?
>>>>
>>>>
>>>>             On 1/19/20 10:47 AM, Avraham Adler wrote:
>>>>            > Maybe there should be code for 64 bit R to use long
>>>>             long or the like?
>>>>            >
>>>>            > On Sun, Jan 19, 2020 at 10:45 AM Spencer Graves
>>>>            > <spencer.graves using prodsyse.com
>>>>             <mailto:spencer.graves using prodsyse.com>
>>>>             <mailto:spencer.graves using prodsyse.com
>>>>             <mailto:spencer.graves using prodsyse.com>>> wrote:
>>>>            >
>>>>            >
>>>>            >
>>>>            >     On 2020-01-19 09:34, Benjamin Tyner wrote:
>>>>            >     >>
>>>>            >
>>>>              ------------------------------------------------------------------------
>>>>            >     >> Hello, All:
>>>>            >     >>
>>>>            >     >>
>>>>            >     >>         Consider:
>>>>            >     >>
>>>>            >     >>
>>>>            >     >> Browse[2]> set.seed(1)
>>>>            >     >> Browse[2]> rpois(9, 1e10)
>>>>            >     >> NAs produced[1] NA NA NA NA NA NA NA NA NA
>>>>            >     >>
>>>>            >     >>
>>>>            >     >>         Should this happen?
>>>>            >     >>
>>>>            >     >>
>>>>            >     >>         I think that for, say, lambda>1e6,
>>>>             rpois should return
>>>>            >     rnorm(.,
>>>>            >     >> lambda, sqrt(lambda)).
>>>>            >     > But need to implement carefully; rpois should
>>>>             always return a
>>>>            >     > non-negative integer, whereas rnorm always
>>>>             returns numeric...
>>>>            >     >
>>>>            >
>>>>            >            Thanks for the reply.
>>>>            >
>>>>            >
>>>>            >            However, I think it's not acceptable to get
>>>>             an NA from a
>>>>            >     number
>>>>            >     that cannot be expressed as an integer.  Whenever
>>>>             a randomly
>>>>            >     generated
>>>>            >     number would exceed .Machine$integer.max, the
>>>>             choice is between
>>>>            >     returning NA or a non-integer numeric.  Consider:
>>>>            >
>>>>            >
>>>>            >      > 2*.Machine$integer.max
>>>>            >     [1] 4294967294
>>>>            >      > as.integer(2*.Machine$integer.max)
>>>>            >     [1] NA
>>>>            >     Warning message:
>>>>            >     NAs introduced by coercion to integer range
>>>>            >
>>>>            >
>>>>            >            I'd rather have the non-integer numeric.
>>>>            >
>>>>            >
>>>>            >            Spencer
>>>>            >
>>>>            >  ______________________________________________
>>>>            > R-devel using r-project.org <mailto:R-devel using r-project.org>
>>>>             <mailto:R-devel using r-project.org
>>>>             <mailto:R-devel using r-project.org>> mailing list
>>>>            > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>            >
>>>>            > --
>>>>            > Sent from Gmail Mobile
>>>>
>>>>         --
>>>>         Sent from Gmail Mobile
>>>
>>>     --
>>>     Sent from Gmail Mobile
>>
>> --
>> Sent from Gmail Mobile
>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu