[R] Interesting quirk with fractions and rounding
JRG
loesljrg at accucom.net
Fri Apr 21 14:48:15 CEST 2017
A good part of the problem in the specific case you initially presented
is that some non-integer numbers have an exact representation in the
binary floating point arithmetic being used. Basically, if the
fractional part is of the form 1/2^k for some integer k > 0 (or a short
sum of such terms), there is an exact representation in the binary
floating point scheme.
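
A minimal sketch of that rule, assuming a standard IEEE 754 double
platform. The helper name show_stored is made up for illustration; it
only prints the stored value with more decimal places than R shows by
default.

```r
# Hypothetical helper: print the stored double with 25 decimal places.
show_stored <- function(x) sprintf("%.25f", x)

show_stored(57.5)     # fractional part 1/2: only zeros after the 5
show_stored(0.875)    # 1/2 + 1/4 + 1/8: also stored exactly
show_stored(23 / 40)  # 0.575 is no finite sum of powers of 1/2: digits drift

# Sums of exactly representable fractions stay exact ...
0.5 + 0.25 + 0.125 == 0.875   # TRUE
# ... whereas 0.1, 0.2 and 0.3 are all stored as approximations
0.1 + 0.2 == 0.3              # FALSE
```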
> options(digits=20)
> (100*23)/40
[1] 57.5
> 100*(23/40)
[1] 57.499999999999992895
So the two operations give slightly different results. The quotient of
100*23 by 40 has fractional part 0.5, so the first operation gives
exactly 57.5. The second operation does not, because 23/40 has no exact
binary representation.
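
The same comparison can be written as explicit checks. This is only a
sketch; the variable names a and b are chosen here for illustration.

```r
a <- (100 * 23) / 40   # 2300 and 57.5 are both exactly representable
b <- 100 * (23 / 40)   # 23/40 is rounded first, then the product is rounded

a == 57.5    # TRUE
b <  57.5    # TRUE: b lands just below the halfway point
round(a)     # 58
round(b)     # 57
```

Because b sits a hair below 57.5, round() is doing the right thing with
the value it is handed; the surprise comes from the value, not from
round().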
But, change the example's divisor from 40 to 30 [the fractional part
from 1/2 to 2/3]:
> (100*23)/30
[1] 76.666666666666671404
> 100*(23/30)
[1] 76.666666666666671404
Now the two operations give the same answer to the full precision
available. So, it isn't "generally true in R that (100*x)/y is
more accurate than 100*(x/y), if x > y."
The key (in your example) is a property of the way that floating point
arithmetic is implemented: which intermediate values happen to have an
exact binary representation.
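
For anyone who wants to see how often the two groupings agree, here is a
rough sketch that tries a range of divisors. The function name
same_result and the range 1:100 are arbitrary choices made for this
illustration, and the output is left for the reader to inspect.

```r
# Hypothetical helper: do the two groupings give bit-identical doubles?
same_result <- function(x, y) (100 * x) / y == 100 * (x / y)

divisors <- 1:100
agree <- vapply(divisors, function(y) same_result(23, y), logical(1))

divisors[!agree]   # divisors for which the two groupings differ
mean(agree)        # share of divisors for which they agree
```

This only reports whether the results differ, not which grouping is
closer to the true quotient.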
---JRG
On 04/21/2017 08:19 AM, Paul Johnson wrote:
> We all agree it is a problem with digital computing, not unique to R. I
> don't think that is the right place to stop.
>
> What to do? The round example arose in a real funded project where 2 R
> programs differed in results, and the cause was that one person got 57 and
> another got 58. The explanation was found, but it's less clear how to
> prevent similar problems in future. Guidelines, anyone?
>
> So far, these are my guidelines.
>
> 1. Insert L on numbers to signal that you really mean INTEGER. In R,
> forgetting the L in a single number will usually promote the whole
> calculation to floats.
> 2. S3 variables are called 'numeric' if they are integer or double storage.
> So avoid "is.numeric" and prefer "is.double".
> 3. == is a total fail on floats
> 4. Run print with digits=20 so we can see the less rounded number. Perhaps
> start sessions with "options(digits=20)"
> 5. all.equal does what it promises, but one must be cautious.
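
Guidelines 1 to 3 can be checked at the prompt. A short sketch, assuming
default settings; the variable x is introduced only for this example.

```r
typeof(100L * 23L)   # "integer": every operand carries the L suffix
typeof(100L * 23)    # "double": one plain literal promotes the result

is.numeric(1L)       # TRUE, so is.numeric() cannot tell the storage modes apart
is.double(1L)        # FALSE
is.integer(1L)       # TRUE

x <- 100 * (23 / 40)
x == 57.5                     # FALSE, although x prints as 57.5 by default
isTRUE(all.equal(x, 57.5))    # TRUE: comparison within a tolerance
```

Wrapping all.equal() in isTRUE() matters in if() conditions, because
all.equal() returns a character description rather than FALSE when the
values differ.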
>
> Are there math habits we should follow?
>
> For example, Is it generally true in R that (100*x)/y is more accurate than
> 100*(x/y), if x > y? (If that is generally true, couldn't the R
> interpreter do it for the user?)
>
> I've seen this problem before. In later editions of the game theory program
> Gambit, extraordinary effort was taken to keep values symbolically as
> integers as long as possible. Avoid division until the last steps. Same in
> Swarm simulations. Gary Polhill wrote an essay about the Ghost in the
> Machine along those lines, showing accidents from trusting floats.
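
One way to follow the "integers as long as possible" advice for the
rounding example is sketched below. The function round_ratio_int is
hypothetical, not something from Gambit, Swarm, or base R, and it rounds
halves up, which is a different rule from the one round() uses.

```r
# Hypothetical helper: round num/den to the nearest integer, halves up,
# using integer arithmetic only. Assumes num >= 0 and den > 0, and that
# 2L * num + den stays within .Machine$integer.max (R gives NA on overflow).
round_ratio_int <- function(num, den) {
  stopifnot(is.integer(num), is.integer(den), num >= 0L, den > 0L)
  (2L * num + den) %/% (2L * den)
}

round_ratio_int(100L * 23L, 40L)   # 58L
round_ratio_int(23L * 100L, 40L)   # 58L, grouping no longer matters
```

Note that round() rounds exact halves to the even neighbour, so
round(2.5) is 2 while round_ratio_int(5L, 2L) is 3L. Whichever rule a
project picks, it should be stated explicitly.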
>
> I wonder now if all uses of > or < with numeric variables are suspect.
>
> Oh well. If everybody posts their advice, I will write a summary.
>
> Paul Johnson
> University of Kansas
>
> On Apr 21, 2017 12:02 AM, "PIKAL Petr" <petr.pikal at precheza.cz> wrote:
>
>> Hi
>>
>> The problem is that people using Excel or probably other such spreadsheets
>> do not encounter this behaviour, as Excel silently rounds all your
>> calculations and makes approximate comparisons without telling you it does
>> so. Therefore most people do not have any knowledge of floating point
>> number representation.
>>
>> Cheers
>> Petr
>>
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Paul
>> Johnson
>> Sent: Thursday, April 20, 2017 11:56 PM
>> To: R-help <r-help at r-project.org>
>> Subject: [R] Interesting quirk with fractions and rounding
>>
>> Hello, R friends
>>
>> My student unearthed this quirk that might interest you.
>>
>> I wondered if this might be a bug in the R interpreter. If not a bug, it
>> certainly stands as a good example of the dangers of floating point numbers
>> in computing.
>>
>> What do you think?
>>
>>> 100*(23/40)
>> [1] 57.5
>>> (100*23)/40
>> [1] 57.5
>>> round(100*(23/40))
>> [1] 57
>>> round((100*23)/40)
>> [1] 58
>>
>> The result in the 2 rounds should be the same, I think. Clearly some
>> digital number devil is at work. I *guess* that when you put in whole
>> numbers and group them like this (100*23), the interpreter does integer
>> math, but if you group (23/40), you force a fractional division and a
>> floating point number. The results from the first 2 calculations are not
>> actually 57.5, they just appear that way.
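
The guess about integer math can be tested directly. A small sketch,
assuming default R behaviour: literals written without the L suffix are
doubles, so no integer arithmetic takes place, but whole numbers of this
size are still stored exactly.

```r
typeof(100 * 23)          # "double": plain literals are doubles
100 * 23 == 2300L         # TRUE: the product is exactly 2300
typeof(2300L / 40L)       # "double": "/" returns a double even for integers
2300L %/% 40L             # 57L: integer (floor) division
(100 * 23) / 40 == 57.5   # TRUE: this grouping really is 57.5
```

So only one of the two displayed results merely appears to be 57.5; the
other is exactly 57.5.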
>>
>> Before you close the books, look at this:
>>
>>> aa <- 100*(23/40)
>>> bb <- (100*23)/40
>>> all.equal(aa,bb)
>> [1] TRUE
>>> round(aa)
>> [1] 57
>>> round(bb)
>> [1] 58
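
The apparent contradiction comes from all.equal() comparing within a
tolerance (about 1.5e-8 by default), while round() acts on the stored
values. A sketch reusing the names aa and bb from the session above:

```r
aa <- 100 * (23 / 40)
bb <- (100 * 23) / 40

bb - aa                                   # tiny, but not zero
identical(aa, bb)                         # FALSE: the stored doubles differ
isTRUE(all.equal(aa, bb))                 # TRUE: within the default tolerance
isTRUE(all.equal(aa, bb, tolerance = 0))  # FALSE once the tolerance is removed
```

A difference far below the all.equal() tolerance is still enough to put
the two values on opposite sides of the rounding boundary at 57.5.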
>>
>> I'm putting this one in my collection of "difficult to understand"
>> numerical calculations.
>>
>> If you have seen this before, I'm sorry to waste your time.
>>
>> pj
>> --
>> Paul E. Johnson http://pj.freefaculty.org
>> Director, Center for Research Methods and Data Analysis
>> http://crmda.ku.edu
>>
>> To write to me directly, please address me at pauljohn at ku.edu.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>