[Rd] suggestion for extending ?as.factor

Fri May 8 20:53:13 CEST 2009

On Fri, May 08, 2009 at 06:48:40PM +0200, Martin Maechler wrote:
> >>>>> "PS" == Petr Savicky <savicky at cs.cas.cz>
> >>>>>     on Fri, 8 May 2009 18:10:56 +0200 writes:
[...]
>     PS> ... I have
>     PS> strong objections against the existing implementation of as.character(),
> 
> {(because it is not *accurate* enough, right ?)}

The problem is not exactly low accuracy. The problem is unpredictable
accuracy. If the accuracy were systematically 14 or 15 digits, it would be
fine and suitable for most purposes.
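
As a rough illustration of what "systematically 15 digits" could look like,
here is a minimal sketch. The helper name as_char15 is hypothetical and not
part of R; it relies on sprintf() with the C format "%.15g", which asks for
at most 15 significant digits and drops trailing zeros. The result still
depends on the platform's C library rounding correctly, so this is an
assumption-laden sketch, not a proposed replacement for as.character().

```r
# Hypothetical helper (not part of R): convert doubles to strings using
# at most 15 significant digits, regardless of the value.
as_char15 <- function(x) sprintf("%.15g", x)

# A few of the values from the examples, plus two simple cases.
x <- c(8459184.47742229, 72.4923408890729, 0.1, 1/3)
as_char15(x)
```

With a fixed digit count, the number of significant digits no longer depends
on the individual value, which is the property asked for here.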

However, the accuracy ranges between 14 and 20 digits and may be different
on different platforms. For example, on my old Xeon computer, the same
numbers may be converted to strings representing different values:

  with SSE               without SSE

  "8459184.47742229"     "8459184.4774223"     
  "84307700406756081664" "8.4307700406756e+19" 
  "9262815.27852281"     "9262815.2785228"     
  "2.1006742758024e+19"  "21006742758023974912"
  "7.07078598983389e+25" "7.0707859898339e+25" 
  "8.0808066145628e+28"  "8.08080661456281e+28"
  "9180932974.85929"     "9180932974.8593"     
  "72.4923408890729"     "72.492340889073"     

Sometimes there are differences in trailing zeros.

  with SSE               without SSE

  "1.97765325859480e+25" "1.9776532585948e+25" 
  "21762633836.0360"     "21762633836.036"     
  "2018960238339.80"     "2018960238339.8"     
  "239567.78053486"      "239567.780534860"    
  "2571116684765.50"     "2571116684765.5"     
  "3989945.2102949"      "3989945.21029490"    
  "1.1259245205867e+23"  "1.12592452058670e+23"
  "3.2867033904477e+29"  "3.28670339044770e+29"
  "2.8271117654895e+29"  "2.82711176548950e+29"
  "26854166.6173020"     "26854166.617302"     
  "4.85247217360750"     "4.8524721736075"     
  "345123.247838540"     "345123.24783854"     

For random numbers in the sample generated as 10^runif(100000, 0, 30),
from which I selected the first 20 examples above, the probability of
different results was almost 0.01 (978 differences among 100000 numbers).
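
A comparison of this kind might be set up roughly as follows. This is only a
sketch under stated assumptions: the seed, the file names and the helpers
sig_digits() and frac_different() are hypothetical and are not taken from
the run described above. The numbers are saved with saveRDS() so that both
platforms convert exactly the same doubles.

```r
# Assumed seed; the original experiment does not state one.
set.seed(1)
x <- 10^runif(100000, 0, 30)

# Hypothetical helper: count the digits printed in the mantissa of each
# string (exponent, sign, decimal point and leading zeros are dropped).
sig_digits <- function(s) {
  m <- sub("[eE].*$", "", s)   # keep the mantissa only
  m <- gsub("[-.]", "", m)     # remove sign and decimal point
  m <- sub("^0+", "", m)       # leading zeros are not significant
  nchar(m)
}

# Hypothetical helper: fraction of positions where two conversions differ.
frac_different <- function(a, b) mean(a != b)

s <- as.character(x)
table(sig_digits(s))           # distribution of printed digits on this platform

# On platform A (file names are placeholders):
#   saveRDS(x, "x.rds"); writeLines(as.character(x), "platform_A.txt")
# On platform B, with the same x.rds:
#   x <- readRDS("x.rds"); writeLines(as.character(x), "platform_B.txt")
# Then, on either machine:
#   frac_different(readLines("platform_A.txt"), readLines("platform_B.txt"))
```

Note that sig_digits() also counts trailing zeros of numbers printed as
integers, so it measures printed digits rather than accuracy in a strict
sense.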

I think that the platform dependence even limits the advantage
of backward compatibility.

Petr.