[R] Unexpected behaviour when using format(x, scientific = TRUE) on integer vector

Duncan Murdoch
Sun Jun 20 01:28:51 CEST 2021


On 19/06/2021 9:58 a.m., Remo Röthlin wrote:
> Dear useRs
> 
> I’m encountering an unexpected behaviour when trying to apply format(x, scientific = TRUE) on integer vectors (but not double vectors).
> The resulting string is not formatted in scientific notation; with formatC(), however, the result is as expected.
> 
> Is this the expected behaviour of format(x, scientific = TRUE)? I haven’t found any information or discussion on a difference in scientific notation between format and formatC.

If you look at the internals of the format.default() function, you'll
see that it ignores the "scientific" argument when the type of the
argument is integer:

https://github.com/wch/r-source/blob/23dc578c6f40acdf53f92bab88cf91ecd25cd2e8/src/main/paste.c#L543-L552
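As a rough illustration of that behaviour, here is a minimal sketch (not
taken from the R sources). The helper name format_sci() is hypothetical;
it only changes the storage mode of integer input to double so that the
"scientific" argument takes effect.

```r
# Hypothetical helper (not part of base R): switch integer input to
# double storage before formatting, so that scientific = TRUE is honoured.
format_sci <- function(x, ...) {
  if (is.integer(x)) storage.mode(x) <- "double"
  format(x, scientific = TRUE, ...)
}

intvec <- c(-12300L, 12300L)

format(intvec, scientific = TRUE)          # integer input: stays in fixed notation
format_sci(intvec)                         # double storage: scientific notation
formatC(intvec, format = "e", digits = 2)  # formatC() as an alternative
```

If the padding that format() adds to reach a common width is unwanted,
trim = TRUE can be passed through the dots.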

The help page describes that argument as:

`Either a logical specifying whether elements of a real or complex 
vector should be encoded in scientific format, or an integer penalty 
(see options("scipen")). Missing values correspond to the current 
default penalty.`

so there's no reason to expect it applies to integer vectors as well.
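To make the "penalty" form of the argument concrete, here is a small
sketch. The value 123456 and the penalties -10L and 10L are arbitrary
choices for illustration: a negative penalty biases format() towards
scientific notation and a positive one towards fixed notation, but only
for double (or complex) input.

```r
x_dbl <- 123456   # double
x_int <- 123456L  # integer

format(x_dbl, scientific = -10L)  # negative penalty: scientific notation
format(x_dbl, scientific = 10L)   # positive penalty: fixed notation
format(x_int, scientific = -10L)  # integer input: the penalty has no effect
```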

I suspect the reason for this goes back to S, which was influenced more
by Fortran than by C. I think Fortran (at least as it was in the
70s and 80s) never used scientific notation on integers.

Duncan Murdoch

> Both functions are implemented as .Internal() functions in C. While do_formatC() uses C’s built-in formatting capabilities directly, do_format() does additional work.
> Unfortunately my knowledge of R internals is not good enough to see why format() treats integers differently in this case.
> 
> Warm regards,
> 
> Remo
> 
> SessionInfo and code to reproduce the issue with output (was also reproduced on Windows 10 x64 R 4.1.0 and RStudio Cloud R 3.6.3 & R 4.0.3):
> 
>> sessionInfo()
> R version 4.1.0 (2021-05-18)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS Big Sur 10.16
> 
> Matrix products: default
> BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
> 
> locale:
> [1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> loaded via a namespace (and not attached):
> [1] compiler_4.1.0
>> Sys.getlocale()
> [1] "de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8"
>>
>> numvec <- c(-1.23e4, 1.23e4)
>> typeof(numvec) # double
> [1] "double"
>>
>> intvec <- c(-1.23e4L, 1.23e4L)
>> typeof(intvec) # integer
> [1] "integer"
>>
>> numvec2 <- as.double(intvec)
>> identical(numvec, numvec2)
> [1] TRUE
>>
>> formatC(numvec, format = "e") # Formatted as scientific notation
> [1] "-1.2300e+04" "1.2300e+04"
>> format(numvec, scientific = TRUE) # Formatted as scientific notation
> [1] "-1.23e+04" " 1.23e+04"
>>
>> formatC(intvec, format = "e") # Formatted as scientific notation
> [1] "-1.2300e+04" "1.2300e+04"
>> format(intvec, scientific = TRUE) # *Not* formatted as scientific notation
> [1] "-12300" " 12300"
>>
>> formatC(numvec2, format = "e") # Formatted as scientific notation
> [1] "-1.2300e+04" "1.2300e+04"
>> format(numvec2, scientific = TRUE) # Formatted as scientific notation
> [1] "-1.23e+04" " 1.23e+04"
> 