[R] 'format' behaviour in an 'apply' call depending on 'options(digits = K)'

Tue Jul 30 20:07:30 CEST 2013

Thanks, David, for your interest. I have to admit that your answer leaves me 
even more puzzled than before. It seems that the underlying problem is way 
beyond my R skills...

The generation of id2 is indeed quite demanding, especially compared to a 
simple 'as.character' call. Anyway, since it seems to be system specific, 
here is the sessionInfo() that I forgot to attach to my first message:

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

In brief: the latest stable R available under Debian Testing. Hopefully this 
helps track down the problem.
Mathieu.

Le 07/30/2013 01:58 PM, David Winsemius a écrit :
>
> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
>
>> Dear list,
>>
>> Here is a simple example in which the behaviour of 'format' does not make sense to me. I have read the documentation and searched the archives, but nothing pointed me in the right direction to understand this behaviour. Let's start with a simple data frame:
>>
>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>
>> Let's now create a new variable 'id2' which is the character representation of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not formatted using their scientific representation (in this case 1e+05):
>>
>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>>
>> Let's have a look at part of the result:
>>
>> df1$id2[99990:100010]
>> [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
>> [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>
> Some formatting processes are carried out by system functions. In this case I am unable to reproduce the problem with the same code on Mac OS 10.7.5 / R 3.0.1 Patched:
>
>> df1$id2[99990:100010]
>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
> [17] "100006" "100007" "100008" "100009" "100010"
>
> (I did notice that generation of the id2 variable seemed to take an inordinately long time.)
>
> -- David.
>>
>> So far, so good. Let's now play with the 'digits' option:
>>
>> options(digits = 4)
>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>> df2$id2[99990:100010]
>> [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>
>> Notice the extra leading space from 99995 to 99999? To make sure it only happened there:
>>
>> df2$id2[which(df1$id2 != df2$id2)]
>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
>>
>> And just to make sure it only occurs in an 'apply' call, here is the same directly on a numeric vector:
>>
>> id2 <- format(1:110000, scientific = FALSE)
>> id2[99990:100010]
>> [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>
>> Here the leading spaces are for every number, which makes sense to me. Is there anything I'm misinterpreting in the behaviour of 'format'?
>> Thanks in advance for any hint,
>> Mathieu.
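One detail that may be relevant to the 'apply' case: 'apply' converts a data frame to a matrix with 'as.matrix' before looping over the rows, so each 'dfi["id"]' is a length-one named double, not an element of the original integer column. The sketch below only illustrates that coercion on a small stand-in data frame ('small' is a made-up name, not part of the original example); it does not by itself explain the extra padding.

```r
# Small stand-in for df1 (hypothetical name, same column types).
small <- data.frame(x = rnorm(5), y = rnorm(5), id = 1:5)

# In the data frame, the id column is stored as integer...
col_type <- typeof(small$id)

# ...but apply() works on as.matrix(small), which is a double matrix,
# so every row handed to the function is a named double vector.
row_types <- apply(small, 1, function(dfi) typeof(dfi["id"]))

print(col_type)
print(unique(row_types))
```

This would mean that 'format' sees a double inside 'apply' but an integer vector in the direct 'format(1:110000, ...)' call, so the two cases are not strictly the same computation.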
>>
>>
>> PS: Some background for this question. It all comes from an Rmd document that knitr consistently failed to process, while the R code was fine using batch or interactive R. knitr uses 'options(digits = 4)' by default, as opposed to 'options(digits = 7)' in R, which made one of my functions throw an error with knitr, but not with batch or interactive R. I managed to solve the problem using 'trim = TRUE' in 'format', but I still do not understand what's going on...
>> If you're interested, see here for more details on the original problem: http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
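For reference, a minimal sketch of the 'trim = TRUE' workaround mentioned in the PS, shown next to 'sprintf' as another possible way to get unpadded strings from integer ids. 'ids' is a placeholder vector chosen for illustration, not the original data.

```r
ids <- 99990:100010  # placeholder integer ids spanning 5 and 6 digits

# Default: format() pads all elements to a common width,
# so the 5-digit ids get a leading space.
padded <- format(ids, scientific = FALSE)

# trim = TRUE suppresses that padding.
trimmed <- format(ids, scientific = FALSE, trim = TRUE)

# sprintf("%d", ...) formats integers without padding
# and without scientific notation.
plain <- sprintf("%d", ids)

print(padded[1:2])
print(trimmed[1:2])
print(identical(trimmed, plain))
```

Note that 'sprintf("%d", ...)' assumes the ids are integers (or whole-valued doubles); it is offered as an alternative, not as what the original function used.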
>>
>>
>> --
>>
>> ~$ whoami
>> Mathieu Basille, PhD
>>
>> ~$ locate --details
>> University of Florida \\
>> Fort Lauderdale Research and Education Center
>> (+1) 954-577-6314
>> http://ase-research.org/basille
>>
>> ~$ fortune
>> « Le tout est de tout dire, et je manque de mots
>> Et je manque de temps, et je manque d'audace. »
>> -- Paul Éluard
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
