[R] 'format' behaviour in an 'apply' call depending on 'options(digits = K)'

Mathieu Basille basille.web at ase-research.org
Tue Jul 30 18:01:21 CEST 2013


Dear list,

Here is a simple example in which the behaviour of 'format' does not make 
sense to me. I have read the documentation and searched the archives, but 
nothing pointed me in the right direction to understand this behaviour. 
Let's start with a simple data frame:

df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)

Let's now create a new variable 'id2' which is the character representation 
of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers 
such as 100,000 are not formatted using their scientific representation (in 
this case 1e+05):

df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
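As a side note on why 'scientific = FALSE' matters here, a minimal sketch. It assumes the default 'digits = 7' and 'scipen = 0', and that 'apply' hands each row to the function as a double vector, since it converts the data frame to a matrix first. The names 'x', 'sci' and 'fixed' are only for illustration:

```r
# Illustrative only: 'x' stands in for one value of 'id' as seen inside apply(),
# where the integer column has become a double.
x <- 100000

# With the default 7 significant digits, the scientific form is shorter,
# so format() picks it.
sci <- format(x, digits = 7)

# Forcing fixed notation gives the plain representation instead.
fixed <- format(x, digits = 7, scientific = FALSE)

print(c(sci, fixed))  # "1e+05" "100000"
```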

Let's have a look at part of the result:

df1$id2[99990:100010]
 [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
 [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

So far, so good. Let's now play with the 'digits' option:

options(digits = 4)
df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
df2$id2[99990:100010]
 [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
 [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Notice the extra leading space from 99995 to 99999? To make sure it only 
happened there:

df2$id2[which(df1$id2 != df2$id2)]
[1] " 99995" " 99996" " 99997" " 99998" " 99999"
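
The same effect can be reproduced without any data frame, which may make it easier to investigate. This is a minimal sketch, assuming that formatting one double at a time is all the 'apply' call really does. 'digits = 4' is passed explicitly rather than through 'options()', and 'ids' and 'one_by_one' are illustrative names:

```r
# A few values around the range where the extra space shows up.
ids <- c(99994, 99995, 99999, 100000)

# Format each value on its own, as the function passed to apply() does.
one_by_one <- vapply(
  ids,
  function(v) format(v, digits = 4, scientific = FALSE),
  character(1)
)

print(one_by_one)
print(nchar(one_by_one))  # width of each result, to spot the padded ones
```

A possible, unverified reading is that with 4 significant digits 99995 rounds to 100000, which needs six characters, so the width would come from the rounded value while the digits printed are those of the original number.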

And just to make sure it only occurs in an 'apply' call, here is the same 
operation applied directly to a numeric vector:

id2 <- format(1:110000, scientific = FALSE)
id2[99990:100010]
 [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
 [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Here every five-digit number is padded with a leading space so that all 
elements share a common width, which makes sense to me. Is there anything 
I'm misinterpreting in the behaviour of 'format'?
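
To make the contrast between the two situations concrete, here is a small sketch of 'format' called once on a whole vector versus once per element. The values in 'v' and the names 'together' and 'separately' are chosen only for illustration:

```r
v <- c(5, 50, 500)  # illustrative values of different widths

# One call on the whole vector: every element is padded to a common width.
together <- format(v)

# One call per element, as in apply(): each result has its own width.
separately <- vapply(v, format, character(1))

print(together)    # "  5" " 50" "500"
print(separately)  # "5"   "50"  "500"
```
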
Thanks in advance for any hint,
Mathieu.


PS: Some background for this question. It all comes from an Rmd document 
that knitr consistently failed to process, while the R code ran fine in 
batch or interactive R. knitr uses 'options(digits = 4)' by default, as 
opposed to 'options(digits = 7)' in R, which made one of my functions throw 
an error with knitr, but not with batch or interactive R. I managed to 
solve the problem using 'trim = TRUE' in 'format', but I still do not 
understand what's going on...
If you're interested, see here for more details on the original problem: 
http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
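
For reference, the 'trim = TRUE' workaround mentioned above can be sketched as follows. 'digits = 4' is passed explicitly to mimic the knitr setting, and 'x', 'padded' and 'trimmed' are illustrative names, not the code from the original document:

```r
x <- 99995  # one of the values that picked up a leading space

# Without trim, the result is padded.
padded <- format(x, digits = 4, scientific = FALSE)

# With trim = TRUE, padding is suppressed and only the digits remain.
trimmed <- format(x, digits = 4, scientific = FALSE, trim = TRUE)

print(c(nchar(padded), nchar(trimmed)))
print(trimmed)  # "99995"
```

Anything that later compares or matches these strings against unpadded identifiers would see the two results as different, which is consistent with the error only appearing under knitr.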


-- 

~$ whoami
Mathieu Basille, PhD

~$ locate --details
University of Florida \\
Fort Lauderdale Research and Education Center
(+1) 954-577-6314
http://ase-research.org/basille

~$ fortune
« Le tout est de tout dire, et je manque de mots
Et je manque de temps, et je manque d'audace. »
  -- Paul Éluard


