[R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

David Winsemius dwinsemius at comcast.net
Tue Jul 30 19:58:37 CEST 2013


On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:

> Dear list,
> 
> Here is a simple example in which the behaviour of 'format' does not make sense to me. I have read the documentation and searched the archives, but nothing pointed me in the right direction to understand this behaviour. Let's start with a simple data frame:
> 
> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> 
> Let's now create a new variable 'id2' which is the character representation of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not formatted using their scientific representation (in this case 1e+05):
> 
> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> 
> Let's have a look at part of the result:
> 
> df1$id2[99990:100010]
> [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
> [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"

Some formatting processes are carried out by system functions. In this case I am unable to reproduce this with the same code on Mac OS 10.7.5 / R 3.0.1 Patched:

> df1$id2[99990:100010]
 [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997" 
 [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
[17] "100006" "100007" "100008" "100009" "100010"

(I did notice that generation of the id2 variable seemed to take an inordinately long time.)
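
One possible reason for the slowness (an assumption, not something measured here): 'apply' first converts the data frame to a matrix and then calls 'format' once per row. A minimal sketch of a vectorised alternative follows. 'df' is a small hypothetical stand-in for df1, and sprintf("%d", ...) is a substitution made for illustration, not the poster's code; it relies on 'id' being stored as an integer.

```r
# Small hypothetical stand-in for df1 (ids straddle the 99999/100000 boundary).
df <- data.frame(x = rnorm(10), y = rnorm(10), id = 99995:100004)

# Per-row approach from the quoted message: one format() call per row.
id_apply <- apply(df, 1, function(dfi) format(dfi["id"], scientific = FALSE))

# Vectorised alternative: a single call on the integer column.
# "%d" with no width adds no padding and never uses scientific notation.
id_vec <- sprintf("%d", df$id)
```

Because the conversion happens in one call on the column itself, it does not go through the row-by-row matrix route at all.
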

-- 
David.
> 
> So far, so good. Let's now play with the 'digits' option:
> 
> options(digits = 4)
> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> df2$id2[99990:100010]
> [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> 
> Notice the extra leading space from 99995 to 99999? To make sure it only happened there:
> 
> df2$id2[which(df1$id2 != df2$id2)]
> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
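
A possible explanation, stated as an assumption about how 'format' chooses its field width and not confirmed in this thread: the width seems to be derived from the value rounded to getOption("digits") significant digits, while the digits actually printed are those of the unrounded value. With digits = 4, the values 99995 to 99999 round to 100000, which needs six characters; 99994 rounds to 99990, which needs five. A minimal sketch to check that arithmetic:

```r
# Save the current options and use the setting from the quoted message.
old <- options(digits = 4)

# Rounding to 4 significant digits crosses a power of ten at 99995.
rounded <- signif(c(99994, 99995, 99999), 4)   # 99990, 100000, 100000

# Character widths of the rounded values, written without scientific notation.
widths <- nchar(sprintf("%.0f", rounded))       # 5, 6, 6

# Format the unrounded values one at a time, as the apply() call does.
# The quoted message reports " 99995" (width 6) for the second value.
one_at_a_time <- vapply(c(99994, 99995, 99999),
                        function(v) format(v, scientific = FALSE),
                        character(1))

# Restore the previous options.
options(old)
```

Under this reading, the default digits = 7 never rounds a five-digit id up to six digits, which would be consistent with df1 showing no padding.
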
> 
> And just to make sure it only occurs in an 'apply' call, here is the same directly on a numeric vector:
> 
> id2 <- format(1:110000, scientific = FALSE)
> id2[99990:100010]
> [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> 
> Here the leading spaces are for every number, which makes sense to me. Is there anything I'm misinterpreting in the behaviour of 'format'?
> Thanks in advance for any hint,
> Mathieu.
> 
> 
> PS: Some background for this question. It all comes from an Rmd document that knitr consistently failed to process, while the R code was fine using batch or interactive R. knitr uses 'options(digits = 4)' as opposed to 'options(digits = 7)' by default in R, which made one of my functions throw an error with knitr, but not with batch or interactive R. I managed to solve the problem using 'trim = TRUE' in 'format', but I still do not understand what's going on...
> If you're interested, see here for more details on the original problem: http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
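
The 'trim = TRUE' workaround mentioned in the PS can be sketched as follows. The helper names 'id_to_string' and 'with_digits' are hypothetical, introduced here only for illustration. The sketch checks that, for whole-number ids like these, trimming gives the same strings under both 'digits' settings.

```r
# Hypothetical helper: format a number as a plain string without padding.
id_to_string <- function(v) format(v, scientific = FALSE, trim = TRUE)

# Run the conversion under a given 'digits' setting, restoring it afterwards.
with_digits <- function(d, values) {
  old <- options(digits = d)
  on.exit(options(old))
  vapply(values, id_to_string, character(1))
}

ids <- c(99994, 99995, 100000)
res7 <- with_digits(7, ids)  # R's default
res4 <- with_digits(4, ids)  # the setting the quoted message attributes to knitr
```

Comparing the two results with identical(res4, res7) is a quick way to confirm that a given conversion does not depend on the session's 'digits' option.
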
> 
> 
> -- 
> 
> ~$ whoami
> Mathieu Basille, PhD
> 
> ~$ locate --details
> University of Florida \\
> Fort Lauderdale Research and Education Center
> (+1) 954-577-6314
> http://ase-research.org/basille
> 
> ~$ fortune
> « Le tout est de tout dire, et je manque de mots
> Et je manque de temps, et je manque d'audace. »
> -- Paul Éluard
> 

David Winsemius
Alameda, CA, USA


