[R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

Thu Aug 1 18:36:46 CEST 2013

I see the problem on both Linux and Windows, R-3.0.1.
  > vapply(as.numeric(9994:9995), function(x)format(x, scientific=FALSE, digits=3), "")
  [1] "9994"  " 9995"
  > vapply(as.numeric(99994:99995), function(x)format(x, scientific=FALSE, digits=4), "")
  [1] "99994"  " 99995"
  > vapply(as.numeric(999994:999995), function(x)format(x, scientific=FALSE, digits=5), "")
  [1] "999994"  " 999995"

The ones with the initial space are the ones that would round up to the next power of 10 when
rounded to the requested number of significant digits:
  > x <- as.numeric(1:5e5)
  > z <- vapply(x, function(x)format(x, scientific=FALSE, digits=3), "")
  > i <- grep(" ", z)
  > z[i]
   [1] " 9995"  " 9996"  " 9997"  " 9998"  " 9999"  " 99950" " 99951" " 99952"
   [9] " 99953" " 99954" " 99955" " 99956" " 99957" " 99958" " 99959" " 99960"
  [17] " 99961" " 99962" " 99963" " 99964" " 99965" " 99966" " 99967" " 99968"
  [25] " 99969" " 99970" " 99971" " 99972" " 99973" " 99974" " 99975" " 99976"
  [33] " 99977" " 99978" " 99979" " 99980" " 99981" " 99982" " 99983" " 99984"
  [41] " 99985" " 99986" " 99987" " 99988" " 99989" " 99990" " 99991" " 99992"
  [49] " 99993" " 99994" " 99995" " 99996" " 99997" " 99998" " 99999"
  > print(x[i], digits=3)
   [1] 1e+04 1e+04 1e+04 1e+04 1e+04 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
  [13] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
  [25] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
  [37] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
  [49] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
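
As a rough cross-check of that description (a sketch only, not how format() decides
the width internally), the affected values can be predicted without calling format()
at all, by asking whether signif() adds an integer digit. The helper name rounds_up
is made up for this illustration, and it assumes positive whole-number input. The
last two lines try the trim=TRUE workaround reported further down in this thread.

```r
# Hypothetical helper, not part of base R. Assumes positive whole numbers.
# TRUE where rounding to `digits` significant digits adds an integer digit,
# i.e. the value rounds up to the next power of 10.
rounds_up <- function(x, digits) {
  nchar(sprintf("%.0f", signif(x, digits))) > nchar(sprintf("%.0f", x))
}

x <- as.numeric(1:5e5)
hit <- x[rounds_up(x, digits = 3)]

# Workaround reported in this thread: trim = TRUE suppresses the padding.
z <- vapply(hit, function(v) format(v, scientific = FALSE, digits = 3, trim = TRUE), "")
any(grepl("^ ", z))
```

Here hit comes out as 9995:9999 followed by 99950:99999, the same 55 values as x[i]
above, and none of the trimmed strings in z starts with a space.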

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Mathieu Basille
> Sent: Thursday, August 01, 2013 8:31 AM
> To: R help
> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> 
> This problem does not seem to be widespread, but it affects at least two
> users (both on Linux, maybe a hint here?). To me, it looks like a bug (whether
> it is an R bug or an OS-related bug, I don't know). Should I forward it to
> R-devel, or some other place where R gurus may have a chance to look at it?
> 
> Mathieu.
> 
> 
> On 07/30/2013 02:34 PM, arun wrote:
> > Hi Mathieu
> > yes, the original problem occurs on my system too. I am using R 3.0.1 on Linux Mint 15. I
> > guess the default case would be trim=FALSE, but it still looks very strange, especially in
> > ?apply(), as it starts from " 99995" onwards.
> >
> > sessionInfo()
> > R version 3.0.1 (2013-05-16)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> >   [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
> >   [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
> >   [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
> >   [7] LC_PAPER=C                 LC_NAME=C
> >   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] stringr_0.6.2  reshape2_1.2.2
> >
> > loaded via a namespace (and not attached):
> > [1] plyr_1.8    tools_3.0.1
> >
> >
> > ----- Original Message -----
> > From: Mathieu Basille <basille.web at ase-research.org>
> > To: arun <smartpink111 at yahoo.com>
> > Cc: R help <r-help at r-project.org>
> > Sent: Tuesday, July 30, 2013 2:29 PM
> > Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >
> > Thanks Arun for your answer. 'trim = TRUE' does indeed solve the symptoms
> > of the problem, and this is the solution I'm currently using. However, it
> > does not help to understand what the problem is, and what is the cause of it.
> >
> > Can you confirm that the original problem also occurs on your computer (and
> > what is your OS)? It would be interesting since David is not able to
> > reproduce the problem with Mac OS X.
> > Mathieu.
> >
> >
> > On 07/30/2013 02:15 PM, arun wrote:
> >> Hi,
> >> Try using trim=TRUE, in ?format()
> >> options(digits=4)
> >>
> >> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], trim = TRUE, scientific = FALSE))
> >> df2$id2[99990:100010]
> >> # [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
> >> # [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
> >> #[17] "100006" "100007" "100008" "100009" "100010"
> >>
> >>
> >> id2 <- format(1:110000, scientific = FALSE,trim=TRUE)
> >> id2[99990:100010]
> >> # [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
> >> # [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
> >> #[17] "100006" "100007" "100008" "100009" "100010"
> >> A.K.
> >>
> >>
> >> ----- Original Message -----
> >> From: Mathieu Basille <basille.web at ase-research.org>
> >> To: David Winsemius <dwinsemius at comcast.net>
> >> Cc: r-help at r-project.org
> >> Sent: Tuesday, July 30, 2013 2:07 PM
> >> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >>
> >> Thanks David for your interest. I have to admit that your answer puzzles me
> >> even more than before. It seems that the underlying problem is way beyond
> >> my R skills...
> >>
> >> The generation of id2 is indeed quite demanding, especially compared to a
> >> simple 'as.character' call. Anyway, since it seems to be system specific,
> >> here is the sessionInfo() that I forgot to attach to my first message:
> >>
> >> R version 3.0.1 (2013-05-16)
> >> Platform: x86_64-pc-linux-gnu (64-bit)
> >>
> >> locale:
> >>      [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
> >>      [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
> >>      [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
> >>      [7] LC_PAPER=C                 LC_NAME=C
> >>      [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
> >>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>
> >> In brief: last stable R available under Debian Testing... Hopefully this
> >> can help tracking down the problem.
> >> Mathieu.
> >>
> >>
> >> On 07/30/2013 01:58 PM, David Winsemius wrote:
> >>>
> >>> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
> >>>
> >>>> Dear list,
> >>>>
> >>>> Here is a simple example in which the behaviour of 'format' does not make sense to
> >>>> me. I have read the documentation and searched the archives, but nothing pointed me in
> >>>> the right direction to understand this behaviour. Let's start with a simple data frame:
> >>>>
> >>>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>>
> >>>> Let's now create a new variable 'id2' which is the character representation of 'id'.
> >>>> Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not
> >>>> formatted using their scientific representation (in this case 1e+05):
> >>>>
> >>>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> >>>>
> >>>> Let's have a look at part of the result:
> >>>>
> >>>> df1$id2[99990:100010]
> >>>> [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
> >>>> [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
> >>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>
> >>> Some formatting processes are carried out by system functions. In this case I am
> >>> unable to reproduce with the same code on Mac OS 10.7.5/R 3.0.1 Patched:
> >>>
> >>>> df1$id2[99990:100010]
> >>>      [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
> >>>      [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
> >>> [17] "100006" "100007" "100008" "100009" "100010"
> >>>
> >>> (I did notice that generation of the id2 variable seemed to take an inordinately long
> >>> time.)
> >>>
> >>> -- David.
> >>>>
> >>>> So far, so good. Let's now play with the 'digits' option:
> >>>>
> >>>> options(digits = 4)
> >>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> >>>> df2$id2[99990:100010]
> >>>> [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
> >>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> >>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>
> >>>> Notice the extra leading space from 99995 to 99999? To make sure it only
> >>>> happened there:
> >>>>
> >>>> df2$id2[which(df1$id2 != df2$id2)]
> >>>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
> >>>>
> >>>> And just to make sure it only occurs in an 'apply' call, here is the same directly on a
> >>>> numeric vector:
> >>>>
> >>>> id2 <- format(1:110000, scientific = FALSE)
> >>>> id2[99990:100010]
> >>>> [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
> >>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> >>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>
> >>>> Here the leading spaces are for every number, which makes sense to me. Is there
> >>>> anything I'm misinterpreting in the behaviour of 'format'?
> >>>> Thanks in advance for any hint,
> >>>> Mathieu.
> >>>>
> >>>>
> >>>> PS: Some background for this question. It all comes from an Rmd document that
> >>>> knitr consistently failed to process, while the R code ran fine in batch or interactive
> >>>> R. knitr uses 'options(digits = 4)', as opposed to the R default of 'options(digits = 7)',
> >>>> which made one of my functions throw an error with knitr, but not with batch or
> >>>> interactive R. I managed to solve the problem using 'trim = TRUE' in 'format', but I
> >>>> still do not understand what's going on...
> >>>> If you're interested, see here for more details on the original problem:
> >>>> http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> ~$ whoami
> >>>> Mathieu Basille, PhD
> >>>>
> >>>> ~$ locate --details
> >>>> University of Florida \\
> >>>> Fort Lauderdale Research and Education Center
> >>>> (+1) 954-577-6314
> >>>> http://ase-research.org/basille
> >>>>
> >>>> ~$ fortune
> >>>> « Le tout est de tout dire, et je manque de mots
> >>>> Et je manque de temps, et je manque d'audace. »
> >>>> -- Paul Éluard
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>> David Winsemius
> >>> Alameda, CA, USA
> >>>
> >>
> >>
> >>