[R] Surprising behavior of paste or cat?

William Dunlap wdunlap at tibco.com
Thu Feb 11 18:05:51 CET 2010



Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Russell Pierce
> Sent: Wednesday, February 10, 2010 9:21 PM
> To: r-help at r-project.org
> Subject: [R] Surprising behavior of paste or cat?
> 
> I may be making a simple error, but I've looked at the str() of the
> resulting objects and I can't see any obvious reason I'm having the
> problem I am having, so I am reaching out to the R-help group.  I am
> generating a string in my code.  When I make a slight modification
> (add a comma at the end using my "lastcomma" function), I can no
> longer successfully write that string to a file.  Specifically, the
> resulting file contains only the "ⰱ" character.

That character (which prints as an unfilled square when
I view it in Outlook) is, when I copy and paste it
into R 2.10.0 on Windows:
   > "ⰱ"
   [1] "\u2c31"
Read as ASCII, its 2 bytes would be a comma and a one:
   > "\x2c"
   [1] ","
   > "\x31"
   [1] "1"
It looks like an ASCII/UTF-8 mismatch. Is the square Outlook's
way of saying it is illegal UTF-8?
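The byte breakdown above can be checked directly in base R. This is a minimal sketch that assumes only that the character is U+2C31, as printed above; it splits the code point into its high and low bytes and reads each one as ASCII. It does not by itself identify which program decoded the file incorrectly.

```r
# The character reported in the failing file, as R prints it
ch <- "\u2c31"
cp <- utf8ToInt(ch)            # code point 0x2C31, i.e. 11313

# Split the 16-bit value into its high and low bytes
hi <- bitwShiftR(cp, 8L)       # 0x2C
lo <- bitwAnd(cp, 0xFFL)       # 0x31

# Read each byte as an ASCII character
rawToChar(as.raw(hi))          # ","
rawToChar(as.raw(lo))          # "1"

# Low byte first (little-endian order) gives the two-character text "1,"
rawToChar(as.raw(c(lo, hi)))
```

Taking the low byte first yields "1,", the same digit-and-comma pattern as the strings in the post, which suggests (but does not prove) that some program read pairs of ASCII bytes as single 16-bit characters.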


> This occurs in:
> R version 2.10.0 (2009-10-26) & R version 2.10.1 (2009-12-14)
> i386-pc-mingw32
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> but not in...
> R version 2.7.1 (2008-06-23)
> x86_64-pc-linux-gnu
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
> LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;
> LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
> LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
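> Sample code:
> h.long <- 150
> task <- c(rep(1,h.long),rep(2,h.long))  # 150 ones and 150 twos
> ord <- sample(1:length(task))
> task <- task[ord]                         # shuffle them
> taskout <- paste(task,collapse=",")       # "1,2,...": no trailing comma
> write(file="please.txt",taskout)         # this file is fine
> lastcomma <- function(x) {return(paste(x,",",collapse="",sep=""))}
> res <- lastcomma(taskout)                 # same string plus a trailing ","
> write(file="fail.txt",res)                # this file shows the problem
> cat(file="catfail.txt",res)               # and so does this one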
> 
> Any ideas on how to avoid this problem would be appreciated, as would
> suggestions as to whether this is expected behavior or ought to be
> reported as a bug.
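One way to tell whether the file itself is damaged, as opposed to the program displaying it, is to read its bytes back from R. The sketch below uses only base R; the `set.seed()` call and the `tempfile()` path are additions for repeatability and are not part of the original post.

```r
# Rebuild the string from the post (seeded only so the run is repeatable)
set.seed(1)
h.long <- 150
task <- sample(c(rep(1, h.long), rep(2, h.long)))
taskout <- paste(task, collapse = ",")
res <- paste(taskout, ",", sep = "")   # same result as lastcomma(taskout)

# Write it the same way as catfail.txt, then read the file back byte by byte
path <- tempfile(fileext = ".txt")
cat(file = path, res)
bytes <- readBin(path, what = "raw", n = file.info(path)$size)

# Every byte should be an ASCII "1", "2" or ","; no multi-byte characters
all(bytes %in% charToRaw("12,"))
# The bytes should decode back to exactly the string that was written
identical(rawToChar(bytes), res)
```

If both checks return TRUE, the file holds exactly the ASCII text that was written, and the odd character comes from how the file is being decoded when it is viewed.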
> 
> Best,
> 
> Russell Pierce