[R] Suprising behavior of paste or cat?

Russell Pierce rpier001 at ucr.edu
Thu Feb 11 19:44:18 CET 2010


Thank you for your input so far, r-help denizens.  Neither David nor
Peter was able to replicate my result.  Has anybody other than me
been able to generate the failure I'm describing?  So far I've
experienced it on 3 machines, Windows XP/P4/2.1.10, Windows
XP/Atom/2.1.10/2.1.11(release), Windows Vista/Centrino/2.1.10, but
found no problem on linux/2.7.1/x86_64.

Bill's idea is interesting. There may be a mismatch between types
occurring somewhere, but I haven't found exactly where yet.  To test out
his idea, I tried changing the order of the values in my vector "task" so
my output would start off with "2," rather than "1,".  But I did not
observe a change in behavior.
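
Bill's observation about the two bytes can be checked with a little
arithmetic. The following is only a sketch, under the assumption (not
established in this thread) that something is reading the ASCII bytes
"1," as a single little-endian 16-bit unit; the variable names are
made up for illustration.

```r
# Bytes of the first two characters of the intended output, "1,".
b <- as.integer(charToRaw("1,"))   # 0x31 for "1", 0x2c for ","

# Assumption: the pair is read as one little-endian 16-bit unit,
# i.e. the first byte is the low byte and the second the high byte.
code.le <- b[1] + 256L * b[2]

# The same pair read big-endian, for comparison.
code.be <- 256L * b[1] + b[2]

# Hexadecimal form of each candidate code point.
hex.le <- sprintf("%04x", code.le)
hex.be <- sprintf("%04x", code.be)
print(c(little.endian = hex.le, big.endian = hex.be))
```

If the little-endian value matches the code point Bill reported, the
bytes in the file may be correct and only their interpretation wrong.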

I've generated further sample code to demonstrate the idiosyncrasy
of what I'm observing.

This code segment does not create a failure.
#No failure
lastcomma <- function(x) {return(paste(x,",",collapse="",sep=""))}
h.long <- 150
task1 <- c(rep(1,h.long),rep(2,h.long))
task2<- c(rep(2,h.long),rep(1,h.long))
res1 <- lastcomma(paste(task1,collapse=","))
res2 <- lastcomma(paste(task2,collapse=","))
write(file="write-okay1.txt",res1)
cat(file="cat-okay2.txt",res2)

This code segment, where the task vector is reordered using sample as
an index, creates invalid files.
#Failure of write and cat
ord <- sample(1:(h.long*2))
task1  <- task1[ord]
task2  <- task2[ord]
res1.bad <- lastcomma(paste(task1,collapse=","))
res2.bad <- lastcomma(paste(task2,collapse=","))
write(file="write-bad1.txt",res1.bad)
cat(file="cat-bad2.txt",res2.bad)
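
Because sample() draws a different permutation on every run, two
people running the segment above will not get the same string. A
minimal sketch of one way to pin the ordering down, assuming a fixed
seed is acceptable for the test; the seed value 42 and the helper name
make.res are hypothetical and were not part of the original runs.

```r
lastcomma <- function(x) paste(x, ",", collapse = "", sep = "")

# Hypothetical helper: build the shuffled, comma-terminated string
# from a given seed so the same ordering can be regenerated.
make.res <- function(seed, h.long = 150) {
  set.seed(seed)
  task <- c(rep(1, h.long), rep(2, h.long))
  ord <- sample(seq_along(task))
  lastcomma(paste(task[ord], collapse = ","))
}

res.a <- make.res(42)
res.b <- make.res(42)

# Same seed in the same R session gives the same string.
print(identical(res.a, res.b))
```

Posting the output of dput(ord) would serve the same purpose without
relying on the random number generator behaving identically across R
versions.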

This code segment, where the task vector is shorter and reordered,
creates invalid files with cat, but not with write, and only when task
has been passed through my lastcomma function.
#Inconsistent; cat fails but write does not, cat only fails when
#string has been passed through lastcomma
h.long <- 100
task1 <- c(rep(1,h.long),rep(2,h.long))
task2<- c(rep(2,h.long),rep(1,h.long))
ord <- sample(1:(h.long*2))
task1  <- task1[ord]
task2  <- task2[ord]
res1.no.lastcomma <- paste(task1,collapse=",")
res2.no.lastcomma <- paste(task2,collapse=",")
res1.yes.lastcomma <- lastcomma(res1.no.lastcomma)
res2.yes.lastcomma <- lastcomma(res2.no.lastcomma)
write(file="write-1-nlc.txt",res1.no.lastcomma) #okay
write(file="write-2-nlc.txt",res2.no.lastcomma) #okay
cat(file="cat-1-nlc.txt",res1.no.lastcomma) #okay
cat(file="cat-2-nlc.txt",res2.no.lastcomma) #okay
write(file="write-1-lc.txt",res1.yes.lastcomma) #okay
write(file="write-2-lc.txt",res2.yes.lastcomma) #okay
cat(file="cat-1-lc.txt",res1.yes.lastcomma) #bad
cat(file="cat-2-lc.txt",res2.yes.lastcomma) #bad
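
Whether a file is "okay" or "bad" above was judged by opening it in a
viewer. A rough sketch of a check that does not depend on any viewer:
read the raw bytes back and compare them with the bytes of the string.
The helper name file.matches is hypothetical, and the test string here
is an unshuffled stand-in rather than one of the strings above.

```r
lastcomma <- function(x) paste(x, ",", collapse = "", sep = "")

# Hypothetical helper: TRUE when the bytes on disk equal the bytes of x,
# ignoring one trailing line ending (write() appends a newline, cat()
# does not).
file.matches <- function(path, x) {
  on.disk <- readBin(path, what = "raw", n = file.info(path)$size)
  n <- length(on.disk)
  if (n > 0 && on.disk[n] == as.raw(0x0a)) {
    on.disk <- on.disk[-n]          # drop LF
    n <- n - 1
    if (n > 0 && on.disk[n] == as.raw(0x0d)) {
      on.disk <- on.disk[-n]        # drop CR of a CR LF pair
    }
  }
  identical(on.disk, charToRaw(x))
}

res <- lastcomma(paste(rep(c(1, 2), 100), collapse = ","))
f.write <- tempfile(fileext = ".txt")
f.cat <- tempfile(fileext = ".txt")
write(file = f.write, res)
cat(file = f.cat, res)

ok.write <- file.matches(f.write, res)
ok.cat <- file.matches(f.cat, res)
print(c(write = ok.write, cat = ok.cat))
```

If both comparisons succeed on a machine where the file still looks
wrong in an editor, that would suggest the problem lies in how the file
is displayed rather than in what R wrote.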

Thanks,

Russell

On Thu, Feb 11, 2010 at 9:05 AM, William Dunlap <wdunlap at tibco.com> wrote:
>
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Russell Pierce
>> Sent: Wednesday, February 10, 2010 9:21 PM
>> To: r-help at r-project.org
>> Subject: [R] Suprising behavior of paste or cat?
>>
>> I may be making a simple error, but I've looked at the str() of the
>> resulting objects and I can't see any obvious reason I'm having the
>> problem I am having, so I am reaching out to the R-help group.  I am
>> generating a string in my code.  When I make a slight modification
>> (add a comma at the end using my "lastcomma" function), I can no
>> longer successfully write that string to a file.  Specifically, the
>> resulting file contains only the "ⰱ" character.
>
> That character (which prints as an unfilled square when
> I look at it in Outlook) is (when I copy and paste it
> to R 2.10.0 on Windows):
>   > "ⰱ"
>   [1] "\u2c31"
> The 2 bytes in it would be comma and one in ASCII:
>   > "\x2c"
>   [1] ","
>   > "\x31"
>   [1] "1"
> It looks like an ASCII/UTF-8 mismatch.  Is the square Outlook's
> way of saying it is illegal UTF-8?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> This occurs in:
>> R version 2.10.0 (2009-10-26) & R version 2.10.1 (2009-12-14)
>> i386-pc-mingw32
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> but not in...
>> R version 2.7.1 (2008-06-23)
>> x86_64-pc-linux-gnu
>> locale:
>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
>> LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;
>> LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
>> LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> Sample code:
>> h.long <- 150
>> task <- c(rep(1,h.long),rep(2,h.long))
>> ord <- sample(1:length(task))
>> task <- task[ord]
>> taskout <- paste(task,collapse=",")
>> write(file="please.txt",taskout)
>> lastcomma <- function(x) {return(paste(x,",",collapse="",sep=""))}
>> res <- lastcomma(taskout)
>> write(file="fail.txt",res)
>> cat(file="catfail.txt",res)
>>
>> Any ideas as to how to avoid this problem would be appreciated as well
>> as suggestions as to whether this is expected behavior, or whether it
>> ought to be reported as a bug.
>>
>> Best,
>>
>> Russell Pierce
>>
