[R] Suprising behavior of paste or cat?

Russell Pierce rpier001 at ucr.edu
Thu Feb 11 21:58:00 CET 2010


Great thought Duncan,

I've been examining the resulting files using the default installed
notepad.exe.  I just opened a "nonsense" file in wordpad and the text
is viewable.  The text must be getting converted somewhere.  However,
whatever conversion is occurring must be inconsistent; otherwise all of
the files would be written in the same format.
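
One way to take the text editor out of the picture is to look at the
bytes R actually wrote. A minimal sketch, assuming a short stand-in
string and a temporary file rather than the real output files from the
quoted code (the names path, bytes, is_ascii and has_bom are made up
for illustration):

```r
# Hypothetical stand-in for one of the output files from the quoted code.
path <- tempfile(fileext = ".txt")
cat("1,2,1,2,", file = path)

# Read the file back as raw bytes, bypassing any editor's encoding guess.
bytes <- readBin(path, what = "raw", n = file.info(path)$size)
print(bytes)

# TRUE when every byte is 7-bit ASCII, i.e. R wrote plain text.
is_ascii <- all(as.integer(bytes) < 128L)
print(is_ascii)

# A UTF-16 byte-order mark would appear as ff fe or fe ff at the start.
has_bom <- length(bytes) >= 2 &&
  (identical(bytes[1:2], as.raw(c(0xff, 0xfe))) ||
   identical(bytes[1:2], as.raw(c(0xfe, 0xff))))
print(has_bom)
```

If is_ascii is TRUE and has_bom is FALSE for a file that looks like
nonsense in notepad, that would suggest the conversion is happening in
the viewer rather than in R.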

Best,

Russell

On Thu, Feb 11, 2010 at 10:55 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> I don't think you have said how you are examining the output files.  Is it
> possible that your text editor is assuming that the files are UCS-2
> (Unicode), even though R is writing ASCII?
>
> Duncan Murdoch
>
> On 11/02/2010 1:44 PM, Russell Pierce wrote:
>>
>> Thank you for your input so far r-help denizens.  Neither David nor
>> Peter were able to replicate my result.  Has anybody other than me
>> been able to generate the failure I'm describing?  So far I've
>> experienced it on 3 machines, Windows XP/P4/2.1.10, Windows
>> XP/Atom/2.1.10/2.1.11(release), Windows Vista/Centrino/2.1.10, but
>> found no problem on linux/2.7.1/x86_64.
>>
>> Bill's idea is interesting. There may be a mismatch between types
>> occurring somewhere, but I haven't found exactly where yet.  To test out his
>> idea, I tried changing the order of the values in my vector "task" so
>> my output would start off with "2," rather than "1,".  But I did not
>> observe a change in behavior.
>>
>> I've generated further sample code to demonstrate the idiosyncrasy
>> of what I'm observing.
>>
>> This code segment does not create a failure.
>> #No failure
>> lastcomma <- function(x) {return(paste(x,",",collapse="",sep=""))}
>> h.long <- 150
>> task1 <- c(rep(1,h.long),rep(2,h.long))
>> task2 <- c(rep(2,h.long),rep(1,h.long))
>> res1 <- lastcomma(paste(task1,collapse=","))
>> res2 <- lastcomma(paste(task2,collapse=","))
>> write(file="write-okay1.txt",res1)
>> cat(file="cat-okay2.txt",res2)
>>
>> This code segment, where the task vector is reordered using sample as
>> an index, creates invalid files.
>> #Failure of write and cat
>> ord <- sample(1:(h.long*2))
>> task1 <- task1[ord]
>> task2 <- task2[ord]
>> res1.bad <- lastcomma(paste(task1,collapse=","))
>> res2.bad <- lastcomma(paste(task2,collapse=","))
>> write(file="write-bad1.txt",res1.bad)
>> cat(file="cat-bad2.txt",res2.bad)
>>
>> This code segment, where the task vector is shorter and reordered,
>> creates invalid files with cat, but not with write, and only when task
>> has been passed through my lastcomma function.
>> #Inconsistent; cat fails but write does not, cat only fails when
>> #string has been passed through lastcomma
>> h.long <- 100
>> task1 <- c(rep(1,h.long),rep(2,h.long))
>> task2 <- c(rep(2,h.long),rep(1,h.long))
>> ord <- sample(1:(h.long*2))
>> task1 <- task1[ord]
>> task2 <- task2[ord]
>> res1.no.lastcomma <- paste(task1,collapse=",")
>> res2.no.lastcomma <- paste(task2,collapse=",")
>> res1.yes.lastcomma <- lastcomma(res1.no.lastcomma)
>> res2.yes.lastcomma <- lastcomma(res2.no.lastcomma)
>> write(file="write-1-nlc.txt",res1.no.lastcomma) #okay
>> write(file="write-2-nlc.txt",res2.no.lastcomma) #okay
>> cat(file="cat-1-nlc.txt",res1.no.lastcomma) #okay
>> cat(file="cat-2-nlc.txt",res2.no.lastcomma) #okay
>> write(file="write-1-lc.txt",res1.yes.lastcomma) #okay
>> write(file="write-2-lc.txt",res2.yes.lastcomma) #okay
>> cat(file="cat-1-lc.txt",res1.yes.lastcomma) #bad
>> cat(file="cat-2-lc.txt",res2.yes.lastcomma) #bad
>>
>> Thanks,
>>
>> Russell
>>
>> On Thu, Feb 11, 2010 at 9:05 AM, William Dunlap <wdunlap at tibco.com> wrote:
>> >
>> >
>> > Bill Dunlap
>> > Spotfire, TIBCO Software
>> > wdunlap tibco.com
>> >
>> >> -----Original Message-----
>> >> From: r-help-bounces at r-project.org
>> >> [mailto:r-help-bounces at r-project.org] On Behalf Of Russell Pierce
>> >> Sent: Wednesday, February 10, 2010 9:21 PM
>> >> To: r-help at r-project.org
>> >> Subject: [R] Suprising behavior of paste or cat?
>> >>
>> >> I may be making a simple error, but I've looked at the str() of the
>> >> resulting objects and I can't see any obvious reason I'm having the
>> >> problem I am having, so I am reaching out to the R-help group.  I am
>> >> generating a string in my code.  When I make a slight modification
>> >> (add a comma at the end using my "lastcomma" function), I can no
>> >> longer successfully write that string to a file.  Specifically, the
>> >> resulting file contains only the "ⰱ" character.
>> >
>> > That character (which prints as an unfilled square when
>> > I look at it in Outlook) is (when I copy and paste it
>> > to R 2.10.0 on Windows):
>> >   > "ⰱ"
>> >   [1] "\u2c31"
>> > The 2 bytes in it would be comma and one in ASCII:
>> >   > "\x2c"
>> >   [1] ","
>> >   > "\x31"
>> >   [1] "1"
>> > It looks like an ASCII/UTF-8 mismatch.  Is the square Outlook's
>> > way of saying it is illegal UTF-8?
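
The byte arithmetic above can be checked directly. A small sketch,
assuming the local iconv build accepts the "UTF-16BE" and "UTF-16LE"
encoding names (this varies by platform); the variable names are
illustrative only. It reinterprets the two ASCII bytes for comma and
one as a single 16-bit UTF-16 code unit:

```r
# The two ASCII bytes described above: comma (0x2c) followed by one (0x31).
comma_one <- charToRaw(",1")
print(comma_one)

# Read those two bytes as one big-endian UTF-16 code unit.
as_be <- iconv(list(comma_one), from = "UTF-16BE", to = "UTF-8")
code_be <- utf8ToInt(as_be)
print(as.hexmode(code_be))

# The same code point results from "1," when read as little-endian UTF-16.
one_comma <- charToRaw("1,")
as_le <- iconv(list(one_comma), from = "UTF-16LE", to = "UTF-8")
code_le <- utf8ToInt(as_le)
print(as.hexmode(code_le))
```

If both values print as 2c31, the odd character is just two ordinary
ASCII bytes being read as one 16-bit unit.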
>> >
>> >
>> >> This occurs in:
>> >> R version 2.10.0 (2009-10-26) & R version 2.10.1 (2009-12-14)
>> >> i386-pc-mingw32
>> >> locale:
>> >> [1] LC_COLLATE=English_United States.1252
>> >> [2] LC_CTYPE=English_United States.1252
>> >> [3] LC_MONETARY=English_United States.1252
>> >> [4] LC_NUMERIC=C
>> >> [5] LC_TIME=English_United States.1252
>> >> attached base packages:
>> >> [1] stats     graphics  grDevices utils     datasets  methods   base
>> >> but not in...
>> >> R version 2.7.1 (2008-06-23)
>> >> x86_64-pc-linux-gnu
>> >> locale:
>> >> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;
>> >> LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;
>> >> LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;
>> >> LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>> >> attached base packages:
>> >> [1] stats     graphics  grDevices utils     datasets  methods   base
>> >> Sample code:
>> >> h.long <- 150
>> >> task <- c(rep(1,h.long),rep(2,h.long))
>> >> ord <- sample(1:length(task))
>> >> task <- task[ord]
>> >> taskout <- paste(task,collapse=",")
>> >> write(file="please.txt",taskout)
>> >> lastcomma <- function(x) {return(paste(x,",",collapse="",sep=""))}
>> >> res <- lastcomma(taskout)
>> >> write(file="fail.txt",res)
>> >> cat(file="catfail.txt",res)
>> >>
>> >> Any ideas as to how to avoid this problem would be appreciated as well
>> >> as suggestions as to whether this is expected behavior, or whether it
>> >> ought to be reported as a bug.
>> >>
>> >> Best,
>> >>
>> >> Russell Pierce
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>>
>>
>
>


