[R] Multibyte strings

David Winsemius dwinsemius at comcast.net
Sat Sep 26 00:20:27 CEST 2015




On Sep 25, 2015, at 2:23 PM, Dennis Fisher wrote:

> R 3.2.0
> OS X
> 
> Colleagues,
> 
> Earlier today, I initiated a series of emails regarding SASxport (which was removed from CRAN).  David Winsemius proposed downloading the source code and installing with the following command:
> 	install.packages('~/Downloads/SASxport_1.5.0.tar.gz', repos = NULL, type = "source")
> 
> That works and I am grateful to David for his recommendation.  However, the package fails on some of the many objects that I attempted to write with:
> 	write.xport
> 
> The error message was:
> 	Error in nchar(var) : invalid multibyte string 3157

Consider using traceback() to see which section of code is actually reporting the error.

The error reported in your earlier message indicated a problem with a particular word that starts with DIARRH and ends in æéñåºA. When I paste that word, unquoted, into the R console I get:

> DIARRH¸æéñåºA
Error: unexpected input in "DIARRH¬"

My word processor tells me that the little comma-like glyph is a cedilla.
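Rather than pasting suspect words into the console, the values that upset nchar() can be located in the data themselves. This is a minimal sketch, assuming the data are meant to be UTF-8 (the usual OS X locale) and R >= 3.3.0, because validUTF8() is newer than the R 3.2.0 used here. find_invalid_utf8() is a hypothetical helper, not part of SASxport or base R.

```r
# Hypothetical helper: for each character or factor column of a data frame,
# return the row numbers whose values are not valid UTF-8.
# Requires R >= 3.3.0 for validUTF8().
find_invalid_utf8 <- function(df) {
  out <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col)) col <- as.character(col)  # check the labels row by row
    if (!is.character(col)) next                  # skip numeric columns etc.
    bad <- which(!is.na(col) & !validUTF8(col))
    if (length(bad) > 0) out[[nm]] <- bad
  }
  out
}

# Example: element 2 of y holds a lone 0xFC byte, which is not valid UTF-8.
df <- data.frame(x = c(1, 2, NA, NA),
                 y = c("a", "DIARRH\xfc", NA, "*"),
                 stringsAsFactors = FALSE)
find_invalid_utf8(df)
```

An empty list means every character value in the data frame is valid UTF-8, so the problem would then lie elsewhere (labels or names, for example).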

However, I'm not sure this is a reproducible problem, since I cannot produce a similar error with the toy data frame built by the example code on the write.xport help page:

> abc <- data.frame( x=c(1, 2, NA, NA ), y=c('a', 'DIARRH¸æéñåºA', NA, '*' ) )
> abc
   x             y
1  1             a
2  2 DIARRH¸æéñåºA
3 NA          <NA>
4 NA             *
> SASformat(abc$x) <- 'date7.'
> label(abc$y) <- 'character variable'
> label(abc) <- 'Simple example'
> SAStype(abc) <- 'MYTYPE'
> str(abc)
'data.frame':	4 obs. of  2 variables:
 $ x: atomic  1 2 NA NA
  ..- attr(*, "SASformat")= chr "date7."
 $ y: Factor w/ 3 levels "*","a","DIARRH¸æéñåºA": 2 3 NA 1
  ..- attr(*, "label")= chr "character variable"
 - attr(*, "label")= chr "Simple example"
 - attr(*, "SAStype")= chr "MYTYPE"
> write.xport( abc, file="xxx.dat" )
> abc <- data.frame( x=c(1, 2, NA, NA ), y=c('a', 'DIARRH¸æéñåºA', NA, '*' ) )
> abc
   x             y
1  1             a
2  2 DIARRH¸æéñåºA
3 NA          <NA>
4 NA             *
> SASformat(abc$x) <- 'date7.'
> label(abc$y) <- '"DIARRH¸æéñåºA"'
> label(abc) <- 'Simple example'
> SAStype(abc) <- 'MYTYPE'
> str(abc)
'data.frame':	4 obs. of  2 variables:
 $ x: atomic  1 2 NA NA
  ..- attr(*, "SASformat")= chr "date7."
 $ y: Factor w/ 3 levels "*","a","DIARRH¸æéñåºA": 2 3 NA 1
  ..- attr(*, "label")= chr "\"DIARRH¸æéñåºA\""
 - attr(*, "label")= chr "Simple example"
 - attr(*, "SAStype")= chr "MYTYPE"
> write.xport( abc, file="xxx.dat" )
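One possible reason the toy example does not fail, offered as an assumption rather than a diagnosis: text typed at an OS X console is stored as valid UTF-8, whereas data from a file produced elsewhere may hold single-byte Latin-1 (or Mac Roman) codes for the same accented letters, and such a lone byte is not valid UTF-8. If the true source encoding is known, re-encoding with iconv() keeps the accented letters instead of deleting them. The sketch below assumes Latin-1 and R >= 3.3.0 for validUTF8().

```r
# "café" as it would be stored in a Latin-1 file: the final byte is 0xE9.
latin1_bytes <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

validUTF8(latin1_bytes)   # FALSE: 0xE9 on its own is not a UTF-8 sequence

# Assumption: the source really is Latin-1. Declaring that and converting
# gives a valid UTF-8 string whose characters nchar() can count.
fixed <- iconv(latin1_bytes, from = "latin1", to = "UTF-8")
validUTF8(fixed)          # TRUE
nchar(fixed)              # 4
```

When the data come from a text file, the same assumption can be stated at read time through the fileEncoding argument of read.csv() or read.table().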


> 
> One work-around would be to edit out multibyte strings.  Is there a simple way to find and replace them?  

On a Mac I have used the Zap Gremlins option in TextWrangler.app. Be aware that it will change the spelling of words that were originally written with ligature characters.
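A rough R-side analogue of Zap Gremlins, as a sketch: strip every non-ASCII byte from the character and factor columns, and from their "label" attributes, before calling write.xport(). It assumes that losing accented letters is acceptable, the same trade-off as Zap Gremlins. strip_non_ascii() and zap_gremlins_df() are hypothetical helpers, not part of SASxport or base R.

```r
# Hypothetical helpers: a rough R analogue of TextWrangler's Zap Gremlins.
# Assumption: deleting non-ASCII bytes (and so respelling accented words)
# is acceptable for the SAS transport file.

# iconv() substitutes `sub` for each byte it cannot convert to ASCII,
# so sub = "" deletes those bytes. NA values stay NA.
strip_non_ascii <- function(x) {
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

# Apply strip_non_ascii() to the character and factor columns of a data
# frame, and to any "label" attribute attached to a column.
zap_gremlins_df <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col)) {
      levels(col) <- strip_non_ascii(levels(col))  # keeps other attributes
    } else if (is.character(col)) {
      col[] <- strip_non_ascii(col)  # [] keeps attributes such as "label"
    }
    lab <- attr(col, "label")
    if (is.character(lab)) attr(col, "label") <- strip_non_ascii(lab)
    df[[nm]] <- col
  }
  df
}

abc <- data.frame(x = c(1, 2, NA, NA),
                  y = c("a", "DIARRH\xfcA", NA, "*"),
                  stringsAsFactors = FALSE)
attr(abc$y, "label") <- "caf\u00e9"
clean <- zap_gremlins_df(abc)
clean$y
```

The cleaned copy could then be passed to write.xport(). Whether that clears the nchar(var) error depends on where var comes from (values, column labels, the data frame's own label, or names), which traceback() should reveal; the data frame-level label is not touched by this sketch.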


Best of luck;
David.

> Or is there some other clever approach that bypasses the problem?
> 
> Dennis
> 
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA