[R] Multibyte strings

peter dalgaard pdalgd at gmail.com
Sat Sep 26 11:52:34 CEST 2015


Dennis,

The invalid multibyte issue is almost certainly a symptom of being in a UTF-8 locale and trying to handle strings that aren't in UTF-8. (UTF-8 uses particular 8-bit patterns to say that the following k bytes contain a Unicode value outside ASCII, whereas other "8-bit ASCII" encodings, like Latin-1, just use the extra 128 character codes for special characters. Treating the latter as the former causes errors; the other way around just looks weird.)
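
To see which elements are the culprits, something along these lines might do. It is only a sketch: the vector x is made up for illustration, and I am assuming that the offending strings are Latin-1 bytes that are not valid UTF-8. The idea is that iconv() returns NA for input it cannot interpret in the declared "from" encoding.

```r
# "caf\xe9" is "café" in Latin-1: the e-acute is the single byte 0xE9,
# which is not a valid byte sequence in UTF-8. (Hypothetical test data.)
x <- c("plain ascii", "caf\xe9")

# Byte counts do not depend on the encoding, so this is always safe,
# unlike nchar(x), which can fail on such strings in a UTF-8 locale.
n_bytes <- nchar(x, type = "bytes")

# Passing the strings through iconv() as UTF-8 yields NA for elements
# that are not valid UTF-8, which gives a simple way to locate them.
bad <- is.na(iconv(x, from = "UTF-8", to = "UTF-8"))

print(n_bytes)
print(which(bad))
```

The same check can be run over names(df) and over each character column or set of factor levels in a data frame.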

So perhaps you should try diddling your locale settings and/or look for encoding arguments for the functions that you use. Then again, the XPT format may not be happy with non-ASCII characters, whatever the encoding, in which case you may need to massage the input data sets and change variable names and factor labels (iconv() should be your friend).
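
A rough sketch of the iconv() massage, for what it is worth. The helper name recode_df and the example data are mine, not anything from SASxport, and it assumes that every non-ASCII string in the data really is Latin-1; if some are already UTF-8, converting them "from latin1" will garble them.

```r
# Hypothetical helper: re-encode names, character columns and factor
# levels of a data frame. Assumes the input strings are Latin-1.
recode_df <- function(df, from = "latin1", to = "UTF-8", sub = NA) {
  names(df) <- iconv(names(df), from = from, to = to, sub = sub)
  for (j in seq_along(df)) {
    if (is.character(df[[j]])) {
      df[[j]] <- iconv(df[[j]], from = from, to = to, sub = sub)
    } else if (is.factor(df[[j]])) {
      levels(df[[j]]) <- iconv(levels(df[[j]]), from = from, to = to, sub = sub)
    }
  }
  df
}

# Made-up example: 0xE9 is e-acute in Latin-1.
d <- data.frame(id = 1:2, label = c("plain", "caf\xe9"),
                stringsAsFactors = FALSE)

# Convert to UTF-8 so that string functions work in a UTF-8 locale.
d_utf8 <- recode_df(d)

# If the output format cannot take non-ASCII at all, replace each
# non-convertible byte with "?" instead.
d_ascii <- recode_df(d, to = "ASCII", sub = "?")
print(d_ascii$label)
```

Whether write.xport is then happy with the UTF-8 version or needs the ASCII one is something you would have to try.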

By the way, I don't think the FDA "requests" XPT files. As far as I recall, they say somewhere that they _accept_ them (possibly defending themselves against the platform-specific SAS files that once abounded), but I think even Excel goes for submissions - the important thing is that they can get at the actual data reasonably easily. I can see the attraction of taking the well-trodden path, though.

-pd

> On 25 Sep 2015, at 23:23 , Dennis Fisher <fisher at plessthan.com> wrote:
> 
> R 3.2.0
> OS X
> 
> Colleagues,
> 
> Earlier today, I initiated a series of emails regarding SASxport (which was removed from CRAN).  David Winsemius proposed downloading the source code and installing with the following command:
> 	install.packages('~/Downloads/SASxport_1.5.0.tar.gz', repos = NULL, type = "source")
> 
> That works and I am grateful to David for his recommendation.  However, the package fails on some of the many objects that I attempted to write with:
> 	write.xport
> 
> The error message was:
> 	Error in nchar(var) : invalid multibyte string 3157
> 
> One work-around would be to edit out multibyte strings.  Is there a simple way to find and replace them?  Or is there some other clever approach that bypasses the problem?
> 
> Dennis
> 
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


