[Rd] Support writing UTF-8 output in Windows

Philippe Grosjean phgrosjean at sciviews.org
Sun Nov 10 22:00:16 CET 2013


Here is a quick (and largely untested, apart from the toy example) hack that should read and write UTF-8 encoded CSV files on Window. It is *not* the best solution, which should use proper C-level code. But it could help a bit in your case.
Best,

Philippe Grosjean


read.tableUTF8 <- function (file, ...)
{
	if (l10n_info()$`UTF-8`) {
		read.table(file = file, fileEncoding = "UTF-8", ...)
	} else {
		res <- read.table(file = file, ...) # Read in default encoding
		## For each character variable, change encoding to 'UTF-8'
		## For each factor, change encoding to 'UTF-8'
		as.data.frame(lapply(res, function (x) switch(data.class(x),
			character = {Encoding(x) <- "UTF-8"; x},
			factor = {Encoding(levels(x)) <- "UTF-8"; x},
			x))
		)
	}
}

write.tableUTF8 <- function (x, file = "", ...)
{
	if (l10n_info()$`UTF-8`) {
		write.table(x = x, file = file, fileEncoding = "UTF-8", ...)
	} else {
		## Change encoding to "bytes"  and save it like that
		x <- lapply(x, function (x) {
			if (is.character(x)) {
				Encoding(x) <- "bytes"
			} else if (is.factor(x)) {
				Encoding(levels(x)) <- "bytes"
			}
			x
		})
		write.table(x = x, file = file, ...)
	}
}

fact <- factor(c("\u0444", "\u220F", "\u2030"))
char <- c("\u2202x", "\u2202y", "\u2202z")
dfr <- data.frame(x = 1:3, f = fact, s = I(char))
dfr

write.tableUTF8(dfr, file = "testUTF8.txt")
dfr2 <- read.tableUTF8("testUTF8.txt")
dfr2$s <- I(as.character(dfr2$s))
dfr2

identical(dfr$f, dfr2$f)
identical(dfr$s, dfr2$s)



More information about the R-devel mailing list