[Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows

Karl Ove Hufthammer karl at huftis.org
Tue Jan 25 11:49:11 CET 2011


Matthew Dowle wrote:

> I'm not sure, but note the difference in locale between
> Linux (UTF-8) and Windows (non UTF-8). As far as I
> understand it R much prefers UTF-8, which Windows doesn't
> natively support. Otherwise you could just change your
> Windows locale to a UTF-8 locale to make R happier.
> 
[...]
> 
> If anybody knows a way to trick R on Linux into thinking it has
> an encoding similar to Windows then I may be able to take a
> look if I can reproduce the problem in Linux.

After changing the locale to an ISO 8859-1 locale, i.e.:

export LC_ALL="en_US.ISO-8859-1"
export LANG="en_US.ISO-8859-1"

I could *not* reproduce it; that is, ‘table’ is as fast on the non-ASCII 
factor as it is on the ASCII factor.
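
A minimal sketch for confirming that the exported locale actually takes effect before drawing conclusions from such a test. It assumes a glibc-based Linux system, where `locale -a` typically lists codesets in a normalized form (for example `iso88591` rather than `ISO-8859-1`), and it ignores locale modifiers such as `@euro`. The helper names `normalize_locale` and `locale_available` are hypothetical and not part of any tool.

```shell
#!/bin/sh

# Normalize a locale name for comparison: keep the language/territory part,
# lowercase the codeset (after the dot) and drop its punctuation.
# Assumption: this mirrors how glibc prints names in `locale -a`.
normalize_locale() {
    name=$1
    case $name in
        *.*)
            lang=${name%%.*}
            codeset=${name#*.}
            codeset=$(printf '%s' "$codeset" \
                | tr '[:upper:]' '[:lower:]' \
                | tr -cd '[:alnum:]')
            printf '%s.%s\n' "$lang" "$codeset"
            ;;
        *)
            printf '%s\n' "$name"
            ;;
    esac
}

# Succeed if the requested locale appears among the installed ones.
locale_available() {
    want=$(normalize_locale "$1")
    for have in $(locale -a 2>/dev/null); do
        if [ "$(normalize_locale "$have")" = "$want" ]; then
            return 0
        fi
    done
    return 1
}

if locale_available "en_US.ISO-8859-1"; then
    # Print the character encoding in effect under that locale.
    LC_ALL="en_US.ISO-8859-1" locale charmap
else
    echo "en_US.ISO-8859-1 does not appear to be installed" >&2
fi
```

If the locale is not installed, the `export` lines would likely have no effect on the encoding R sees, so a failure to reproduce the slowdown would not say much about non-UTF-8 locales.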

-- 
Karl Ove Hufthammer


