[Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows

Matthew Dowle mdowle at mdowle.plus.com
Tue Jan 25 18:31:03 CET 2011


I don't know if that's enough to flip the UTF-8 switches
internally in R. If it is enough, then this result may show
I'm barking up the wrong tree. Hopefully someone from
core who knows is watching. Is it possible that you run
R through an alias, and for some reason the alias is not
picking up your shell variables? Best to rule that out now
by running sessionInfo() at the R prompt.
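
A minimal sketch of that check, assuming a POSIX shell on the Linux side.
The helper name effective_ctype is hypothetical (it is not part of R); it
only applies the usual POSIX precedence LC_ALL > LC_CTYPE > LANG to show
which locale the shell is asking for. The R call is skipped if Rscript is
not on the PATH.

```shell
#!/bin/sh
# Print the locale name that decides the character encoding, using the
# POSIX precedence LC_ALL > LC_CTYPE > LANG (first non-empty one wins).
# effective_ctype is a hypothetical helper name, not part of R.
# Note: only exported variables are passed on to R.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "C"
    fi
}

echo "shell requests: $(effective_ctype)"

# Ask R which locale and encoding it actually ended up with.
# Skipped when Rscript is not installed.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'print(Sys.getlocale("LC_CTYPE")); print(l10n_info())'
fi
```

If the two disagree (for example R reports "C" or a UTF-8 locale while the
shell requests en_US.ISO-8859-1), the requested locale probably never
reached R, or is not installed on that machine, and the timing comparison
would not be testing the non-UTF-8 case at all.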

Otherwise, do you know profiling tools well enough to trace the
problem at the C level as it runs on Windows?

Matthew

"Karl Ove Hufthammer" <karl at huftis.org> wrote in message 
news:ihm9qq$9ej$1 at dough.gmane.org...
> Matthew Dowle wrote:
>
>> I'm not sure, but note the difference in locale between
>> Linux (UTF-8) and Windows (non UTF-8). As far as I
>> understand it R much prefers UTF-8, which Windows doesn't
>> natively support. Otherwise you could just change your
>> Windows locale to a UTF-8 locale to make R happier.
>>
> [...]
>>
>> If anybody knows a way to trick R on Linux into thinking it has
>> an encoding similar to Windows then I may be able to take a
>> look if I can reproduce the problem in Linux.
>
> Changing the locale to an ISO 8859-1 locale, i.e.:
>
> export LC_ALL="en_US.ISO-8859-1"
> export LANG="en_US.ISO-8859-1"
>
> I could *not* reproduce it; that is, 'table' is as fast on the non-ASCII
> factor as it is on the ASCII factor.
>
> -- 
> Karl Ove Hufthammer
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


