[R] dbi, rodbc, rmysql, charset problem

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Feb 18 16:09:37 CET 2007


You seem never to have told R or us what charset these data are in. I 
think it is likely that they are being transferred in latin2 (like your 
email), and you are running R in UTF-8 according to Sys.getlocale.  So 
what you need to do is to either

1) Run R in latin2

or

2) use iconv() to convert the results from latin2 to UTF-8.

UTF-8 is relatively new in the DBMS world.  For ODBC, look at the bug 
reports on the MySQL site.  Using RODBC with UTF-8 locales is on my TODO 
list, but of no urgency at all.


On Sun, 18 Feb 2007, janek0 wrote:

> Dear List
>
> In my short life as a beginning R-user i've encountered a following
> problem that i'm unable to solve myself:
>
> I have a database in MySQL containing table and field names as well as
> some data containing Polish accentuated characters (like ????),
> utf8-encoded. It works just fine with just any external query browser i
> can find, jdbc, odbc, native, whatever. Also mysql is happy about my
> charset. It also seems to be configured correctly:
>
> mysql> status
> --------------
> mysql  Ver 14.12 Distrib 5.0.24a, for pc-linux-gnu (i486) using readline
> 5.1
>
> Connection id:          16
> Current database:
> Current user:           root at localhost
> SSL:                    Not in use
> Current pager:          stdout
> Using outfile:          ''
> Using delimiter:        ;
> Server version:         5.0.24a-Debian_9-log
> Protocol version:       10
> Connection:             Localhost via UNIX socket
> Server characterset:    utf8
> Db     characterset:    utf8
> Client characterset:    utf8
> Conn.  characterset:    utf8
> UNIX socket:            /var/run/mysqld/mysqld.sock
> Uptime:                 1 hour 37 min 17 sec
>
> Threads: 2  Questions: 240  Slow queries: 0  Opens: 175  Flush tables: 1
> Open tables: 64  Queries per second avg: 0.041
>
> Yet if i use R's RODBC or RMySQL to connect to my database i can't see
> these accentuated characters:
>
> library(RODBC)
> con <-odbcConnect("trybunal", uid="root", pwd="mypassword")
> sqlTable(con)
>
> the output (abridged) is like that:
>
> TABLE_CAT TABLE_SCHEM                    TABLE_NAME TABLE_TYPE
> REMARKS
> 14  trybunal             Wyk?adnia innych przepis<f3>w      TABLE MySQL
>
> instead of <F3> i should see "ó" and instead of ? a "?".
>
> It is just the same if i use RMySQL instead of RODBC:
>
> library(RMySQL)
> con <-dbConnect(dbDriver("MySQL"), dbname="trybunal", username="root",
> password="mypassword")
> dbListTables(con)
>
> the output (abridged) is like that:
> [13] "Ustawa"                        "Wyk?adnia innych przepis<f3>w"
>
> and if i use
> dbReadTable(con, "Metryczka")
>
> ("Metryczka" being one table in the database) i get:
>
> Error in make.names(as.character(names), allow_) :
>        invalid multibyte string 11
>
> It works without error if i set LC_ALL to "C", but obviously without
> Polish charset.
>
> Strange thing is that Sys.getlocale() gives me
>
> [1]"LC_CTYPE=pl_PL.UTF-8;LC_NUMERIC=C;LC_TIME=pl_PL.UTF-8;LC_COLLATE=pl_PL.UTF-8;
> LC_MONETARY=pl_PL.UTF-8;LC_MESSAGES=pl_PL.UTF-8;
> LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C"
>
> So it is utf8 all over the place. Also R works just fine with
> read.table() if the table contains utf8-encoded chars. Thus the problem
> is just with R-mysql connection. It seems therefore that dbi package
> does not support non-ascii charsets.
>
> Questions:
> 1. Is above conclusion correct or am i doing something wrong ?
> 2. If it is correct, is there any way to use table and field names as
> they are now (with non-ascii chars) in my SQL queries (e.g. SELECT
> `Wyk?adnia przedmiotu kontroli`.*) ?
>
> I can live with Polish characters missing in the output if i have to but
> i must address the database fields/tables. I can't change their names of
> as this would mean rebuilding database frontend. I can't import data to
> R by exporting the database and then importing it via read.table because
> i want a "live" application and must keep things simple.
>
> Any help will be greatly appreciated.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


More information about the R-help mailing list