[R] RODBC and NULL values

chao gai chaogai at duineveld.demon.nl
Wed Nov 15 10:54:09 CET 2006


Mark,

I think you should realize that SQL is evolved from a different culture, large 
databases. It does not have these fancy data types that we find so usefull in 
R, so somewhere there is guessing what class a variable is.

I tried to examine what NULL values would do on my computer Linux, Suse 9.2, 
MySQL 4.0.21, ODBC ver 03.51. R 2.4 latest packages. As this was a trial I 
added the lot, Inf, -Inf, NA, NaN.
At first my rows with Inf and -Inf were dropped without even a warning. With 
RMySQL it went a bit better, Inf and -Inf were kept. 
In the end, using also the MySQLCC console manager I figured out that one 
could indeed insert the correct values. 
Then I realized: How would RODBC know what my database can handle, there may 
be a zillion different types of ODBC drivers with databases under them. 
Why would the database be so versatile anyway. Inf, -Inf, NA and NaN are not 
exaclty what one would use for say an inventory, personel adminstration or 
the such. These are probably not part of the SQL standard anyway.

So, in short, even though R sometimes seems magically to guess what you want, 
it does not work for reading SQL. The only true magic in R are squares, its 
existence and all those beatifull packages.

Kees

# my code
library(RODBC)
chan <- odbcConnect('myodbc1',uid='kees')

x <- c(0,Inf,-Inf,NaN,NA)
y  <- c(1,2,3,4,5)
df1 <- data.frame(x=x,y=y)

sqlSave(chan, df1, "testNA1", rownames=T,verbose=TRUE,fast=TRUE)
str(sqlFetch(chan,'testNA1'))
'data.frame':   3 obs. of  2 variables:
 $ x: num  0 NA NA
 $ y: num  1 4 5

sqlQuery(chan,'CREATE TABLE testNA2  (rownames varchar(255), x double, y 
double)')
sqlQuery(chan,"INSERT INTO testNA2 ( rownames, x, y ) VALUES ( '1', 0, 1 )")
sqlQuery(chan,"INSERT INTO testNA2 ( rownames, x, y ) VALUES ( '2', 'inf', 
2)")
sqlQuery(chan,"INSERT INTO testNA2 ( rownames, x, y ) VALUES ( '3', '-inf', 
3 )")
sqlQuery(chan,"INSERT INTO testNA2 ( rownames, x, y ) VALUES ( '4', 'nan', 
4 )")
sqlQuery(chan,"INSERT INTO testNA2 ( rownames, x, y ) VALUES ( '5', NULL , 
5 )")

str(sqlFetch(chan,'testNA2'))
'data.frame':   5 obs. of  2 variables:
 $ x: num     0  Inf -Inf  NaN   NA
 $ y: num  1 2 3 4 5
# end of code

On Tuesday 14 November 2006 21:01, Mark Wardle wrote:
> Mark Wardle wrote:
> > Dear all,
> >
> > I'm afraid I'm still having trouble with RODBC and NULL values on Mac OS
> > ... <snip>
> > limited to RODBC, and is not an ODBC driver problem? Any ideas? I'll be
> > switching to RdbiPgSQL from now, but I thought it appropriate to flag
> > this up as an unsolved problem.
>
> Hmmm.. there are several issues with RdbiPgSQL (including not doing as
> good a job at recognising factors for example - this requires manually
> checking data frames returned, and fixing data types with liberal use of
> as.factor() etc..), so if anyone has information on the best way to
> access PostgreSQL from R, then all advice appreciated!
>
> Apart from the NULL value problem, RODBC seems the best and most mature
> and does a great job of inferring data types.  The RPGSQL website
> suggests it is abandoned in favour of Rdbi, Rdbi's website suggests it
> is abandoned in favour of DBI. This doesn't complete support postgresql
> as far as I can see, but they recommend the RdbiPgSQL package on
> bioconductor as above.
>
> Thanks,
>
> Mark



More information about the R-help mailing list