[R] sqldf for Very Large Tab Delimited Files

HC hcatbr at yahoo.co.in
Fri Feb 3 02:07:24 CET 2012


Hi Gabor,

Thank you very much for your guidance and help.

I was able to run the following code successfully on a 500 MB test data
file. A snapshot of the data file is attached.

********code start***************
library(sqldf)     # also loads gsubfn, which supplies fn$ substitution
library(RSQLite)

iFile <- "Test100.txt"
con <- dbConnect(SQLite(), dbname = "myTest100")

## import the tab-delimited file directly into an SQLite table
dbWriteTable(con, "TestDB100", iFile, sep = "\t")  #, eol = "\r\n")
nms <- names(dbGetQuery(con, "select * from TestDB100 limit 0"))

nRec <- dbGetQuery(con, "select count(*) from TestDB100")[1, 1]
aL1 <- 1

## walk the table one block of identical keys at a time; this assumes
## the rows are already sorted/grouped on the first column
while (aL1 <= nRec) {
  ## fetch the aL1-th row (OFFSET is zero-based, hence aL1 - 1)
  res1 <- fn$dbGetQuery(con,
            "select * from TestDB100 limit 1 offset `aL1 - 1`")
  istn <- res1[1, 1]
  ## pull every row whose first column matches that key
  res1 <- fn$dbGetQuery(con,
            "select * from TestDB100 where `nms[1]` = '$istn'")
  icount <- nrow(res1)
  oFile <- paste(istn, "_Test.txt", sep = "")
  write.table(res1, oFile, sep = "\t", quote = FALSE,
              col.names = FALSE, row.names = FALSE)
  aL1 <- aL1 + icount
}
dbDisconnect(con)
********code end***************
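
For what it is worth, the same splitting can also be written without the
row-offset bookkeeping, by looping over the distinct values of the first
column instead (just a sketch, untested; it would not require the rows
to be sorted):

********code start***************
## fetch each distinct key once, then pull that key's rows in one query
keys <- fn$dbGetQuery(con, "select distinct `nms[1]` from TestDB100")[, 1]
for (istn in keys) {
  res1 <- fn$dbGetQuery(con,
            "select * from TestDB100 where `nms[1]` = '$istn'")
  write.table(res1, paste(istn, "_Test.txt", sep = ""), sep = "\t",
              quote = FALSE, col.names = FALSE, row.names = FALSE)
}
********code end***************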

However, the actual data file that I want to handle is about *160 GB*,
and when I use the same code on that file, the dbWriteTable(con, ...)
statement gives the following error:
********error start**************
dbWriteTable(con, "TestDB", iFile, sep = "\t") #, eol = "\r\n")
Error in try({ : RS-DBI driver: (RS_sqlite_getline could not realloc)
[1] FALSE
********error end**************

I am not sure about the reason for this error. Is it due to the big file
size? I understood from the sqldf web page that SQLite can handle even
larger files than this and is restricted only by disc space, not by RAM.
I have about 400 GB of free space on the PC I am using, which runs
Windows 7, so I am assuming that the above dbWriteTable command uses
only disc space and that memory is not the issue.
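
One workaround I am considering is to import the file in smaller chunks
with repeated dbWriteTable(..., append = TRUE) calls instead of one big
call, so that only one chunk at a time ever has to fit in RAM while the
table itself grows on disc. Roughly like this (the chunk size is a
guess, "bigfile.txt" stands for the real 160 GB file, and I am assuming
it has a plain header row):

********code start***************
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "myTest100")
inp <- file("bigfile.txt", open = "r")
hdr <- strsplit(readLines(inp, n = 1), "\t")[[1]]   # header row
repeat {
  chunk <- tryCatch(
    read.table(inp, sep = "\t", nrows = 100000, col.names = hdr,
               quote = "", comment.char = "", stringsAsFactors = FALSE),
    error = function(e) NULL)            # NULL once the file is exhausted
  if (is.null(chunk)) break
  dbWriteTable(con, "TestDB", chunk, append = TRUE)  # grows the table
  if (nrow(chunk) < 100000) break                    # short final chunk
}
close(inp)
dbDisconnect(con)
********code end***************

Would this be expected to behave better on the 160 GB file?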

In fact, this file was created using mysqldump and I do not have access
to the original MySQL database.
I want to know the following:
(1)  Am I missing something in the above code that is preventing it from
handling this big 160 GB file?
(2)  Should this be handled outside of R, if R is the limitation here?
And if yes, then what is a possible way forward?
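
For (2), one route would be to do the bulk import with the sqlite3
command-line shell and only query the database from R afterwards. A
sketch, assuming the sqlite3 shell is installed and on the PATH (the
file and table names are placeholders; depending on the sqlite3 version,
the table may have to be created beforehand and the header row removed
afterwards):

********code start***************
## write the shell dot-commands to a file, then feed them to sqlite3
writeLines(c(".mode tabs",
             ".import bigfile.txt TestDB"),
           "import.sql")
system2("sqlite3", args = "myTest100", stdin = "import.sql")
## the resulting database can then be opened from R with
## dbConnect(SQLite(), dbname = "myTest100") as in the code above
********code end***************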

Thank you again for your quick response and all the help.
HC
http://r.789695.n4.nabble.com/file/n4353362/Test100.txt