[R] importing data to SQLite database with sqldf

Gabor Grothendieck ggrothendieck at gmail.com
Sat Feb 21 15:17:41 CET 2009


WARNING!!!

Not a good idea to post code that deletes every object
in your workspace!   Other people may blindly copy
and paste your code.

I've added some verbiage to Example 12 on the home
page:
http://sqldf.googlecode.com
that hopefully clarifies it a bit.

On Sat, Feb 21, 2009 at 2:36 AM, Stephen Tucker <brown_emu at yahoo.com> wrote:
> Thanks yet another time, Gabor -
> I think I am slowly understanding - particularly I was confused by persistence of connections.
>
> So starting with some parts of your example 12,
>
> ##

... deleted code which clears workspace ...

> sqldf("attach 'mydb' as new")
> irishead <- file("irishead.dat")
> iristail <- file("iristail.dat")
>
> If I just wanted to merge the two files within SQL and return some part of the result, I would do
>
> sqldf('select count(*) from (select * from irishead
> union
> select * from iristail)',dbname="mydb")
>
> and the tables exist in mydb only for the duration of the computation
>> sqldf("select * from sqlite_master",dbname="mydb")$name
> NULL
> (but why is the size of mydb > 0 afterwards, if it contains no tables...?)
>
> ...is the above the same as
> sqldf('select count(*) from (select * from irishead
> union
> select * from iristail)',dbname=tempfile())
>
> except that I don't create 'mydb'?

Yes.  A third possibility is to omit dbname= entirely, in which case
sqldf uses a temporary in-memory database.
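To make the three possibilities concrete, here is a minimal sketch (assuming the irishead.dat file and connection set up as in Example 12 above; the names are illustrative):

```r
library(sqldf)

irishead <- file("irishead.dat")

# 1. Named file database: the table exists in 'mydb' only for the
#    duration of the call, but the database file persists across sessions.
sqldf("select count(*) from irishead", dbname = "mydb")

# 2. Throwaway file database: same mechanics, but the file is a tempfile
#    that is discarded afterwards.
sqldf("select count(*) from irishead", dbname = tempfile())

# 3. No dbname at all: sqldf uses a temporary in-memory SQLite database.
sqldf("select count(*) from irishead")
```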

>
> If I wanted to save the merged table (for use in a later session):
>
> sqldf('create table fulltable as select * from irishead
> union
> select * from iristail',dbname="mydb")
>
>> sqldf("select * from sqlite_master",dbname="mydb")$name
> [1] fulltable
> Levels: fulltable
>
> If I want copies of all three tables,
> sqldf(dbname="mydb")
> sqldf('create table fulltable as select * from irishead
> union
> select * from iristail')
> sqldf()
>
>> sqldf("select * from sqlite_master",dbname="mydb")$name
> [1] irishead  iristail  fulltable
> Levels: fulltable irishead iristail
>
> ...I'll try to figure a few more things out in the meantime (like using sep="\t"?) and try using connections with sqldf().
>
> But thanks for the help!
>
> Stephen
>
> ----- Original Message ----
> From: Gabor Grothendieck <ggrothendieck at gmail.com>
> To: Stephen Tucker <brown_emu at yahoo.com>
> Cc: R-help <r-help at stat.math.ethz.ch>
> Sent: Friday, February 20, 2009 5:22:09 AM
> Subject: Re: [R] importing data to SQLite database with sqldf
>
> Have just added an example 12 on the home page:
>
> http://sqldf.googlecode.com
>
> that shows an example.  Note use of notation
> main.mytable to refer to an existing table in the
> main database (as opposed to a data frame in R).
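A sketch of the difference that notation makes (assuming 'mydb' already holds a table named mytable; the names are illustrative):

```r
library(sqldf)

# An R data frame with the same name as a table already in the database.
mytable <- data.frame(x = 1:3)

# Unqualified, sqldf finds the R data frame and loads it into the
# database for the duration of the call:
sqldf("select * from mytable", dbname = "mydb")

# Qualified with main., the query refers to the table already stored
# in the main database, not to the R object:
sqldf("select * from main.mytable", dbname = "mydb")
```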
>
> On Thu, Feb 19, 2009 at 11:55 PM, Stephen Tucker <brown_emu at yahoo.com> wrote:
>> Hi all,
>>
>> I am attempting to learn SQL through sqldf...
>>
>> One task I am particularly interested in is merging separate
>> (presumably large) files into a single table without loading these
>> files into R as an intermediate step (by loading them into SQLite and
>> merging them there).
>>
>> Taking a step back, I've considered these alternatives:
>>
>> 1) I know if I use straight SQLite commands I might use the '.import'
>> or 'INSERT INTO' command, which is not terribly flexible... (I can
>> read large files line-by-line in Python and use the 'INSERT INTO'
>> command, which is reasonably fast; I could do this in R as well, but
>> in my experience R's input/output is much slower...? Also, setting up
>> the table column definitions can be tedious if there are many
>> variables).
>>
>> 2) dbWriteTable() with append=TRUE is very convenient except that it
>> requires I load the data into R first...
>>
>> 3) sqldf's capability to put data directly into a database is
>> something I'd like to work out.
>>
>> So in this case I have a series of tab-delimited text file with the
>> first line being a header.
>>
>> For some reason I cannot seem to get it working.  Combining examples 6
>> and 9 from the Google Code page (and R-help archives), I tried
>>
>> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
>> (do I need it for SQLite?)
>> ##
>> sqldf("attach 'mydb.db' as new")
>> f <- file("myexample.txt")
>> attr(f,"file.format") <- list(header=TRUE,sep="\t")
>> sqldf("create table myexample as select * from f",
>>      stringsAsFactors=FALSE,
>>      dbname="mydb.db")
>> ## or
>> f <- file(fi)
>> sqldf("create table myexample as select * from f",
>>      stringsAsFactors=FALSE,file.format=list(header=TRUE,sep="\t"),
>>      dbname="mydb.db")
>> ##
>> sqldf("select * from myexample",dbname="mydb.db")
>> gives me tables with 0 rows and 0 columns...
>>
>> So in any case I have a few questions:
>>
>> === 1 ====
>> Would this be scalable to files with a few GBs of data in them (I
>> guess I am uncertain of the underlying mechanism for transporting
>> data from the .txt file to the .db file... I see there is a call to
>> dbWriteTable() internally in sqldf, but through the connection)? And
>> am I doing anything obviously wrong above?
>>
>> === 2 ===
>> Since I cannot 'append' rows to existing tables in SQLite (or any SQL
>> database), I think what I would have to do is to load each of the
>> files into the database and then 'rbind' them into a new table using
>> 'union all'? The following code is what I have (the part where I read
>> in each file before dumping into the database file would hopefully be
>> replaced by the method above, if it can be made to work).
>>
>> ## (1) create a database file
>> sqldf("attach 'alltables.db' as new")
>> ## (2) convenience function
>> sql <- function(...) sqldf(...,dbname="alltables.db")
>> ## (3) load data as separate tables
>> for( fi in list.files(".","txt$") ) {
>>  mydata <- read.delim(fi)
>>  sql(sprintf("create table %s as select * from mydata",sub("\\.txt","",fi)))
>> }
>> rm(fi,mydata)
>> ## (4) merge tables
>> tablenames <- as.character(sql("select * from sqlite_master")$name)
>> sql(paste("create table myfulltable as",
>>          paste(sprintf("select * from %s",tablenames),
>>                collapse=" union all ")))
>> ## (5) delete separate tables since we have a merged one
>> for( nm in tablenames ) sql(sprintf("drop table %s",nm))
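As an aside on step (4): SQLite can append rows to an existing table with "insert into ... select", so under the same setup the union-all rebuild could instead be sketched as:

```r
library(sqldf)

sql <- function(...) sqldf(..., dbname = "alltables.db")

# Create the merged table from the first source table, then append the
# remaining tables one at a time instead of building one large
# union-all query.
tablenames <- as.character(sql("select * from sqlite_master")$name)
sql(sprintf("create table myfulltable as select * from %s", tablenames[1]))
for (nm in tablenames[-1])
  sql(sprintf("insert into myfulltable select * from %s", nm))
```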
>>
>> === 3 ===
>> The following command
>> sqldf("attach 'mydb.db' as new")
>> DF <- read.delim(fi)
>> sqldf("create table myexample as select * from DF",dbname="mydb.db")
>>
>> will usually create a .db file twice the size of the .txt file (for
>> now I am playing with files of ~500KB, so the .db files are around
>> ~1MB). When I create a .db file using SQLite's '.import' command,
>> RSQLite's dbWriteTable(), or inserting values from the same .txt file
>> from Python's SQLite interface, I get .db files that are approximately
>> the same size as the .txt file (~500KB). Is the larger file size for
>> sqldf's method expected?
>>
>> Many thanks in advance!
>>
>> Stephen Tucker
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.