[R] Database abstraction

Rob Paul Tirrell rpt at stanford.edu
Mon Feb 28 10:58:26 CET 2011


I was also wondering about precedents or similar packages in the R ecosytem. I'm familiar with R-ORM and SQLDF, which seem closest. 
At this point any and all feedback and critiques are extremely welcome! I imagine other frequent database users have had problems much the same as mine.
I'd like to release this in the near future. I've been using Roxygen for documentation, but support for annotations on R5 methods isn't great...

Thanks all.

----- Original Message -----
From: "Rob Tirrell" <rpt at stanford.edu>
To: r-help at r-project.org
Sent: Sunday, February 27, 2011 2:38:18 AM
Subject: [R] Database abstraction

Hi all -

I've been working on a lite ORM and database abstraction package for R.
Formatting complex queries by hand has always been an error-prone hassle, so
I've tried to do away with that as much as possible, instead, using R
objects to represent elements of a database system (statements, clauses,
operators, functions, fields, tables, etc.). R5 classes have made the
development of this package somewhat easier than it would have been in the
past (in my view, at least).

Then an expression like:
query(some_table$some_field)$where(some_table$some_field != NA &
some_table$some_other_field %in% c(1, 2, 3))
would generate SQL like:
SELECT `database`.`some_table`.`some_field` AS `some_field` WHERE
`database`.`some_table`.`some_field` IS NOT NULL AND
`database`.`some_table`.`some_other_field` IN (1, 2, 3);

The basic features:
- Abstraction of tables and fields to R5 objects and use of R functions
where a clear parallel in SQL exists (e.g. %in% >- IN, unique >- DISTINCT).
- Result proxying (by a data.frame-like object), which allows on-demand
fetching and live updates to results (where appropriate).
- Session management.

It's my feeling that most R users care more about using existing tables than
creating new ones, so introspection is more of a focus than is defining new
relations.

It's reasonably far along, and I've been using it for my own projects for
some time. However, certainly much syntax is not supported, and I've only
written a compiler for MySQL. At this point I'd welcome anyone interesting
in contributing or offering advice. The code is (I think) designed as to
support the addition of more complex logic at any stage (generation,
preparation, compilation, formatting) withouth becoming spaghettized. I've
written a reasonably-sized test suite that covers most of the functionality.

The code is hosted on github - https://github.com/rtirrell/databasr. Feel
free to fork, critique and/or use. I've taken some pains to ensure that
interface will remain relatively stable.

Thanks!

--
Robert Tirrell | rpt at stanford.edu | (607) 437-6532
Program in Biomedical Informatics | Butte Lab | Stanford University

	[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list