[Rd] Suggestion: Create On-Disk Dataframes

Juan Telleria jtelleriar at gmail.com
Sun Sep 3 20:38:11 CEST 2017


Dear R Developers,

I would like to suggest creating a new S4 class for on-disk data frames
that do not fit in RAM. It could be called disk.data.frame.

It could be based on RSQLite, for example, by translating R syntax to SQL,
and the disk.data.frame class could have exactly the same syntax and
behaviour as regular data.frame objects.
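
To make the idea concrete, here is a minimal sketch of such a class. It
assumes the DBI and RSQLite packages are installed; the class name
DiskDataFrame, the constructor disk_data_frame() and the choice of methods
are hypothetical names used only for illustration, not an existing API.
Only dim() and column selection are translated to SQL here.

```r
# Sketch only: DiskDataFrame and disk_data_frame() are hypothetical names.
library(DBI)
library(methods)

# The object holds a connection and a table name, never the rows themselves.
setClass("DiskDataFrame",
         slots = c(con = "DBIConnection", table = "character"))

# Copy a regular data.frame into an SQLite file and return a handle to it.
disk_data_frame <- function(df, table = "ddf",
                            path = tempfile(fileext = ".sqlite")) {
  con <- dbConnect(RSQLite::SQLite(), path)
  dbWriteTable(con, table, df)
  new("DiskDataFrame", con = con, table = table)
}

# dim() is answered by SQL instead of loading the rows into RAM.
setMethod("dim", "DiskDataFrame", function(x) {
  tbl <- dbQuoteIdentifier(x@con, x@table)
  n <- dbGetQuery(x@con, paste("SELECT COUNT(*) AS n FROM", tbl))$n
  c(as.integer(n), length(dbListFields(x@con, x@table)))
})

# x[, cols] becomes SELECT cols FROM table; only the result is materialised.
setMethod("[", "DiskDataFrame", function(x, i, j, ..., drop = TRUE) {
  if (!missing(i)) stop("row indexing is not part of this sketch")
  cols <- if (missing(j)) "*" else
    paste(dbQuoteIdentifier(x@con, as.character(j)), collapse = ", ")
  dbGetQuery(x@con, paste("SELECT", cols, "FROM",
                          dbQuoteIdentifier(x@con, x@table)))
})

# Usage: same syntax as a data.frame, but the data lives on disk.
ddf <- disk_data_frame(data.frame(id = 1:5, value = c(2.5, 3.5, 1, 4, 0.5)))
dim(ddf)
ddf[, "value"]
```

A real implementation would need many more methods (row indexing,
assignment, summaries), and the connection in this sketch is never closed.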

When the session ends, if such disk.data.frames have not been saved, an
implicit DROP TABLE could be run in all the schemas created in RSQLite.
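
One possible way to get this clean-up is a finalizer registered with base
R's reg.finalizer(..., onexit = TRUE). The sketch below assumes DBI and
RSQLite are installed; open_scratch_db() and close_scratch_db() are
hypothetical helper names used only for illustration.

```r
# Sketch only: open_scratch_db() and close_scratch_db() are hypothetical.
library(DBI)

# Drop every table, close the connection and delete the database file.
# Returns FALSE (invisibly) if the database was already closed.
close_scratch_db <- function(handle) {
  if (!dbIsValid(handle$con)) return(invisible(FALSE))
  for (tbl in dbListTables(handle$con)) dbRemoveTable(handle$con, tbl)
  dbDisconnect(handle$con)
  unlink(handle$path)
  invisible(TRUE)
}

# Open a scratch SQLite database. The finalizer runs when the handle is
# garbage collected or, because onexit = TRUE, when the R session ends.
open_scratch_db <- function(path = tempfile(fileext = ".sqlite")) {
  handle <- new.env()
  handle$path <- path
  handle$con <- dbConnect(RSQLite::SQLite(), path)
  reg.finalizer(handle, close_scratch_db, onexit = TRUE)
  handle
}
```

A user who wants to keep the data would have to copy the database file
elsewhere before the session ends; how "saving" should work is left open
here.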

Nowadays, with SSD drives, such a data.frame class could make sense,
especially when dealing with Big Data.

It is true that this new class might be slower than the regular data.frame,
data.table or tibble classes, but we would be able to handle much more
data, even if it is at the cost of speed.

Data sampling or a regular ODBC connection could also do all of this work,
but this class could help people who do not know how to use an RDBMS or
the R packages built specifically for this job.

Another option would be to base this new S4 class on feather files, but
building it on RSQLite may simply be easier.

A GitHub project could be created for this purpose, so that the whole
community can contribute (me included :D).

Thank you,
Juan



