[R] managing large datasets with RMySQL

Tamas K Papp tpapp at Princeton.EDU
Tue Aug 9 18:13:52 CEST 2005


I have a large dataset (about 1 million data points from a
68-dimensional state space, the result of an MCMC simulation) which
won't fit in memory.  I think that the only solution for analyzing it
is to save it in a relational database (as it is generated) and then
read back only portions of the data.
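
A minimal sketch of the "save as it is generated" step, assuming the
sampler produces a numeric matrix of draws per chunk (one row per
iteration).  The table name "mcmc_draws", the column name "iter", the
x1, x2, ... column names, and the connection details are all
illustrative assumptions, not anything prescribed by RMySQL.  The
explicit iteration index is there so that ranges of rows can be
selected later without relying on row names.

```r
# Sketch only: "mcmc_draws", "iter" and the x1..xK column names are
# hypothetical choices made for this example.

# Turn a matrix of draws (one row per iteration) into a data frame with
# an explicit iteration index as its first column.
make_chunk <- function(draws, start_iter) {
  df <- as.data.frame(draws)
  names(df) <- paste0("x", seq_len(ncol(draws)))
  cbind(iter = seq(from = start_iter, length.out = nrow(draws)), df)
}

# Append one chunk to the table.  With append = TRUE the table should be
# created on the first call if it does not exist yet.
append_chunk <- function(con, draws, start_iter, table = "mcmc_draws") {
  DBI::dbWriteTable(con, table, make_chunk(draws, start_iter),
                    append = TRUE, row.names = FALSE)
}

# Usage (needs a running MySQL server, so it is left commented out):
# con <- DBI::dbConnect(RMySQL::MySQL(), dbname = "test")
# for (k in 0:99) {
#   draws <- matrix(rnorm(10000 * 68), ncol = 68)  # stand-in for the sampler
#   append_chunk(con, draws, start_iter = k * 10000 + 1)
# }
# DBI::dbDisconnect(con)
```

Writing in chunks of this kind keeps only one chunk in memory at a
time; the chunk size of 10000 rows in the usage comment is arbitrary.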

I have installed & initialized MySQL and the RMySQL package (I know
nothing about SQL, unfortunately, but I will try to learn).  The code
from section 4.3.1 of the R Data Import/Export manual runs
successfully.

Questions:

1. Should I use dbWriteTable(..., overwrite=FALSE, append=TRUE) to
save the chunks of data repeatedly?

2. Is it OK to set row.names=FALSE when writing?

3. How do I retrieve only parts of the data?  dbReadTable returns the
whole table, if I understand correctly.
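
To make question 3 concrete, here is a rough sketch of two ways parts
of a table might be read through the DBI interface: selecting a range
of rows with a WHERE clause, and fetching the result of a query a
block at a time.  It assumes a hypothetical table "mcmc_draws" that
has an integer column "iter" holding the iteration number; both names
and the helper functions are made up for illustration.

```r
# Sketch only: the table "mcmc_draws" and its integer column "iter" are
# assumptions of this example.

# Build a SELECT statement for iterations from..to.  `columns` defaults
# to all columns; pass a character vector to read only some of them.
range_query <- function(from, to, columns = "*", table = "mcmc_draws") {
  sprintf("SELECT %s FROM %s WHERE iter BETWEEN %d AND %d",
          paste(columns, collapse = ", "), table,
          as.integer(from), as.integer(to))
}

# Read one range of iterations into a data frame.
read_range <- function(con, from, to, ...) {
  DBI::dbGetQuery(con, range_query(from, to, ...))
}

# Apply FUN to successive blocks of at most n rows of a query result and
# collect what FUN returns, without building the full table in R.
process_in_blocks <- function(con, sql, FUN, n = 10000) {
  res <- DBI::dbSendQuery(con, sql)
  on.exit(DBI::dbClearResult(res))
  out <- list()
  while (!DBI::dbHasCompleted(res)) {
    block <- DBI::dbFetch(res, n = n)
    if (nrow(block) > 0) out[[length(out) + 1]] <- FUN(block)
  }
  out
}

# Usage (needs a running MySQL server, so it is left commented out):
# con <- DBI::dbConnect(RMySQL::MySQL(), dbname = "test")
# first <- read_range(con, 1, 1000, columns = c("iter", "x1", "x2"))
# sums  <- process_in_blocks(con, "SELECT x1 FROM mcmc_draws",
#                            function(b) sum(b$x1))
# DBI::dbDisconnect(con)
```

The first helper suits analyses that need a specific window of
iterations; the second suits summaries that can be accumulated block
by block.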

If somebody has already written code for analyzing data in parts, I
would appreciate it if they could share it.

Thanks,

Tamas



