[R] Q: Suggestions for long-term data/program storage policy?

Sean Davis sdavis2 at mail.nih.gov
Tue Oct 11 13:11:14 CEST 2005


On 10/11/05 6:54 AM, "Duncan Murdoch" <murdoch at stats.uwo.ca> wrote:

> Alexander Ploner wrote:
>> Dear list,
>> 
>> we are a statistical/epidemiological department that - after a few
>> years of rapid growth - is finally getting around to formulating a
>> general data storage and retention policy - mainly to ensure that we
>> can reproduce results from published papers/theses more easily in the
>> future, but also with the hope that we get more synergy between
>> related projects.
>> I would also be very grateful for any other suggestions, comments or
>> links for setting up and implementing such a storage policy
>> (R-specific or otherwise).

I would also consider a relational database (such as MySQL or PostgreSQL) for
your data warehousing.  These products (particularly PostgreSQL) are designed
with data integrity as a primary goal.  File formats can change over time,
but data held in a database can be extracted with a query in whatever form
the task at hand requires.  Data generated at different times can be queried
and combined as needed.  The backup process is fairly straightforward.  R
already integrates with several relational database systems, so an
integrated solution can be built if desired.  Look at the RMySQL, Rdbi, and
RdbiPgSQL packages for how to connect R to MySQL and PostgreSQL.
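
To make the idea concrete, here is a rough sketch of what I have in mind.
It is only an illustration under my own assumptions: it uses Python with the
SQLite module from the standard library as a stand-in for MySQL or PostgreSQL
so that it runs without a server, and the table names, column names, and
values (study, measurement, cohort_a, and so on) are all made up.

```python
import sqlite3


def build_demo_store():
    """Create an in-memory database holding two hypothetical study batches."""
    con = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked to; a server database
    # such as PostgreSQL enforces them by default.
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript("""
        CREATE TABLE study (
            study_id INTEGER PRIMARY KEY,
            name     TEXT    NOT NULL UNIQUE,
            year     INTEGER NOT NULL
        );
        CREATE TABLE measurement (
            study_id   INTEGER NOT NULL REFERENCES study(study_id),
            subject_id TEXT    NOT NULL,
            value      REAL    NOT NULL,
            PRIMARY KEY (study_id, subject_id)
        );
    """)
    # Two batches of (invented) data collected in different years.
    con.executemany(
        "INSERT INTO study VALUES (?, ?, ?)",
        [(1, "cohort_a", 2003), (2, "cohort_b", 2005)],
    )
    con.executemany(
        "INSERT INTO measurement VALUES (?, ?, ?)",
        [(1, "s01", 4.2), (1, "s02", 5.1), (2, "s01", 3.9)],
    )
    con.commit()
    return con


def combined_measurements(con):
    """Join data collected at different times into one flat result set."""
    return con.execute("""
        SELECT s.name, s.year, m.subject_id, m.value
        FROM measurement AS m
        JOIN study AS s USING (study_id)
        ORDER BY s.year, m.subject_id
    """).fetchall()


def rejects_orphan(con):
    """Return True if a measurement for an unknown study is refused."""
    try:
        con.execute("INSERT INTO measurement VALUES (99, 's09', 1.0)")
    except sqlite3.IntegrityError:
        return True
    return False


con = build_demo_store()
rows = combined_measurements(con)
```

The constraints are what protect integrity here: a measurement that points at
a study that does not exist is refused instead of being stored silently. The
join shows how batches from different years can be pulled into one flat table
for analysis, whatever file format they originally arrived in.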

Sean



