[R] Q: Suggestions for long-term data/program storage policy?

Duncan Murdoch murdoch at stats.uwo.ca
Tue Oct 11 12:54:32 CEST 2005


Alexander Ploner wrote:
> Dear list,
> 
> we are a statistical/epidemiological department that - after a few
> years of rapid growth - is finally getting around to formulating a
> general data storage and retention policy - mainly to ensure that we
> can reproduce results from published papers/theses more easily in the
> future, but also in the hope of getting more synergy between
> related projects.
> 
> We have formulated what we feel is a reasonable draft, requiring  
> basically that the raw data, all programs to create derived data  
> sets, and the analysis programs are stored and documented in a  
> uniform manner, regardless of the analysis software used. The minimum  
> data retention we are aiming for is 10 years, and the format for the  
> raw data is quite sane (either flat ASCII or real
> 
> Given the rapid development cycle of R, this suggests that at the very  
> least all non-base packages used in the analysis are stored together  
> with each project. I have basically two questions:
> 
> 1) Are old R versions (binaries/sources) going to be available on  
> CRAN indefinitely?

I think sources will be, binaries much less reliably.  (I just 
discovered that one or two of the old Windows binaries are corrupted; 
I'm not sure I'll be able to find good copies.)

> 2) Is .RData a reasonable file format for long term storage?

I think the intention is that it will be supported in future versions of 
R, but storing data in a binary format is risky.  What if you don't use 
R in 5 years?  You would find it a lot easier to decode text format 
files in another package than .RData format.
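
As a rough illustration of what keeping a text copy could look like, here is a minimal sketch. The data frame `dat`, the directory `out_dir` and the file name `dat.csv` are made up for the example, and CSV is only one of several reasonable text formats. It writes the data with `write.csv()` and reads it back to check that nothing was lost:

```r
# Hypothetical example data; in practice this would be a project's
# raw or derived data set.
dat <- data.frame(id    = 1:3,
                  age   = c(34.5, 51.2, 47.8),
                  group = c("a", "b", "a"),
                  stringsAsFactors = FALSE)

# Hypothetical archive location (a temporary directory for this sketch).
out_dir <- file.path(tempdir(), "archive")
dir.create(out_dir, showWarnings = FALSE)

# Write a plain-text copy that other software can read.
csv_file <- file.path(out_dir, "dat.csv")
write.csv(dat, csv_file, row.names = FALSE)

# Read it back and confirm the round trip preserved the data.
dat2 <- read.csv(csv_file, stringsAsFactors = FALSE)
stopifnot(isTRUE(all.equal(dat, dat2)))
```

A text copy does not carry R-specific attributes such as factor level order or object classes, so anything that matters for the analysis would need to be documented alongside the file.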

The other advantage of text format is that it works very well with 
version control systems like Subversion or CVS.  You can see several 
versions of the file, see comments on why changes were made, etc.

Duncan Murdoch
> 
> I would also be very grateful for any other suggestions, comments or  
> links for setting up and implementing such a storage policy (R- 
> specific or otherwise).
> 
> Thank you for your time,
> 
> alexander
> 
> 
> Alexander.Ploner at meb.ki.se
> Medical Epidemiology & Biostatistics
> Karolinska Institutet, Stockholm
> Tel: ++46-8-524-82329
> Fax: ++46-8-31 49 75



