[R] Archive format

Joe Gain joe.gain at uni-konstanz.de
Thu Mar 30 10:14:51 CEST 2017


On 29.03.2017 17:36, Jeff Newmiller wrote:
> The relevance to R (and therefore R-help) of this question is marginal at best. R might not be the language of choice when you go retrieve the data.
>
> Also, this question seems dangerously close to a troll, because the obvious answer is that the data should be in an open format but if you are not currently working with data in an open format then you increase the cost of archiving and risk losing information up front by extracting it from a proprietary format, and balancing those concerns is more political than technical.
>
> Note that there exist open binary formats, and the goals of your archiving task and nature of the data would have to be considered in deciding which of the many to use. My own experience has been that plain text survives time best, but YMMV.
>

Well, I didn't mean to troll the list. We have a small section on R, and 
in response to a question that we got from a user, we thought it would 
be a good idea to check with some actual R-users.

I think the responses are pretty much in line with what we expected.
Unsurprisingly, there is no simple solution. A text format is
advantageous because of the many options a user has for working with
text data. Your point about the format of the source data is valid: it
can be a clear constraint (other constraints are, for example, legal in
nature). I'm not trying to advocate for open formats per se, just
trying to gather information so that we can make a recommendation.

I think we need to restructure the information on our web platform to
clearly differentiate between the data and the source code, scripts,
etc. that are used to process the data ("algorithms").

There is a big problem with data that has been archived but that nobody
knows the purpose of any more. Archiving, sharing, and reproducibility
are important subjects, and we are interested in the experience of
statisticians in dealing with these problems.

Thanks for the replies!
Joe

-- 
B 1003
Kommunikations-, Informations-, Medienzentrum (KIM)
Universitaet Konstanz

t: ++49-7531-883234
e: joe.gain at uni-konstanz.de


