An idea for something better than read.table

A.J. Rossini
26 Feb 1999 06:15:17 -0800

>>>>> "MP" == Martyn Plummer <> writes:

    MP> If you want a file format for R datasets which describe their
    MP> own metadata, it might be worth thinking about defining it in
    MP> XML.


    MP> This probably goes a long way beyond Peter's suggestion, but
    MP> presumably it could be done in the same way, by adding a
    MP> header to the top of a data file. It does leave open the
    MP> possibility that other statistical packages will be able to
    MP> read the data and extract any metadata (storage type,
    MP> variable labels, value labels, ...) they want to use.

The only caveat is that your data set would look like an HTML (SGML)
file. However, there are freely licensed XML parsers for nearly every
language (Java, C++, C, Perl, and Python come to mind).
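As a rough illustration, here is how Python's standard `xml.etree.ElementTree` parser could pull both the metadata and the values out of such a file. The layout is a made-up assumption: the element names `dataset`, `variable`, `value`, and `row`, and the whitespace-separated rows, are hypothetical and not an agreed DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing dataset: the element and attribute names
# below are illustrative assumptions, not an existing standard or DTD.
DOC = """<?xml version="1.0"?>
<dataset name="trial">
  <variable name="id" type="integer" label="Subject ID"/>
  <variable name="arm" type="factor" label="Treatment arm">
    <value code="1" label="Placebo"/>
    <value code="2" label="Active"/>
  </variable>
  <variable name="cd4" type="double" label="CD4 count"/>
  <row>1 1 512.0</row>
  <row>2 2 430.5</row>
</dataset>"""

# Map the declared storage type to a Python conversion function.
CASTS = {"integer": int, "double": float, "factor": str}


def read_dataset(text):
    """Return (metadata, rows) parsed from the hypothetical XML layout."""
    root = ET.fromstring(text)
    variables = root.findall("variable")

    # Collect storage type, variable label and value labels per variable.
    meta = {}
    for v in variables:
        meta[v.get("name")] = {
            "type": v.get("type"),
            "label": v.get("label"),
            "levels": {val.get("code"): val.get("label")
                       for val in v.findall("value")},
        }

    # Each <row> holds whitespace-separated fields in variable order.
    names = [v.get("name") for v in variables]
    rows = []
    for row in root.findall("row"):
        record = {}
        for name, raw in zip(names, row.text.split()):
            info = meta[name]
            if info["levels"]:
                # Replace codes by their value labels.
                record[name] = info["levels"][raw]
            else:
                record[name] = CASTS[info["type"]](raw)
        rows.append(record)
    return meta, rows


meta, rows = read_dataset(DOC)
print(meta["cd4"]["label"])
print(rows[1])
```

Any other package with an XML parser could read the same header and decide for itself which pieces of metadata (labels, value labels, storage types) it wants to use.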

Writing an XML DTD for statistical datasets (and for communicating
requests for particular kinds of procedures on those datasets) has
been simmering in the back of my mind for a while now. I'd be
interested in hearing from anyone else who has been experimenting
with XML for passing data. My own application is communication
between statistical packages, but it might also work for
communication within a single package; that needs a bit more
exploration.

It would also lead to WWW-friendly processing :-).


A.J. Rossini
UW Biostatistics & Center for AIDS Research 
206-543-1044 / 206-720-4282              