An idea for something better than read.table

Peter Dalgaard BSA p.dalgaard@biostat.ku.dk
11 Feb 1999 18:46:54 +0100


I was recently converting some datasets for use in an R package and it
occurred to me that there really is no "neat" way to input a data
frame if it is to contain factor variables. 

One can use dput()/dget() or dump()/source() after massaging the data
into the right format, of course, but there isn't really anything that
allows you to store the input instructions with the data beyond the
simple header=T type of format.
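
For comparison, the usual workaround looks roughly like the sketch
below. The inline text is only a small stand-in for a data file (a few
rows with a plain header line), and the point is that the factor coding
has to live in a separate script rather than with the data.

```r
# Small inline stand-in for a data file with a plain header line.
txt <- "Item Size Year
1 0 1980
1 10 1981
2 5 1981
4 35 1985"

d <- read.table(text = txt, header = TRUE)

# The factor coding is stated here, in the script, not alongside the data.
d$Item <- factor(d$Item, levels = 1:4, labels = c("A", "B", "C", "D"))
d$Year <- factor(d$Year, levels = 1980:1985)

str(d)
```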

So I thought of ways to enhance the header. The best idea I've been
able to come up with so far is to

(a) Write a function - basically an extension of scan() - which allows
    you to specify the column data type in more detail. Let's call it
    data.file() for now. It would pretty much have to deparse all of
    its arguments and interpret things in slightly unusual ways, but R
    can do that, and some functions (notably help() and data())
    already play this kind of game with the parser...

(b) Have a function, say read(), which parses the 1st expression in a
    file and executes it *with the remainder of the file as the
    argument*. (Currently, this is impossible, but it would become
    possible if one just kept track of the line number while parsing.
    parse() could stick it on as an attribute of the parsed expression
    list if asked to do so.)
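
To make (b) concrete, here is a minimal sketch of finding where the
first expression ends. It does not rely on the parser reporting line
numbers; as a stand-in it simply tries to parse ever longer prefixes of
the file until one of them is a complete expression. The helper name
first_expr() and the tiny two-column header are made up for
illustration.

```r
# Hypothetical helper: return the first complete expression in 'lines'
# together with the number of lines it occupies.
first_expr <- function(lines) {
  for (k in seq_along(lines)) {
    # An incomplete prefix is a parse error; treat that as "keep going".
    ex <- tryCatch(parse(text = lines[seq_len(k)]),
                   error = function(e) NULL)
    if (length(ex) >= 1L)
      return(list(expr = ex[[1L]], nlines = k))
  }
  stop("no complete expression found")
}

lines <- c("data.file(Item = factor(levels = 1:2),",
           "          Size = numeric()",
           ")",
           "1 0",
           "2 5")

hdr <- first_expr(lines)
hdr$nlines   # number of header lines, i.e. what read() would pass as skip=
hdr$expr     # the unevaluated data.file(...) call
```

Retrying the parse is quadratic in the header length, which should not
matter for headers of a few lines; a line-tracking parse() would avoid
it.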

This would make possible a file format something like the one shown at
the end of this message.

[There's another loose idea in there involving a control item to handle
separators, na.strings, etc. - the intention being that read() plugs
in the file= and skip= arguments for the actual call.]

Would this be an approach worth pursuing?

--- Top of file ---
data.file(control(sep="w",na="."),
        Item = factor(levels=1:4,labels=c("A","B","C","D")),
        Size = numeric(),
        Year = factor(levels=1980:1985)
)
1       0     1980    
1       10    1981    
1       14    1982    
1       20    1983    
1       25    1984    
1       30    1985    
2       0     1980    
2       5     1981    
2       6     1982    
2       8     1984    
3       0     1984    
3       2     1985    
4       0     1980    
4       20    1981    
4       30    1982    
4       30    1984    
4       35    1985    
--- End of file ---
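
A rough sketch of how such a file might be interpreted follows. It is
only one possible reading of the idea: data.file() captures its
arguments unevaluated, treats factor(...) specs as "apply factor() to
this column with these arguments", and understands only na= in the
control item (sep= is left out). The file= and skip= arguments are
plugged into the parsed header call by hand, which is the job read()
would do. The rows are a few of those above, with one Size replaced by
"." purely to exercise the na setting.

```r
# Hypothetical data.file(): column specs arrive as unevaluated calls.
data.file <- function(..., file, skip = 0) {
  specs <- as.list(substitute(list(...)))[-1L]
  is_ctl <- vapply(specs, function(s)
    is.call(s) && identical(s[[1L]], as.name("control")), logical(1))
  na <- "NA"
  if (any(is_ctl)) {
    ctl <- as.list(specs[is_ctl][[1L]])[-1L]   # arguments of control()
    if (!is.null(ctl$na)) na <- ctl$na
  }
  cols <- specs[!is_ctl]
  # Read everything as character first, then convert column by column.
  raw <- read.table(file, skip = skip, col.names = names(cols),
                    na.strings = na, colClasses = "character")
  for (nm in names(cols)) {
    spec <- cols[[nm]]
    raw[[nm]] <- if (as.character(spec[[1L]]) == "factor") {
      # factor(levels=, labels=) becomes factor(<column>, levels=, labels=)
      eval(as.call(c(list(as.name("factor"), raw[[nm]]),
                     as.list(spec)[-1L])))
    } else {
      as.numeric(raw[[nm]])
    }
  }
  raw
}

header <- c('data.file(control(na = "."),',
            '        Item = factor(levels = 1:4, labels = c("A", "B", "C", "D")),',
            '        Size = numeric(),',
            '        Year = factor(levels = 1980:1985)',
            ')')
body <- c("1  0  1980", "1  10 1981", "2  .  1981", "4  35 1985")
path <- tempfile()
writeLines(c(header, body), path)

hdr <- parse(text = header)[[1L]]   # the unevaluated data.file(...) call
hdr$file <- path                    # what read() would plug in
hdr$skip <- length(header)
d <- eval(hdr)
str(d)
```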

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907