[Rd] Suggestion for comments in data files (i.e. read.table)

Jens Oehlschlägel-Akiyoshi jens.oehlschlaegel-akiyoshi@mdfactory.de
Tue, 23 May 2000 10:40:25 +0200

Prof Brian D Ripley wrote:

> I am sure it is not `a very small change', I'm afraid. Basically, input
> is not done in a line-oriented way. read.table uses scan and
> count.fields. Both are internal functions that work at a char-by-char
> level.  You will need to add logic to skip lines, and for your rule that
> means adding logic to know that # is first in a line.

I agree that the basic problem is that R has no line-wise data import
utility. I guess the reason is that anything working line by line has to be
written entirely in C for performance reasons, which means losing the
flexibility of the R language.

However, there might be a way to solve this: processing batches of lines
instead of single lines. If we import m of n (m << n) lines at a time as a
string vector, we could use R vector functions to preprocess these strings
and then scan them. For this, scan() would need an extension that lets it
take its input from string vectors. Or, perhaps better, we need separate
access to the two functionalities of scan: (1) physical reading and
(2) parsing.
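
A minimal sketch of that separation, assuming an R in which readLines()
exists and scan() accepts a `text` argument (both are available in current
R, though not in the R this post was written against). The sample lines are
made up for illustration, and a textConnection stands in for a real file.

```r
# Step 1: physical reading. Pull raw lines into a character vector.
con <- textConnection(c("# header comment", "1 2 3", "4 5 6"))
lines <- readLines(con)
close(con)

# Preprocess with ordinary R vector functions: drop lines starting with '#'.
lines <- lines[!grepl("^#", lines)]

# Step 2: parsing. scan() takes its input from the character vector.
values <- scan(text = lines, what = numeric(), quiet = TRUE)
```

The point is only that the preprocessing sits between the two steps and is
plain R code, so it can be swapped without touching the reader or the parser.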

Then read.table could be rewritten to work on batches of lines, with a
parameter nbatch=1000, and an optional parameter preprocess.func=NULL
which - if used - would return a preprocessed vector of strings, e.g.

  ThrowAwayCommentLines <- function(s) s[!grepl("^#", s)]

to realize Telford Tendys's suggestion, or

  RemoveTrailingComments <- function(s){
    s <- gsub("#.*", "", s)   # drop everything from the first # onward
    s[nchar(s) > 0]           # discard lines that are now empty
  }

for Prof. Ripley's suggestion:

> My suggestion would be to allow # anywhere on the line to skip the
> rest of the line, and to make sure that # inside quotes did nothing.
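
The quote-aware variant could be sketched roughly as below. The function
name strip_unquoted_comments and its `quote` argument are hypothetical, and
the sketch assumes a single quote character with no escaped quotes inside
strings; it is meant to illustrate the rule, not to be a finished parser.

```r
# Hypothetical helper: drop everything from the first '#' that is not
# inside a quoted string, then discard lines that end up empty.
strip_unquoted_comments <- function(s, quote = "\"") {
  out <- vapply(s, function(line) {
    chars <- strsplit(line, "", fixed = TRUE)[[1]]
    in_quote <- FALSE
    for (i in seq_along(chars)) {
      if (chars[i] == quote) {
        in_quote <- !in_quote              # toggle on every quote character
      } else if (chars[i] == "#" && !in_quote) {
        return(substr(line, 1, i - 1))     # keep only the text before '#'
      }
    }
    line                                   # no unquoted '#': keep the line
  }, character(1), USE.NAMES = FALSE)
  out[nzchar(out)]
}

cleaned <- strip_unquoted_comments(
  c("x \"a#b\" 1 # note", "# all comment", "3 4")
)
```

Whitespace in front of a removed comment is left in place, which should not
matter for whitespace-separated fields.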

Final comment: any solution that has # skip the rest of the line MUST be
optional, otherwise R loses its ability to import general ASCII data. You
never know whether some people use special characters in their strings.
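
To make the optional part concrete, here is a rough sketch of a batched
reader with the nbatch and preprocess.func parameters described above. The
name read_table_batched is hypothetical, and the sketch leans on functions
from current R (readLines, isOpen, read.table with its `text` and
`comment.char` arguments) rather than on anything available when this was
written. It passes comment.char = "" so that read.table itself never treats
# specially; only the hook, if supplied, decides what is dropped. For
simplicity it collects the cleaned lines and parses them once at the end.

```r
# Hypothetical sketch: read a connection nbatch lines at a time, optionally
# preprocess each batch, then parse all kept lines in one go.
read_table_batched <- function(con, nbatch = 1000, preprocess.func = NULL, ...) {
  # An unopened connection would be reopened on every readLines() call,
  # so open it once here and close it when the function exits.
  if (!isOpen(con)) {
    open(con, "r")
    on.exit(close(con))
  }
  kept <- character(0)
  repeat {
    batch <- readLines(con, n = nbatch)
    if (length(batch) == 0) break          # end of input
    if (!is.null(preprocess.func)) batch <- preprocess.func(batch)
    kept <- c(kept, batch)
  }
  # comment.char = "" switches off read.table's own comment handling.
  read.table(text = kept, comment.char = "", ...)
}

# With a hook: lines starting with '#' are thrown away.
con <- textConnection(c("# comment", "a b", "1 2", "# another", "3 4"))
df <- read_table_batched(con, nbatch = 2,
                         preprocess.func = function(s) s[!grepl("^#", s)],
                         header = TRUE)
close(con)

# Without a hook (the default): '#' is ordinary data.
con <- textConnection(c("id tag", "1 a#1", "2 b#2"))
raw <- read_table_batched(con, nbatch = 2, header = TRUE,
                          stringsAsFactors = FALSE)
close(con)
```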

Any comments welcome

