[Rd] feature request: comment character in read.table?

Ben Bolker bolker@zoo.ufl.edu
Thu, 13 Sep 2001 19:04:49 -0400 (EDT)


On Thu, 13 Sep 2001, Peter Kleiweg wrote:

  [snip]
>
> That is not very robust. What about these:
>
1>     # a comment
2>     1 2 3  # a comment
3>            # a comment
4>     "1" "2" "3  # not a comment"
5>     "# not a comment"  # a comment
>
> Comments don't have to start at the first column, and comments
> can also exist after real data. A comment char within a string
> should not be taken as the start of a comment, and you also have
> to take into account that the tokens delimiting a string can
> vary.
>

  As Brian Ripley has pointed out, he hopes to do this at a lower level,
more robustly, later.  In the meantime, in my defense: this code works for
lines 1, 2, and 3 (it's OK with comments that start after the first column
and that exist after real data -- that was part of my spec).  It doesn't
deal with comment characters embedded in quoted strings, but I don't have
any problem with telling people that they're not allowed to have comment
characters in quoted strings in their data -- it seems to be a perfectly
reasonable restriction.
  If I wanted to hack this further I would probably try to do a strsplit
on quotation characters, and look for comment characters only in the
odd-numbered pieces of the split (the ones outside the quotes).  And if
someone puts

  "\"\\"\\\" ## "  "#" "\\  \" \#"

in their data file, then they deserve what they get ... :-)
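
  For what it's worth, a minimal sketch of that strsplit idea might look
like the following.  The function name strip_comment and its arguments are
made up for illustration; it assumes a single quote character, no escaped
quotes, and a single-character comment marker, so it is a rough
illustration rather than a robust parser.

```r
# Hypothetical helper: remove a trailing comment from one line of input,
# ignoring comment characters that appear inside quoted strings.
strip_comment <- function(line, comment.char = "#", quote = "\"") {
  # Split on the quote character.  Odd-numbered pieces (1, 3, 5, ...) lie
  # outside quotes; even-numbered pieces lie inside quotes.
  pieces <- strsplit(line, quote, fixed = TRUE)[[1]]
  if (length(pieces) == 0) return(line)  # empty input line

  for (i in seq(1, length(pieces), by = 2)) {
    pos <- regexpr(comment.char, pieces[i], fixed = TRUE)
    if (pos > 0) {
      # Keep every piece before this one, plus the part of this piece
      # that precedes the comment character, and glue the quotes back in.
      kept <- c(pieces[seq_len(i - 1)], substr(pieces[i], 1, pos - 1))
      return(paste(kept, collapse = quote))
    }
  }
  line  # no comment character found outside quotes
}
```

Applied to the five sample lines quoted above, this strips the comments
from lines 1, 2, 3, and 5 and leaves line 4 untouched.  It does nothing
sensible with backslash-escaped quotes.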

  Ben Bolker


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._