[Rd] More on scan: extra field at end of line

Yves Gauvreau cyg@sympatico.ca
Tue, 26 Dec 2000 11:08:38 -0500


Hi,

I see that Prof Ripley proposes to pre-process the file using sed, and that
to do so he used "pipe". I looked for it on my system (see below) and the
function doesn't seem to be available. Since I have sed from cygwin32, I
wonder whether there is a way to use it in a similar fashion to what is proposed here?
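One possible workaround, sketched here under the assumption that Cygwin's `sed` can be run from a shell (for example a Cygwin bash prompt), is to do the pre-processing in two steps instead of through `pipe()`. First, write a cleaned copy of the file once. Then point `scan()` at that copy. The sed expression is the one Prof Ripley passed to `pipe()`. The output file name `data2.clean` is hypothetical.

```shell
#!/bin/sh
# Recreate the sample file "data2" from the thread:
# the first two lines end with a trailing comma.
cat > data2 <<'EOF'
450, 390, 467, 654,  30, 542, 334, 432, 421,
357, 497, 493, 550, 549, 467, 575, 578, 342,
446, 547, 534, 495, 979, 479
EOF

# Remove one comma at the end of each line (same expression as in pipe()),
# writing to a new file instead of streaming into scan().
# "data2.clean" is a hypothetical name chosen for this sketch.
sed -e 's/,$//' data2 > data2.clean
```

In R, `scan("data2.clean", sep=",")` should then read 24 items, the same as the `pipe()` version. If the file has DOS (CRLF) line endings, each trailing comma may be followed by a carriage return. In that case `,$` will not match unless something strips the CR first.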

Thanks

YG

platform Windows
arch     x86
os       Win32
system   x86, Win32
status
major    1
minor    1.1
year     2000
month    August
day      15
language R

> -----Original Message-----
> From: owner-r-devel@stat.math.ethz.ch
> [mailto:owner-r-devel@stat.math.ethz.ch] On behalf of Prof Brian Ripley
> Sent: Tuesday, December 26, 2000 9:54 AM
> To: Peter Kleiweg
> Cc: r-devel@stat.math.ethz.ch
> Subject: Re: [Rd] More on scan: extra field at end of line
>
>
> On Tue, 26 Dec 2000, Peter Kleiweg wrote:
>
> >
> > Suppose I have a file "data1" containing:
> >
> >     450   390   467   654    30   542   334   432   421
> >     357   497   493   550   549   467   575   578   342
> >     446   547   534   495   979   479
> >
> > I can read this file with:
> >
> >     scan("data1")
> >     Read 24 items
> >      [1] 450 390 467 654  30 542 334 432 421 357 497 493 550 549 467 575 578 342 446
> >      [20] 547 534 495 979 479
> >
> > But now, suppose I have a file "data2" containing:
> >
> >     450, 390, 467, 654,  30, 542, 334, 432, 421,
> >     357, 497, 493, 550, 549, 467, 575, 578, 342,
> >     446, 547, 534, 495, 979, 479
> >
> > When I try to read this with sep="," I get:
> >
> >     scan("data2", sep=",")
> >     Read 26 items
> >      [1] 450 390 467 654  30 542 334 432 421  NA 357 497 493 550 549 467 575 578 342
> >      [20]  NA 446 547 534 495 979 479
> >
> > I get two extra fields, both NA. Not what I'd want. And I can't
> > drop the NA's, because there could be other NA's, not resulting
> > from this comma-EOL combination.
>
> You can easily remove the trailing commas, though, as in
>
> scan(pipe("sed -e s/,$// data2"), sep=",")
> Read 24 items
>  [1] 450 390 467 654  30 542 334 432 421 357 497 493 550 549 467 575 578 342 446
> [20] 547 534 495 979 479
>
>
> > I suggest that the proper action for scan would be to treat the
> > combination of sep plus newline as a single separator.
>
> However, that's not compatible with S, earlier versions of R, or
> the documentation:
>
>      sep: by default, scan expects to read white-space delimited input
>           fields.  Alternatively, `sep' can be used to specify a
>           character which delimits fields.  A field is always delimited
>           by a newline unless it is quoted.
>
> I suggest the proper action is to act as documented!
>
> --
> Brian D. Ripley,                  ripley@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272860 (secr)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>
