[R] iterators : checkFunc with ireadLines

William Michels
Thu May 28 00:00:00 CEST 2020


Hi Laurent,

Off the bat, I would guess that the problem you're seeing comes from
differences in command-line quoting between Windows and Linux/Mac
systems. I've noticed that people on Windows have better command-line
success with exterior double quotes and interior single quotes, while
Linux/Mac users tend to have more success with exterior single quotes
and interior double quotes. The problem is exacerbated in R by
system() or pipe() calls, which require another (exterior) set of
quotation marks.
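
One way to sidestep hand-written quoting is to let R add the exterior
quotes itself. The following is only a sketch: 'build_cmd' is a
hypothetical helper (not something from this thread), it relies on
base R's shQuote(), and I have not tested the "cmd" style with raku on
Windows, so treat the result as something to print and inspect before
running it.

```r
# Hypothetical helper (an assumption, not from the thread): build a
# command string for pipe() or system(), letting shQuote() choose the
# exterior quotes instead of typing them by hand.
build_cmd <- function(program, code, file,
                      type = if (.Platform$OS.type == "windows") "cmd" else "sh") {
  paste(program, "-e", shQuote(code, type = type), shQuote(file, type = type))
}

# The raku program text, written without any shell quoting around it.
raku_code <- ".put for lines.grep( / ^^N053 | ^^N163 / );"

# POSIX shell style: exterior single quotes.
cat(build_cmd("raku", raku_code, "Laurents.txt", type = "sh"), "\n")

# Windows command-interpreter style: exterior double quotes.
cat(build_cmd("raku", raku_code, "Laurents.txt", type = "cmd"), "\n")
```

Printing both variants makes it easy to compare them with what works
when typed directly at the command line on each system.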

1. You can print out your connection object to make sure that the
interior code was read properly into R. Also, take a look at the
'connections' help page to see if there are other parameters you need
to explicitly set (like encoding). Here's the first (working) example
from my last post to you:

> ?connections
> con_obj1
                                                              description
"raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );'  Laurents.txt"
                                                                    class
                                                                   "pipe"
                                                                     mode
                                                                     "rt"
                                                                     text
                                                                   "text"
                                                                   opened
                                                                 "opened"
                                                                 can read
                                                                    "yes"
                                                                can write
                                                                     "no"
>
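
If the printed table is hard to read, the same information can be
pulled out field by field. This is a minimal sketch, assuming only
base R; the command string is the simplified one (without ':p'), and
since the pipe is created without 'open=', the command is not run at
this point.

```r
# Create the pipe without open=, so nothing is executed yet.
cmd <- "raku -e '.put for lines.grep( / ^^N053 | ^^N163 / );' Laurents.txt"
con <- pipe(cmd)

# summary() on a connection returns a list with the fields shown above.
info <- summary(con)

# writeLines() shows the command exactly as the shell would receive it,
# without the escapes that print() adds.
writeLines(info$description)
info$class    # "pipe"
info$opened   # "closed" until the connection is opened

close(con)
```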

2. You can try 'backslash-escaping' interior quotes in your system()
or pipe() calls. Also, in two of my previous examples I use paste() to
break up complicated quoting into more manageable chunks. You can try
these calls with 'backslash-escaped' interior quotes, and without
paste():

> con_obj1 <- pipe("raku -e \'.put for lines.grep( / ^^N053 | ^^N163 /, :p );\' Laurents.txt", open="rt");
> con_obj1
                                                             description
"raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );' Laurents.txt"
                                                                   class
                                                                  "pipe"
                                                                    mode
                                                                    "rt"
                                                                    text
                                                                  "text"
                                                                  opened
                                                                "opened"
                                                                can read
                                                                   "yes"
                                                               can write
                                                                    "no"
>
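
Note that the description printed above contains no backslashes: R's
parser removes them before the string ever reaches the shell. A short
sketch to check this for yourself (the command strings here are
shortened examples, not the full commands from this thread):

```r
# Inside a double-quoted R string, \' is just another way to write ',
# so these two strings are identical.
a <- "raku -e \'.put for lines;\' Laurents.txt"
b <- "raku -e '.put for lines;' Laurents.txt"
identical(a, b)

# Interior double quotes do need the backslash in R source code, but
# the backslash itself is not part of the resulting string.
d <- "grep -E \"^(N053|N163)\" test.txt"
writeLines(d)
```

So backslash-escaping changes how the command is typed in R, not what
the shell sees; writeLines() is a quick way to confirm the final
string.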

3. If R creates your 'con_obj' without throwing an error, then you
should try the most basic functions for reading data into R, something
like readLines(). Again, recreate your 'con_obj' with different
encodings, if necessary. Be careful about reading from the same
connection object with more than one R function (an unlikely scenario,
but one that should be mentioned). Below, 'con_obj1' appears to be
consumed by readLines(), so the subsequent call to scan() finds
nothing left to read:

> rm(con_obj1)
> # note: dropped ':p' adverb below to simplify
> con_obj1 <- pipe("raku -e \'.put for lines.grep( / ^^N053 | ^^N163 / );\' Laurents.txt", open="rt");
> scan(con_obj1)
Error in scan(con_obj1) : scan() expected 'a real', got 'N053'
> con_obj1 <- pipe("raku -e \'.put for lines.grep( / ^^N053 | ^^N163 / );\' Laurents.txt", open="rt");
> readLines(con_obj1)
[1] "N053    -0.014083    -0.004741    0.001443    -0.010152 -0.012996    -0.005337    -0.008738    -0.015094    -0.012104"
[2] "N163    -0.054023    -0.049345    -0.037158    -0.04112 -0.044612    -0.036953    -0.036061    -0.044516    -0.046436"
> scan(con_obj1)
Read 0 items
numeric(0)

>
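
The same "consumed connection" behaviour can be reproduced without
raku. This is a sketch under stated assumptions: textConnection()
stands in for the pipe, and the two data lines are shortened copies of
the output above. Reading once into a character vector and parsing
that vector avoids the problem.

```r
# Stand-in for the pipe: a text connection over two shortened data lines.
txt <- c("N053 -0.014083 -0.004741",
         "N163 -0.054023 -0.049345")
con <- textConnection(txt)

first  <- readLines(con)   # reads both lines
second <- readLines(con)   # nothing left: the connection is used up
close(con)

length(first)
length(second)

# Parse the lines already held in memory instead of re-reading.
dat <- read.table(text = first, stringsAsFactors = FALSE)
dat
```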

Other than that, you can post here again and we'll try to help. If you
become convinced it's a raku problem, you can check the 'raku-grep'
help page at https://docs.raku.org/routine/grep, or post a question to
the perl6-users mailing list (perl6-users@perl.org).

HTH, Bill.

W. Michels, Ph.D.
On Wed, May 27, 2020 at 1:56 AM Laurent Rhelp <LaurentRHelp using free.fr> wrote:
>
> I installed raku on my PC to test your solution:
>
> The command raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );'
> Laurents.txt works fine when I write it in the bash command but when I
> use the pipe command in R as you say there is nothing in lines with
> lines <- read.table(i)
>
> There is the same problem with Ivan's solution the command grep -E
> '^(N053|N163)' test.txt works fine under the bash command but not i <-
> pipe("grep -E '^(N053|N163)' test.txt"); lines <- read.table(i)
>
> Maybe it is because I work with MS Windows?
>
> thx
> LP
>
>
>
>
> On 24/05/2020 at 04:34, William Michels wrote:
> > Hi Laurent,
> >
> > Seeking to give you an "R-only" solution, I thought the read.fwf()
> > function might be useful (to read-in your first column of data, only).
> > However Jeff is correct that this is a poor strategy, since read.fwf()
> > reads the entire file into R (documented in "Fixed-width-format
> > files", Section 2.2: R Data Import/Export Manual).
> >
> > Jeff has suggested a number of packages, as well as using a database.
> > Ivan Krylov has posted answers using grep, awk and perl (perl5--to
> > disambiguate). [In point of fact, the R Data Import/Export Manual
> > suggests using perl]. Similar to Ivan, I've posted code below using
> > the Raku programming language (the language formerly known as Perl6).
> > Regexes are claimed to be more readable, but are currently very slow
> > in Raku. However on the plus side, the language is designed to handle
> > Unicode gracefully:
> >
> >> # pipe() using raku-grep on Laurent's data (sep=mult whitespace):
> >> con_obj1 <- pipe(paste("raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );' ", "Laurents.txt"), open="rt");
> >> p6_import_a <- scan(file=con_obj1, what=list("","","","","","","","","",""), flush=TRUE, multi.line=FALSE, quiet=TRUE);
> >> close(con_obj1);
> >> as.data.frame(sapply(p6_import_a, t), stringsAsFactors=FALSE);
> >   V1   V2        V3        V4        V5        V6        V7        V8        V9       V10
> > 1  2 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094
> > 2  4 N163 -0.054023 -0.049345 -0.037158  -0.04112 -0.044612 -0.036953 -0.036061 -0.044516
> >> # pipe() using raku-grep "starts-with" to find genbankID ( >3GB TSV file)
> >> # "lines[0..5]" restricts raku to reading first 6 lines!
> >> # change "lines[0..5]" to "lines" to run raku code on whole file:
> >> con_obj2 <- pipe(paste("raku -e '.put for lines[0..5].grep( *.starts-with(q[A00145]), :p);' ", "genbankIDs_3GB.tsv"), "rt");
> >> p6_import_b <- read.table(con_obj2, sep="\t");
> >> close(con_obj2)
> >> p6_import_b
> >    V1     V2       V3          V4 V5
> > 1  4 A00145 A00145.1 IFN-alpha A NA
> >> # unicode test using R's system() function:
> >> try(system("raku -ne '.grep( /  你好  |  こんにちは  |  مرحبا  |  Привет  /, :v ).put;'  hello_7lang.txt", intern = TRUE, ignore.stderr = FALSE))
> > [1] ""                    ""                    ""                    "你好 Chinese"
> > [5] "こんにちは Japanese" "مرحبا Arabic"        "Привет Russian"
> > [special thanks to Brad Gilbert, Joseph Brenner and others on the
> > perl6-users mailing list. All errors above are my own.]
> >
> > HTH, Bill.
> >
> > W. Michels, Ph.D.
> >
> >
> >
> >
> > On Fri, May 22, 2020 at 4:48 AM Laurent Rhelp <LaurentRHelp using free.fr> wrote:
> >> Hi Ivan,
> >>     Indeed, it is a good idea. I am on MS Windows but I can use the
> >> bash command I use with git. I will see how to do that with the unix
> >> command lines.
> >>
> >>
> >> On 20/05/2020 at 09:46, Ivan Krylov wrote:
> >>> Hi Laurent,
> >>>
> >>> I am not saying this will work every time and I do recognise that this
> >>> is very different from a more general solution that you had envisioned,
> >>> but if you are on an UNIX-like system or have the relevant utilities
> >>> installed and on the %PATH% on Windows, you can filter the input file
> >>> line-by-line using a pipe and an external program:
> >>>
> >>> On Sun, 17 May 2020 15:52:30 +0200
> >>> Laurent Rhelp <LaurentRHelp using free.fr> wrote:
> >>>
> >>>> # sensors to keep
> >>>> sensors <-  c("N053", "N163")
> >>> # filter on the beginning of the line
> >>> i <- pipe("grep -E '^(N053|N163)' test.txt")
> >>> # or:
> >>> # filter on the beginning of the given column
> >>> # (use $2 for the second column, etc.)
> >>> i <- pipe("awk '($1 ~ \"^(N053|N163)\")' test.txt")
> >>> # or:
> >>> # since your message is full of Unicode non-breaking spaces, I have to
> >>> # bring in heavier machinery to handle those correctly;
> >>> # only this solution manages to match full column values
> >>> # (here you can also use $F[1] for second column and so on)
> >>> i <- pipe("perl -CSD -F'\\s+' -lE \\
> >>>    'print join qq{\\t}, @F if $F[0] =~ /^(N053|N163)$/' \\
> >>>    test.txt
> >>> ")
> >>> lines <- read.table(i) # closes i when done
> >>>
> >>> The downside of this approach is having to shell-escape the command
> >>> lines, which can become complicated, and choosing between use of regular
> >>> expressions and more wordy programs (Unicode whitespace in the input
> >>> doesn't help, either).
> >>>
> >>
>
>
>
>


