[R] Optimize code to read text-file with digits

Martin Maechler maechler at stat.math.ethz.ch
Fri Sep 8 17:09:00 CEST 2017


>>>>> peter dalgaard <pdalgd at gmail.com>
>>>>>     on Fri, 8 Sep 2017 16:12:21 +0200 writes:

    >> On 8 Sep 2017, at 15:51 , Martin Møller Skarbiniks
    >> Pedersen <traxplayer at gmail.com> wrote:
    >> 
    >> On 8 September 2017 at 14:37, peter dalgaard
    >> <pdalgd at gmail.com> wrote:
    >>> 
    >>> 
    >>>> On 8 Sep 2017, at 14:03 , peter dalgaard
    >>>> <pdalgd at gmail.com> wrote:
    >>>> 
    >>>> x <- scan("~/Downloads/digits.txt")
    >>>> x <- x[-seq(1,220000,11)]
    >>> 
    >>> ...and, come to think of it, if you really want the
    >>> 1000000 random digits:
    >>> 
    >>> xx <- c(outer(x,10^(0:4), "%/%")) %% 10
    >>> 
    >> 
    >> Hi Peter, Thanks a lot for the answers. I can see that I
    >> need to read about outer().  However I get a different
    >> result than expected.
    >> 
    R> x <- scan("digits.txt")
    >> Read 220000 items
    >> 
    R> head(x)
    >> [1] 0 10097 32533 76520 13586 34673
    >> 
    R> x <- x[-seq(1,220000,11)]
    R> head(x)
    >> [1] 10097 32533 76520 13586 34673 54876
    >> 
    R> head(c(outer(x,10^(0:4), "%/%")) %% 10, 10) #
    >> [1] 7 3 0 6 3 6 9 7 2 5
    >> 

    > Ah, right. You do get all the digits, but in the order of
    > the last digit of each 5 digit number, then all the
    > penultimate digits, etc. To get digits in the right order,
    > try

    >> xx <- c(t(outer(x,10^(4:0), "%/%"))) %% 10
    >> head(xx, 100)
    >   [1] 1 0 0 9 7 3 2 5 3 3 7 6 5 2 0 1 3 5 8 6 3 4 6 7 3 5 4 8 7 6 8 0 9 5
    >  [35] 9 0 9 1 1 7 3 9 2 9 2 7 4 9 4 5 3 7 5 4 2 0 4 8 0 5 6 4 8 9 4 7 4 2
    >  [69] 9 6 2 4 8 0 5 2 4 0 3 7 2 0 6 3 6 1 0 4 0 2 0 0 8 2 2 9 1 6 6 5

    > I.e., reverse the order of digit generation and transpose
    > the matrix that outer() creates (because matrices are
    > column-major).
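
The column-major point can be checked on a tiny input. This is only a sketch: it uses the first three numbers from the head(x) output quoted above, and the names x3, m, by_column and by_row are made up for illustration.

```r
# First three 5-digit groups from the head(x) output in the thread
x3 <- c(10097, 32533, 76520)

# outer() returns one row per number and one column per power of ten;
# %% 10 keeps the matrix shape and leaves a single digit in each cell
m <- outer(x3, 10^(4:0), "%/%") %% 10

# c(m) walks down the columns first (column-major), so the digits of
# different numbers end up interleaved
by_column <- c(m)

# c(t(m)) walks along the rows of m, i.e. number by number
by_row <- c(t(m))

print(m)
print(by_column)
print(by_row)
```

Here by_row should come out as 1 0 0 9 7 3 2 5 3 3 7 6 5 2 0, which matches the start of the head(xx, 100) output, while by_column starts 1 3 7, the leading digits of the three numbers.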

As people are "exercising" with R and it's Friday:

Try using read.fwf() instead of scan() to get at the digits directly,
and see whether you get identical digits, and whether it is faster overall
[I have no idea of the answer to that].

another Martin.
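
One possible way to try the read.fwf() exercise is sketched below. It is not taken from the thread: the file layout is an ASSUMPTION (a 5-character line number, then ten 5-digit groups, each preceded by one space), and the toy file is rebuilt from the first 100 digits printed above. The names tmp, w, via_scan and via_fwf are made up; the widths would have to be adapted to the real digits.txt.

```r
# Toy file in an ASSUMED layout: 5-character line number, then ten
# 5-digit groups, each preceded by a single space. The real file may
# be spaced differently.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "00000 10097 32533 76520 13586 34673 54876 80959 09117 39292 74945",
  "00001 37542 04805 64894 74296 24805 24037 20636 10402 00822 91665"
), tmp)

# scan() route from the thread: read numbers, drop every 11th item
# (the line numbers), then split each number into its five digits
x <- scan(tmp, quiet = TRUE)
x <- x[-seq(1, length(x), 11)]
via_scan <- c(t(outer(x, 10^(4:0), "%/%"))) %% 10

# read.fwf() route: negative widths skip columns, so skip the line
# number and each separating space, and read every digit as one field
w <- c(-5, rep(c(-1, rep(1, 5)), 10))
via_fwf <- c(t(as.matrix(read.fwf(tmp, widths = w))))

# Same digits in the same order?
print(all(via_scan == via_fwf))
```

Wrapping each route in system.time() on the full file would address the speed question; no timing result is claimed here.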


