[R] Optimize code to read text-file with digits

William Dunlap wdunlap at tibco.com
Fri Sep 8 17:28:15 CEST 2017


Remove the for loop and all the [i]'s in your code and it will probably go
faster.  I.e., change

f0 <- function (lines)
{
    numbers <- vector("numeric")
    for (i in 1:length(lines)) {
        lines[i] <- sub("[^ ]+ +", "", lines[i])
        lines[i] <- gsub(" ", "", lines[i])
        numbers <- c(numbers, as.numeric(unlist(strsplit(lines[i],
            ""))))
    }
    numbers
}

to

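f1 <- function (lines)
{
    # sub() and gsub() are vectorized: each call processes every line at once
    lines <- sub("[^ ]+ +", "", lines)   # drop the first column (line number)
    lines <- gsub(" ", "", lines)        # remove the remaining spaces
    # split every line into single characters, flatten, and convert once
    as.numeric(unlist(strsplit(lines, "")))
}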
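
As a quick sanity check (not part of the original message), the two versions can be run on the first two sample lines quoted further down. Both functions are repeated so the snippet runs on its own; the name `sample_lines` is only for illustration.

```r
# f0 and f1 copied from the message so this snippet is self-contained.
f0 <- function (lines)
{
    numbers <- vector("numeric")
    for (i in 1:length(lines)) {
        lines[i] <- sub("[^ ]+ +", "", lines[i])
        lines[i] <- gsub(" ", "", lines[i])
        numbers <- c(numbers, as.numeric(unlist(strsplit(lines[i],
            ""))))
    }
    numbers
}

f1 <- function (lines)
{
    lines <- sub("[^ ]+ +", "", lines)
    lines <- gsub(" ", "", lines)
    as.numeric(unlist(strsplit(lines, "")))
}

# First two lines of digits.txt as quoted in the original question.
sample_lines <- c(
    "00000   10097 32533  76520 13586  34673 54876  80959 09117  39292 74945",
    "00001   37542 04805  64894 74296  24805 24037  20636 10402  00822 91665"
)

res0 <- f0(sample_lines)
res1 <- f1(sample_lines)

identical(res0, res1)   # TRUE: both versions return the same numeric vector
length(res1)            # 100: two lines of 50 digits each
head(res1, 5)           # 1 0 0 9 7
```

If `identical()` returns TRUE here, f1 should be a drop-in replacement for the loop on the full file.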

I haven't measured it, but the big time sink is probably f0 growing the
'numbers' vector bit by bit.  Each c(numbers, ...) call allocates a new,
longer vector and copies the old contents into it, which can cause a lot
of reallocations and garbage collections.
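
One rough way to test that guess is sketched below. It is not from the original message: `make_lines()` is a hypothetical helper that generates random lines in the same layout as digits.txt, and `f_list()` is an illustrative middle variant that keeps the loop but stores each line's digits in a preallocated list instead of growing a vector. The sketch uses 5000 lines rather than the full 20000 to keep the slow version quick; timings will vary by machine.

```r
# Hypothetical helper: n random lines laid out like digits.txt
# (5-digit line number, then ten 5-digit groups).
make_lines <- function(n) {
    groups <- matrix(sprintf("%05d", sample(0:99999, n * 10, replace = TRUE)),
                     nrow = n)
    body <- apply(groups, 1, paste, collapse = " ")
    paste(sprintf("%05d", seq_len(n) - 1L), body, sep = "   ")
}

# Original approach: loop and grow 'numbers' with c() on every iteration.
f0 <- function(lines) {
    numbers <- vector("numeric")
    for (i in 1:length(lines)) {
        lines[i] <- sub("[^ ]+ +", "", lines[i])
        lines[i] <- gsub(" ", "", lines[i])
        numbers <- c(numbers, as.numeric(unlist(strsplit(lines[i], ""))))
    }
    numbers
}

# Illustrative variant: still a loop, but results go into a preallocated
# list and are flattened once at the end, so nothing grows incrementally.
f_list <- function(lines) {
    pieces <- vector("list", length(lines))
    for (i in seq_along(lines)) {
        s <- gsub(" ", "", sub("[^ ]+ +", "", lines[i]))
        pieces[[i]] <- as.numeric(strsplit(s, "")[[1]])
    }
    unlist(pieces)
}

# Fully vectorized version from the message.
f1 <- function(lines) {
    lines <- sub("[^ ]+ +", "", lines)
    lines <- gsub(" ", "", lines)
    as.numeric(unlist(strsplit(lines, "")))
}

set.seed(1)
n <- 5000
lines <- make_lines(n)

t0 <- system.time(r0 <- f0(lines))
tl <- system.time(rl <- f_list(lines))
t1 <- system.time(r1 <- f1(lines))

# All three should produce the same digits.
stopifnot(identical(r0, r1), identical(rl, r1))

# Elapsed seconds for each version.
print(c(f0 = t0[["elapsed"]], f_list = tl[["elapsed"]], f1 = t1[["elapsed"]]))
```

If `f_list` comes out much closer to `f1` than to `f0`, that would suggest the repeated `c()` calls, rather than the loop itself, account for most of the time.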

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Sep 8, 2017 at 1:48 AM, Martin Møller Skarbiniks Pedersen <
traxplayer at gmail.com> wrote:

> Hi,
>
>   Every day I try to write some small R programs to improve my R-skills.
>   Yesterday I wrote a small program to read the digits from "A Million
> Random Digits" from RAND.
>   My code works but it is very slow and I guess the code is not optimal.
>
> The digits.txt file downloaded from
> https://www.rand.org/pubs/monograph_reports/MR1418.html
> contains 20000 lines which look like this:
> 00000   10097 32533  76520 13586  34673 54876  80959 09117  39292 74945
> 00001   37542 04805  64894 74296  24805 24037  20636 10402  00822 91665
> 00002   08422 68953  19645 09303  23209 02560  15953 34764  35080 33606
> 00003   99019 02529  09376 70715  38311 31165  88676 74397  04436 27659
> 00004   12807 99970  80157 36147  64032 36653  98951 16877  12171 76833
>
> My program, which is slow, looks like this:
>
> filename <- "digits.txt"
> lines <- readLines(filename)
>
> numbers <- vector('numeric')
> for (i in 1:length(lines)) {
>
>     # remove first column
>     lines[i] <- sub("[^ ]+ +","",lines[i])
>
>     # remove spaces
>     lines[i] <- gsub(" ","",lines[i])
>
>     # split the characters and convert them into numbers
>     numbers <- c(numbers,as.numeric(unlist(strsplit(lines[i],""))))
> }
>
> Thanks for any advice on how this program can be improved.
>
> Regards
> Martin M. S. Pedersen