[R] Why are big data.frames slow? What can I do to make them faster?

Marcus Jellinghaus Marcus_Jellinghaus at gmx.de
Tue Oct 8 10:11:16 CEST 2002


I wanted to know why non-vectorized operations are slow.
Thank you for your suggestions.
I did three things:
- Besides looking at the total computation time, I measured the
  garbage-collection time (gc()) separately.
- I told R to use more memory. I use R version 1.6.0 and started it with
  "Rgui --min-vsize=600M --min-nsize=10M".
- I used test$Fieldname[i] instead of test[i, 6].
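The post does not say exactly how the GC time was measured. In current versions of R, one option is gc.time(), which reports cumulative time spent in the garbage collector. A minimal sketch on a small hypothetical data frame (the data and the loop are stand-ins, not the original code):

```r
# Enable GC timing; gc.time() returns a numeric vector of length 5 whose
# first element is the user CPU time spent in garbage collection so far.
invisible(gc.time(TRUE))

n <- 2e4
test <- data.frame(x = runif(n), y = runif(n))  # hypothetical stand-in data

gc_before <- gc.time()
elapsed <- system.time({
  total <- 0
  for (i in seq_len(nrow(test))) {
    total <- total + test[i, 2]   # element-by-element access, as in the post
  }
})
gc_spent <- gc.time() - gc_before  # GC time used by the loop above

cat("total elapsed:", elapsed[["elapsed"]], "s\n")
cat("of which GC (user CPU):", gc_spent[1], "s\n")
```

Subtracting two gc.time() readings isolates the collector's share of a single piece of code, which is the split between "GC time" and "other calculations" reported in the results.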

My results show that giving R enough memory and accessing columns by field
name both save a lot of time. So thanks a lot!

Here are the details:
Without field names (test[i, 6]), default memory settings:
GC time: 494 seconds, other calculations: 124 seconds, total: 619 seconds.

Without field names, with "Rgui --min-vsize=600M --min-nsize=10M":
GC time: 34 seconds, other calculations: 114 seconds, total: 148 seconds.

With field names (test$Fieldname[i]), default memory settings:
GC time: 0.5 seconds, other calculations: 2 seconds, total: 2.5 seconds
(but loading the matrix took a long time).

With field names, with "Rgui --min-vsize=600M --min-nsize=10M":
GC time: < 1 second, other calculations: < 1 second, total: < 1 second.

Marcus Jellinghaus



Peter Dalgaard writes:

>You'll likely have to invoke the garbage collector a couple of times,
>and there might also be issues of memory growth kicking in. Once you
>get beyond some threshold, the machine starts swapping bits and pieces
>of the workspace in and out of physical memory,
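One common source of the memory growth mentioned here is building a result by repeatedly extending a vector. A minimal sketch, with hypothetical data, contrasting that with preallocation and with a fully vectorized expression:

```r
n <- 1e4

# Growing: each c() call allocates a new, longer vector and copies the old
# one, so copying grows quadratically and the garbage collector runs often.
grown <- numeric(0)
for (i in seq_len(n)) grown <- c(grown, i^2)

# Preallocated: one allocation up front, then element assignment without
# reallocating the whole vector on every iteration.
prealloc <- numeric(n)
for (i in seq_len(n)) prealloc[i] <- i^2

# Vectorized: no R-level loop at all.
vectorized <- seq_len(n)^2

stopifnot(identical(grown, prealloc), identical(prealloc, vectorized))
```

All three produce the same vector. Only the first leaves a long trail of discarded intermediate vectors for the collector to clean up.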


Andy Liaw writes:

>If you are on Windows and using an R version prior to 1.6.0, make sure R can
>use all 1GB of the RAM, as the default is to use up to 256MB or physical
>RAM, whichever is smaller.  In R-1.6.0, that limit is raised to the smaller
>of 1GB and physical RAM.
[..]
>Extracting from data frame one element at a time the way you did is
>expensive.  I.e., test[i, 6] is slower than test$whatever[i].
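The difference comes from the data frame method for `[`, which is written in R and does considerable argument checking on every call. In contrast, test$Fieldname extracts the column vector once and [i] indexes it directly. A minimal sketch comparing both loops with a fully vectorized extraction; the data frame and the column name Fieldname are hypothetical:

```r
set.seed(1)
n <- 1e4
# Hypothetical data frame; column 6 plays the role of test[i, 6].
test <- data.frame(a = 1:n, b = 1:n, c = 1:n, d = 1:n, e = 1:n,
                   Fieldname = runif(n))

by_index <- numeric(n)
by_name  <- numeric(n)

# Calls the data frame method of `[` once per element.
t_index <- system.time(for (i in seq_len(n)) by_index[i] <- test[i, 6])
# Extracts the column with `$`, then indexes a plain vector.
t_name  <- system.time(for (i in seq_len(n)) by_name[i] <- test$Fieldname[i])
# Fully vectorized: take the whole column, no loop.
by_vector <- test$Fieldname

stopifnot(identical(by_index, by_name), identical(by_name, by_vector))
print(c(index = t_index[["elapsed"]], name = t_name[["elapsed"]]))
```

The results are identical; only the elapsed times differ, and the exact ratio depends on the R version and machine.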


Peter Dalgaard writes:

> It's somewhat difficult to reproduce the behaviour, since you only give
> part of the code necessary (e.g. how many *columns* do you have in
> your data frame?)

> summary(test)
    datetime                       CCY1               CCY2                  Bid               Ask             CCYPair
 Min.   :2002-05-28 00:00:02   Length:500000      Length:500000      Min.   :  0.557   Min.   :  0.5574   Length:500000
 1st Qu.:2002-05-28 17:30:47   Mode  :character   Mode  :character   1st Qu.:  1.532   1st Qu.:  1.5319   Mode  :character
 Median :2002-05-29 14:43:02                                         Median :  4.047   Median :  4.0476
 Mean   :2002-05-29 14:42:36                                         Mean   : 38.664   Mean   : 38.6858
 3rd Qu.:2002-05-30 10:22:30                                         3rd Qu.: 32.888   3rd Qu.: 32.8891
 Max.   :2002-05-31 02:58:54                                         Max.   :182.150   Max.   :182.3000
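The summary shows 500000 rows and six columns: a date-time, two character currency codes, two numeric prices and a character CCYPair, which is column 6. The post does not show what the loop computes. Purely as a hypothetical, suppose CCYPair is the concatenation of CCY1 and CCY2; then the whole loop can be replaced by one vectorized call. A sketch on a small stand-in frame with invented values:

```r
# Hypothetical stand-in for the 500000-row 'test' frame (same six columns).
set.seed(42)
n <- 1000
ccys <- c("EUR", "USD", "JPY", "GBP", "CHF")  # hypothetical currency codes
test <- data.frame(
  datetime = as.POSIXct("2002-05-28 00:00:00", tz = "UTC") + seq_len(n),
  CCY1     = sample(ccys, n, replace = TRUE),
  CCY2     = sample(ccys, n, replace = TRUE),
  Bid      = runif(n, 0.5, 182),
  stringsAsFactors = FALSE
)
test$Ask     <- test$Bid + 0.0005
test$CCYPair <- NA_character_

# Loop: one paste0() call per row, stored in a preallocated vector.
pair_loop <- character(n)
for (i in seq_len(n)) {
  pair_loop[i] <- paste0(test$CCY1[i], test$CCY2[i])
}

# Vectorized: paste0() works element-wise on whole columns in one call.
test$CCYPair <- paste0(test$CCY1, test$CCY2)

stopifnot(identical(pair_loop, test$CCYPair))
```

The vectorized form does the per-row work inside compiled code, so it avoids both the per-element indexing overhead and the temporary objects a row-by-row loop creates.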
