[R] Why are big data.frames slow? What can I do to get it faster?

Thomas Lumley tlumley at u.washington.edu
Sun Oct 6 23:21:41 CEST 2002


On Sun, 6 Oct 2002, Marcus Jellinghaus wrote:

> Hello,
>
> I'm quite new to this list.
> I have a high-frequency dataset with more than 500,000 records.
> I want to edit a data.frame "test". My small program runs fine with a small
> part of the dataset (just 100 records), but it is very slow with a huge
> dataset. Of course it gets slower with more records, but when I change just
> the size of the frame and keep the number of edited records fixed, I see
> that it is also getting slower.
>
> Here is my program:
>
> print(dim(test)[1])
> Sys.time()
> for(i in 1:100) {
>   test[i,6] = paste(test[i,2],"-",test[i,3], sep = "")
> }
> Sys.time()

R 1.6.0 has faster data frame indexing.  Also, there's no need to do this one
row at a time:
  i <- 1:100
  test[i, 6] <- paste(test[i, 2], test[i, 3], sep = "-")
should be quite a bit faster.
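
For readers who want to try this, here is a minimal sketch comparing the
row-by-row loop with the vectorized assignment. The data frame below is a
made-up stand-in for "test" (its column names and contents are assumptions,
not the poster's data); it only checks that both approaches give the same
result, and makes no claim about timings on any particular R version.

```r
# Hypothetical stand-in for the poster's data frame "test"; the column
# names and values are assumptions made up for illustration.
n <- 1000
test <- data.frame(
  id    = seq_len(n),
  base  = rep(c("EUR", "USD", "GBP", "JPY"), length.out = n),
  quote = rep(c("USD", "JPY", "CHF"), length.out = n),
  bid   = runif(n),
  ask   = runif(n),
  pair  = NA_character_,
  stringsAsFactors = FALSE
)

# Row-by-row version, as in the original question: 100 separate
# indexing and assignment operations on the data frame.
looped <- test
for (i in 1:100) {
  looped[i, 6] <- paste(looped[i, 2], "-", looped[i, 3], sep = "")
}

# Vectorized version: one indexing operation and one call to paste().
vectorized <- test
i <- 1:100
vectorized[i, 6] <- paste(vectorized[i, 2], vectorized[i, 3], sep = "-")

# Both approaches produce the same data frame.
stopifnot(identical(looped, vectorized))

# If the whole column is to be filled, whole-column access with [[ ]]
# avoids row indexing altogether.
whole <- test
whole[[6]] <- paste(whole[[2]], whole[[3]], sep = "-")
```

Wrapping either version in system.time() is one way to compare them on your
own data.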

	-thomas


> I join two currency symbols into a currency pair.
> I always calculate only for the first 100 lines.
> When I load just 100 lines into the data.frame "test", it takes 1 second.
> With 1,000 lines loaded, editing 100 lines takes 2 seconds,
> with 10,000 lines loaded, editing 100 lines takes 5 seconds,
> with 100,000 lines loaded, editing 100 lines takes 31 seconds,
> with 500,000 lines loaded, editing 100 lines takes 11 minutes(!!!).
>
> My computer has 1 GB of RAM, so that shouldn't be the reason.
>
> Of course, I could work with many small data.frames instead of one big one, but
> the program above is just the very first step, so I don't want to split.
>
> Is there a way to edit big data.frames without waiting for a long time?
>
>
> Thanks a lot for your help,
>
>
> Marcus
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
^^^^^^^^^^^^^^^^^^^^^^^^
- NOTE NEW EMAIL ADDRESS

