[R] loop for a large database

Petr Savicky savicky at cs.cas.cz
Sun Feb 26 20:31:09 CET 2012


On Sun, Feb 26, 2012 at 04:13:49AM -0800, mari681 wrote:
> Yes, I am a newbie.
> 
> I have a data.frame (MyTable) of  1445846  rows and  15  columns with
> character data.
> And a character vector (MyVector) of 473491 elements.
> 
> I want simply to get a data.frame with the count of how many times each
> element of MyVector appears in MyTable.
> 
> I've tried a loop with : for (i in 1 : length (myvector))  sum (MyTable== i)
> 
> but it crashes my computer.

Hi.

As David pointed out, you probably want to compute 

  sum (MyTable== myvector[i])

and not sum (MyTable== i).

Also, I would expect you to store the results somewhere, for example

  numOccur <- rep(NA_integer_, times=length(myvector))
  for (i in seq_along(myvector)) numOccur[i] <- sum(MyTable == myvector[i])

What do you see on the crashing computer? I would expect the loop to run
for a long time, but not to crash.

Try to run your code on a smaller part of the data to test efficiency
of different approaches.
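
A minimal sketch of such a test, assuming toy stand-ins for the real data
(the generated values and the subset sizes 200 and 3 are arbitrary choices
for illustration only):

```r
# Toy stand-ins for the real MyTable and myvector (hypothetical data).
set.seed(1)
MyTable <- data.frame(a = sample(letters, 1000, replace = TRUE),
                      b = sample(letters, 1000, replace = TRUE),
                      stringsAsFactors = FALSE)
myvector <- letters[1:5]

# Work on a small part of both objects first.
smallTable <- head(MyTable, 200)
smallVector <- head(myvector, 3)

# Time the loop on the small part before trying the full data.
timing <- system.time({
  numOccur <- rep(NA_integer_, times = length(smallVector))
  for (i in seq_along(smallVector)) {
    numOccur[i] <- sum(smallTable == smallVector[i])
  }
})
print(timing)
```

Increasing the subset sizes step by step gives a rough idea of how the
running time grows before the full data is used.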

How many different strings are in your data? If there are many
repeated strings, then it may be better to first compute the
frequency table of all strings, look up the strings from "myvector"
in this table and sum the frequencies.
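
The frequency-table idea could look roughly as follows. This is only a
sketch on toy data; the names allStrings, freq, counts and result are
introduced here for illustration, and it assumes every column can be
treated as character:

```r
# Toy stand-ins for the real MyTable and myvector (hypothetical data).
MyTable <- data.frame(a = c("x", "y", "x"),
                      b = c("z", "x", "w"),
                      stringsAsFactors = FALSE)
myvector <- c("x", "w", "q")

# Flatten all columns into one character vector and count each
# distinct string once.
allStrings <- unlist(lapply(MyTable, as.character), use.names = FALSE)
freq <- table(allStrings)
counts <- as.vector(freq)

# Look up each element of myvector in the table; strings that do not
# occur in MyTable give NA from match() and are set to 0.
numOccur <- counts[match(myvector, names(freq))]
numOccur[is.na(numOccur)] <- 0L

result <- data.frame(string = myvector, count = numOccur,
                     stringsAsFactors = FALSE)
print(result)
```

The table is computed once, so each element of myvector costs one lookup
instead of a full comparison against the whole data frame.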

Does your data frame consist of character vectors or of factors?
You can check this with class(MyTable[[1]]).
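
To check all columns at once, something like the following may help. It is
a sketch on a toy data frame; converting factor columns to character is an
assumption about what is wanted here, not a required step:

```r
# Toy data frame with one character and one factor column (hypothetical).
MyTable <- data.frame(a = c("x", "y"),
                      b = factor(c("u", "v")),
                      stringsAsFactors = FALSE)

# Class of every column.
colClasses <- sapply(MyTable, class)
print(colClasses)

# Optionally convert factor columns to character vectors.
isFactor <- sapply(MyTable, is.factor)
MyTable[isFactor] <- lapply(MyTable[isFactor], as.character)
```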

Petr Savicky.


