[R] loop for a large database

Petr Savicky savicky at cs.cas.cz
Sun Feb 26 21:15:36 CET 2012


On Sun, Feb 26, 2012 at 04:13:49AM -0800, mari681 wrote:
> Yes, I am a newbie.
> 
> I have a data.frame (MyTable) of  1445846  rows and  15  columns with
> character data.
> And a character vector (MyVector) of 473491 elements.
> 
> I want simply to get a data.frame with the count of how many times each
> element of MyVector appears in MyTable.
> 
> I've tried a loop with : for (i in 1 : length (myvector))  sum (MyTable== i)
> 
> but it crashes my computer.

Hi.

First try the following.

  out <- unclass(table(factor(MyTable[[1]], levels=myvector)))

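The output should be a table of frequencies of the components
of "myvector" in the first column of "MyTable". Note that R is
case sensitive, so use the name exactly as your vector is defined
(your message has both "MyVector" and "myvector").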
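
To see what this does before running it on the full data, here is a
small sketch on made-up values. The vector "col1" and its contents
are only an illustration and stand in for MyTable[[1]].

```r
# Toy data; these values are invented for illustration only.
myvector <- c("a", "b", "c", "z")
col1 <- c("a", "b", "a", "q", "c", "a")   # stands in for MyTable[[1]]

# factor() with explicit levels turns values not in "myvector" ("q")
# into NA, which table() drops by default. Levels that never occur
# ("z") are still reported, with a count of 0.
out <- unclass(table(factor(col1, levels = myvector)))
print(out)
```

The counts come out in the order of "myvector", so they can be
matched to its elements by position or by name.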

If this works for data of the size you have, then there are
several possible ways to compute the frequencies over all
columns. For example, concatenate all columns into a single
vector and apply the above to this concatenation as follows.

  x <- c(as.matrix(MyTable))
  out <- unclass(table(factor(x, levels=myvector))) 

Here, "out" is a vector of the same length as "myvector"
and out[i] is the frequency of myvector[i] in "MyTable".
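
As a sanity check, the same thing can be tried on a tiny made-up
table and compared with a direct count. The data below are invented
for illustration. Note also that the loop in your message compares
MyTable with the index i rather than with myvector[i], and it scans
the whole table once per element, which is why it is so slow.

```r
# Toy table and vector; the values are invented for illustration only.
MyTable <- data.frame(
  V1 = c("a", "b", "a"),
  V2 = c("c", "a", "q"),
  stringsAsFactors = FALSE
)
myvector <- c("a", "b", "c", "z")

# One pass over all cells: flatten the table, then tabulate.
x <- c(as.matrix(MyTable))
out <- unclass(table(factor(x, levels = myvector)))

# Direct count for comparison; fine here, but it scans the whole
# table once per element, so it is too slow for the real data.
check <- sapply(myvector, function(v) sum(MyTable == v))

stopifnot(all(as.vector(out) == check))
print(out)
```

If a data.frame is wanted as the final result, something like
data.frame(element = myvector, count = as.vector(out)) should do.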

Hope this helps.

Petr Savicky.