[R] difftimes; histogram; memory problems

Jonathan jonsleepy at gmail.com
Tue Feb 16 02:19:20 CET 2010


Hi All:

Let's say I have two data frames (Condition1 and Condition2) with
about 12,000 and 16,000 rows respectively, and one column each.  The
entries are dates.

I'd like to calculate, for each possible pair of dates (that is, every
entry of Condition1 paired with every entry of Condition2), the number
of days between the two dates in the pair.  The result would be a
12,000 by 16,000 matrix.  Really, what I need is a histogram of all
the values in this matrix.

Example:
Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
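
To make the goal concrete, here is a minimal sketch of the same
calculation at a tiny scale, where the full matrix fits easily.  The
names d1, d2 and diffs are made up for this illustration, and I'm
assuming the dates are in the usual 'YYYY-MM-DD' format so that
as.Date() can parse them:

```r
# Toy vectors (hypothetical, not the real data)
d1 <- as.Date(c("2001-02-10", "1998-03-14", "2001-02-10"))
d2 <- as.Date(c("2003-07-06", "2007-03-11"))

# All pairwise differences in days: one row per d2 entry, one column per d1 entry
diffs <- outer(as.numeric(d2), as.numeric(d1), "-")
dim(diffs)

# The histogram I am after, here over only 6 values
hist(diffs, main = "Pairwise differences", xlab = "Days")
```

This is what I would like to do for the full 12,000 and 16,000 rows.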

My first instinct was to vectorize the operation.  I tried this by
expanding each vector into a matrix of repeated vectors, so that I
could then just subtract one matrix from the other.  I got the
following errors:

> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), byrow=TRUE, ncol=nrow(Condition1))
Error: cannot allocate vector of size 732.4 Mb
> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), byrow=FALSE, nrow=nrow(Condition2))
Error: cannot allocate vector of size 732.4 Mb

Since these matrices seem to be too large to allocate, I'm wondering
whether there's a better way to get the histogram without actually
building the matrix.
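
One direction that might work, offered only as a rough sketch: if many
dates repeat (as they do in the example data, which may not be true of
the real data), the differences could be computed between distinct
dates only, weighting each difference by the number of pairs that
produce it.  The names t1, t2, u1, u2, d, w and counts are made up for
this sketch:

```r
# Example data from above
Condition1 <- data.frame(dates = rep(c("2001-02-10", "1998-03-14"), 6000))
Condition2 <- data.frame(dates = rep(c("2003-07-06", "2007-03-11"), 8000))

# Count how often each distinct date occurs in each condition
t1 <- table(as.character(Condition1$dates))
t2 <- table(as.character(Condition2$dates))
u1 <- as.numeric(as.Date(names(t1)))  # distinct dates as day numbers
u2 <- as.numeric(as.Date(names(t2)))

# Differences between distinct dates only, and the number of pairs behind each
d <- as.vector(outer(u2, u1, "-"))
w <- as.vector(outer(as.numeric(t2), as.numeric(t1)))

# Total number of pairs for each distinct difference in days
counts <- tapply(w, d, sum)

# Draw the frequencies as vertical bars instead of calling hist() on raw values
plot(as.numeric(names(counts)), as.vector(counts), type = "h",
     xlab = "Days difference", ylab = "Number of pairs")
```

The intermediate matrices here have one row and column per distinct
date rather than per row of data, so this only helps when the number
of distinct dates is much smaller than the number of rows.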

I'd greatly appreciate any ideas!

Best,
Jonathan


