[R] difftimes; histogram; memory problems

Gabor Grothendieck ggrothendieck at gmail.com
Tue Feb 16 04:54:41 CET 2010


Just one further point. If you still run out of memory with #2, try
this, which is the same as #2 except that it adds a dbname argument so
that the computation is done in a temporary SQLite database on disk
rather than in memory.

# same query as #2; dbname = tempfile() keeps the database in a temporary file
sqldf("select d1.x - d2.x, count(*) from d1, d2 group by d1.x - d2.x",
      dbname = tempfile())
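
If you would rather stay in base R, here is a minimal sketch (not from
the thread, and a different technique from sqldf) that processes the
first vector in blocks, so only a block-by-length(y) slice of the
difference matrix exists at any time, and accumulates per-day counts
with tabulate(). The helper pairwise_diff_counts() and its block
argument are hypothetical names; the sketch assumes both inputs are Date
vectors, so every difference is a whole number of days.

```r
# Hypothetical helper (a sketch, not an existing function): count all
# pairwise day differences x[i] - y[j] without ever holding the full
# length(x) x length(y) matrix. Assumes x and y are Date vectors.
pairwise_diff_counts <- function(x, y, block = 1000) {
  x <- as.numeric(x)                 # days since 1970-01-01
  y <- as.numeric(y)
  lo <- min(x) - max(y)              # smallest possible difference
  hi <- max(x) - min(y)              # largest possible difference
  counts <- numeric(hi - lo + 1)     # one bin per whole day in [lo, hi]
  for (start in seq(1, length(x), by = block)) {
    idx <- start:min(start + block - 1, length(x))
    d <- outer(x[idx], y, "-")       # differences for this block only
    counts <- counts + tabulate(round(d - lo) + 1, nbins = length(counts))
  }
  names(counts) <- lo:hi
  counts[counts > 0]                 # drop differences that never occur
}

# same test data as in approach #2 below
d1 <- data.frame(x = Sys.Date() + 1:3)
d2 <- data.frame(x = Sys.Date() - 1:3)
res <- pairwise_diff_counts(d1$x, d2$x)
res  # differences 2..6 with counts 1 2 3 2 1, as in the sqldf output
```

For Jonathan's example, whose columns hold strings (or factors,
depending on stringsAsFactors), convert first, e.g.
pairwise_diff_counts(as.Date(as.character(Condition1$dates)),
as.Date(as.character(Condition2$dates))), and plot the counts with
barplot(). With block = 1000 and 16,000 dates in y, each intermediate
matrix is 1,000 x 16,000 doubles, roughly 122 Mb; a smaller block
lowers that further.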

On Mon, Feb 15, 2010 at 10:45 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Here are two approaches to try:
>
>> # test data
>> d1 <- data.frame(x = Sys.Date() + 1:3)
>> d2 <- data.frame(x = Sys.Date() - 1:3)
>
>> # 1. short, but you might not have enough memory for this, since
>> #    outer() builds the full matrix of differences
>> table(outer(as.numeric(d1$x), as.numeric(d2$x), "-"))
>
> 2 3 4 5 6
> 1 2 3 2 1
>
>> # 2. this one does all the work outside of R (in SQLite) and only
>> #    brings the aggregated counts back, so it won't need as much memory
>>
>> library(sqldf)
>> sqldf("select d1.x - d2.x, count(*) from d1, d2 group by d1.x - d2.x")
>  d1.x - d2.x count(*)
> 1           2        1
> 2           3        2
> 3           4        3
> 4           5        2
> 5           6        1
>
>
> On Mon, Feb 15, 2010 at 9:17 PM, Jonathan <jonsleepy at gmail.com> wrote:
>> Let me fix a couple of typos in that email:
>>
>> Hi All:
>>
>> Let's say I have two data frames (Condition1 and Condition2) with
>> roughly 12,000 and 16,000 rows, respectively, and one column each.
>> The entries are dates.
>>
>> I'd like to calculate, for each possible pair of dates (that is,
>> Condition1[1:12,000] paired with Condition2[1:16,000]), the difference
>> in days between the two dates in the pair.  The result should be a
>> 12,000 by 16,000 matrix, which I'll call M.  The purpose of building
>> such a matrix M is to create a histogram of all the values it
>> contains.
>>
>> Example:
>> Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
>> Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
>>
>> First, my instinct is to try to vectorize the operation.  I tried
>> this by expanding each vector into a matrix of repeated vectors (I'd
>> then just subtract the two resulting matrices to get M).  I got
>> the following error:
>>
>>> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), byrow=TRUE, ncol=nrow(Condition1))
>> Error: cannot allocate vector of size 732.4 Mb
>>> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), byrow=FALSE, nrow=nrow(Condition2))
>> Error: cannot allocate vector of size 732.4 Mb
>>
>> Since these matrices seem to be too large, I'm wondering whether
>> there's a way to get the histogram from hist() without actually
>> building the matrix.
>>
>> I'd greatly appreciate any ideas!
>>
>> Best,
>> Jonathan
>>
>> On Mon, Feb 15, 2010 at 8:19 PM, Jonathan <jonsleepy at gmail.com> wrote:
>>> Hi All:
>>>
>>> Let's say I have two dataframes (Condition1 and Condition2); each
>>> being on the order of 12,000 and 16,000 rows; 1 column.  The entries
>>> contain dates.
>>>
>>> I'd like to calculate, for each possible pair of dates (that is:
>>> Condition1[1:10,000] and Condition2[1:10,000], the number of days
>>> difference between the dates in the pair.  The result should be a
>>> matrix 12,000 by 16,000.  Really, what I need is a histogram of all
>>> the values in this matrix.
>>>
>>> Ex):
>>> Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
>>> Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
>>>
>>> First, my instinct is to try and vectorize the operation.  I tried
>>> this by expanding each vector into a matrix of repeated vectors (I'd
>>> then just subtract the two).  I got the following error:
>>>
>>>> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), byrow=TRUE, ncol=nrow(Condition1))
>>> Error: cannot allocate vector of size 732.4 Mb
>>>> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), byrow=FALSE, nrow=nrow(Condition2))
>>> Error: cannot allocate vector of size 732.4 Mb
>>>
>>> Since it seems these matrices are too large, I'm wondering whether
>>> there's a better way to call a hist command without actually building
>>> the said matrix..
>>>
>>> I'd greatly appreciate any ideas!
>>>
>>> Best,
>>> Jonathan
>>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
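
A quick check of the arithmetic behind the "cannot allocate vector of
size 732.4 Mb" errors above: a 16,000 by 12,000 matrix has 192 million
cells, and at 4 bytes per cell (the size of an R integer) that is
exactly 732.4 Mb, where R's Mb is 1024^2 bytes. A matrix of day
differences stored as 8-byte doubles would need twice as much.

```r
# one cell per (Condition1, Condition2) pair of dates
n_cells <- 12000 * 16000
n_cells                   # 1.92e+08
n_cells * 4 / 1024^2      # 4-byte integers, in Mb: 732.4219
n_cells * 8 / 1024^2      # 8-byte doubles, in Mb: 1464.844
```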


