[R] difftimes; histogram; memory problems

Moshe Olshansky m_olshansky at yahoo.com
Tue Feb 16 04:53:57 CET 2010


Hi Jonathan,

If minDay = min(Condition1) - max(Condition2) and maxDay = max(Condition1) - min(Condition2), then all your differences will lie between minDay and maxDay, and hopefully this is not a very big range (unless you are going many thousands of years into the past or the future). So basically, for each number of days in this range you should count the number of times it appears. To speed up the calculations you can do this with just one loop (with one vectorized operation inside it); I cannot see how to avoid the loop entirely if we want to limit the memory use.
Let me know if you need the actual code.
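
In the meantime, here is a rough sketch of the counting idea, using the example data from your post. The names d1, d2 and counts are only for illustration, and I am assuming the dates are stored as "YYYY-MM-DD" strings (or factors), so treat it as a starting point rather than finished code:

```r
# Example data from the original post (one column of date strings each)
Condition1 <- data.frame(dates = rep(c("2001-02-10", "1998-03-14"), 6000))
Condition2 <- data.frame(dates = rep(c("2003-07-06", "2007-03-11"), 8000))

# Convert to whole days since 1970-01-01; as.character() also covers
# the case where the column was read in as a factor
d1 <- as.integer(as.Date(as.character(Condition1$dates)))
d2 <- as.integer(as.Date(as.character(Condition2$dates)))

# Every difference d1[i] - d2[j] lies between minDay and maxDay
minDay <- min(d1) - max(d2)
maxDay <- max(d1) - min(d2)

# counts[k] holds the number of pairs whose difference is minDay + k - 1;
# stored as double so that large totals cannot overflow an integer
counts <- numeric(maxDay - minDay + 1)

# One loop over Condition1; each pass handles all of Condition2 at once
# with a single vectorized tabulate() call, so memory use stays at
# roughly length(d2) + length(counts) numbers
for (x in d1) {
  counts <- counts + tabulate(x - d2 - minDay + 1L, nbins = length(counts))
}
```

After that, something like plot(minDay:maxDay, counts, type = "h") should draw the histogram without the 12,000 by 16,000 matrix ever being built.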

Regards,
Moshe.

--- On Tue, 16/2/10, Jonathan <jonsleepy at gmail.com> wrote:

> From: Jonathan <jonsleepy at gmail.com>
> Subject: Re: [R] difftimes; histogram; memory problems
> To: "r-help" <r-help at r-project.org>
> Received: Tuesday, 16 February, 2010, 1:17 PM
> Let me fix a couple of typos in that email:
> 
> Hi All:
> 
> Let's say I have two dataframes (Condition1 and Condition2), on the
> order of 12,000 and 16,000 rows respectively, with 1 column each.
> The entries contain dates.
> 
> I'd like to calculate, for each possible pair of dates (that is,
> Condition1[1:12,000] and Condition2[1:16,000]), the number of days
> difference between the dates in the pair.  The result should be a
> matrix 12,000 by 16,000, which I'll call M.  The purpose of building
> such a matrix M is to create a histogram of all the values contained
> within it.
> 
> Ex):
> Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
> Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
> 
> First, my instinct is to try and vectorize the operation.  I tried
> this by expanding each vector into a matrix of repeated vectors (I'd
> then just subtract the two resultant matrices to get matrix M).  I got
> the following error:
> 
> > expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), byrow=TRUE, ncol=nrow(Condition1))
> Error: cannot allocate vector of size 732.4 Mb
> > expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), byrow=FALSE, nrow=nrow(Condition2))
> Error: cannot allocate vector of size 732.4 Mb
> 
> Since it seems these matrices are too large, I'm wondering whether
> there's a better way to call a hist command without actually building
> the said matrix.
> 
> I'd greatly appreciate any ideas!
> 
> Best,
> Jonathan
> 
> On Mon, Feb 15, 2010 at 8:19 PM, Jonathan <jonsleepy at gmail.com> wrote:
> > Hi All:
> >
> > Let's say I have two dataframes (Condition1 and Condition2); each
> > being on the order of 12,000 and 16,000 rows; 1 column.  The entries
> > contain dates.
> >
> > I'd like to calculate, for each possible pair of dates (that is:
> > Condition1[1:10,000] and Condition2[1:10,000], the number of days
> > difference between the dates in the pair.  The result should be a
> > matrix 12,000 by 16,000.  Really, what I need is a histogram of all
> > the values in this matrix.
> >
> > Ex):
> > Condition1 <- data.frame('dates' = rep(c('2001-02-10','1998-03-14'),6000))
> > Condition2 <- data.frame('dates' = rep(c('2003-07-06','2007-03-11'),8000))
> >
> > First, my instinct is to try and vectorize the operation.  I tried
> > this by expanding each vector into a matrix of repeated vectors (I'd
> > then just subtract the two).  I got the following error:
> >
> >> expandedCondition1 <- matrix(rep(Condition1[[1]], nrow(Condition2)), byrow=TRUE, ncol=nrow(Condition1))
> > Error: cannot allocate vector of size 732.4 Mb
> >> expandedCondition2 <- matrix(rep(Condition2[[1]], nrow(Condition1)), byrow=FALSE, nrow=nrow(Condition2))
> > Error: cannot allocate vector of size 732.4 Mb
> >
> > Since it seems these matrices are too large, I'm wondering whether
> > there's a better way to call a hist command without actually building
> > the said matrix..
> >
> > I'd greatly appreciate any ideas!
> >
> > Best,
> > Jonathan
> >
> 


