[R] identify time span in date vector

David Winsemius dwinsemius at comcast.net
Wed Apr 4 16:07:50 CEST 2012


On Apr 4, 2012, at 8:19 AM, Petr PIKAL wrote:

> Hi
>
>>
>> Dear Petr,
>>
>> thanks for taking your time.
>>
>> For this input, the first element should be selected since there are more
>> than 3 more dates within one year (basically, all other dates are within
>> one year) and at least one of them is more than 3 months later.
>>
>> In the meantime, I came up with some code (probably) doing what I want:
>>
>> identify_first_date = function(dates)
>> {
>> ### next dates in same year?
>> within_one_year = as.matrix(dist(dates)) < 366
>> within_one_year[upper.tri(within_one_year, diag=TRUE)]=FALSE
>>
>> ### next dates within 90 days?
>> within_one_month = as.matrix(dist(dates)) < 91
>> within_one_month[upper.tri(within_one_month, diag=TRUE)]=FALSE
>>
>> dates[
>>   which(
>>   ### more dates in one year than within 90 days
>>   apply(within_one_year,2,sum) > apply(within_one_month,2,sum) &
>>   ### at least 3 more dates in one year
>>   apply(within_one_year,2,sum) >=3
>>   )[1]]
>> }
>>
>> I guess the code could be improved, though; it takes some time.
>
> Your first condition can be fulfilled by
>
> c(as.numeric(diff(dates))<365, F) > c(as.numeric(diff(dates))<91, F)
>
> so if you put in your function
>
> identify_first_date2 = function(dates)
> {
> within_one_year = as.matrix(dist(dates)) < 366
> within_one_year[upper.tri(within_one_year, diag=TRUE)]=FALSE
>
> distance<-as.numeric(diff(dates))
>
> dates[ which( c(distance<365, F) > c(distance<91,F) &
> apply(within_one_year,2,sum) >=3)[1]]
> }
>
> You should get some improvement; however, I am still struggling to evaluate
> how many consecutive dates are within one year.
>

I added a couple of dates to the test case on which my original  
erroneous suggestion failed:

  dput(dates)
structure(c(11323, 11325, 11334, 11335, 11432, 11688, 12418), class =  
"Date")

This returns a list of "intervals" or perhaps "stretches" (?) spanning  
less than 365 days to assemble candidates for the first criterion:

intervals1 <- lapply(1:(length(dates)-4) , function(x)   
dates[which(dates - dates[x] < 365 & dates - dates[x] >=0)] )
 > intervals1
[[1]]
[1] "2001-01-01" "2001-01-03" "2001-01-12" "2001-01-13" "2001-04-20"

[[2]]
[1] "2001-01-03" "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"

[[3]]
[1] "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"
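
On the question of how many later dates fall within a year of each element, a vectorized count may be cheaper than building lists or a distance matrix. This is only a sketch, assuming `dates` is sorted in increasing order; `d` and `n_later` are names introduced here for illustration and do not appear elsewhere in the thread:

```r
dates <- structure(c(11323, 11325, 11334, 11335, 11432, 11688, 12418),
                   class = "Date")

# Work on plain day counts; this assumes the vector is already sorted.
d <- as.numeric(dates)

# findInterval(x, d) gives, for each x, how many elements of d are <= x.
# With x = d + 364 that is the index of the last date less than 365 days out;
# subtracting the element's own position leaves the number of later dates.
n_later <- findInterval(d + 364, d) - seq_along(d)
n_later
# 4 4 3 2 1 0 0
```

For the first three elements these counts are one less than the lengths of the corresponding `intervals1` entries, since each entry also contains its own starting date.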


This then tests whether the second-to-last element (the "penultimate"  
one, in the correct use of that often misused term) is at least 90 days out:

 > sapply(intervals1, function(x) x[length(x)-1] - x[1] >= 90)
[1] FALSE  TRUE  TRUE
 > intervals1[which( sapply(intervals1, function(x) x[length(x)-1] -  
x[1] >90)) ]
[[1]]
[1] "2001-01-03" "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"

[[2]]
[1] "2001-01-12" "2001-01-13" "2001-04-20" "2002-01-01"


And this returns the starting date from that result:

 > intervals1[which( sapply(intervals1, function(x) x[length(x)-1] -  
x[1] >90)) ][[1]][1]
[1] "2001-01-03"

I see that I should have added a test for length greater than 3 but  
that should not be difficult.

 > intervals1[which( sapply(intervals1,
        function(x) x[length(x)-1] - x[1] >90 & length(x) >3)) ][[1]][1]
[1] "2001-01-03"
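
The steps above could be collected into one function along the following lines. This is a sketch rather than a tested solution: the name `first_start` and its arguments `span`, `gap` and `min_len` are made up for illustration, it keeps the penultimate-element test and the `> 90` comparison used above, and it assumes `dates` is a sorted Date vector.

```r
# Hypothetical wrapper around the steps shown above (not a base R function).
first_start <- function(dates, span = 365, gap = 90, min_len = 4) {
  ok <- vapply(seq_along(dates), function(i) {
    # all dates from dates[i] up to, but not including, `span` days later
    x <- dates[dates >= dates[i] & dates - dates[i] < span]
    # need the start plus at least 3 more dates, and the penultimate
    # date must be more than `gap` days after the start
    length(x) >= min_len && as.numeric(x[length(x) - 1] - x[1]) > gap
  }, logical(1))
  if (!any(ok)) return(as.Date(NA_character_))  # no qualifying start date
  dates[which(ok)[1]]
}

dates <- structure(c(11323, 11325, 11334, 11335, 11432, 11688, 12418),
                   class = "Date")
first_start(dates)
# [1] "2001-01-03"
```

Because the length check comes first and `&&` short-circuits, short stretches never reach the indexing step, so every candidate start position can be tried without the `length(dates)-4` cutoff.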


-- 
David.

>
>
>
>>
>> Best,
>> Felix
>>
>>
>> -----Original Message-----
>> From: Petr PIKAL [mailto:petr.pikal at precheza.cz]
>> Sent: Wednesday, April 4, 2012 09:47
>> To: Fischer, Felix
>> Cc: r-help at r-project.org
>> Subject: Odp: [R] identify time span in date vector
>>
>> Hi
>>
>> Can you please be more specific? Based on this input, what do you want
>> as a result?
>>
>> > set.seed(111)
>> > dates = as.Date(sort(rnorm(10,3000,100)), origin = "2000-1-1")
>> > dates
>>  [1] "2007-08-01" "2007-10-21" "2007-12-08" "2007-12-15" "2008-01-29"
>>  [6] "2008-02-14" "2008-02-16" "2008-03-01" "2008-04-02" "2008-04-11"
>> >
>>
>> Regards
>> Petr
>>
>>>
>>> Hello everyone,
>>>
>>> I am trying to identify the first element of a date vector for which the
>>> following condition holds: at least 3 more dates within the next 365 days,
>>> but at least one of these must be 3-12 months later.
>>>
>>> dates = as.Date(sort(rnorm(10,3000,100)), origin = "2000-1-1")
>>>
>>> Does anyone have an idea how to do this economically? I'll need to apply
>>> this to a large dataset with date vectors of various lengths, and I can
>>> think only of quite difficult algorithms :(
>>>
>>> Any ideas would be appreciated,
>>> Felix
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>

David Winsemius, MD
West Hartford, CT


