[R] summing values by week - based on daily dates - but with some dates missing

Wed Apr 6 18:18:56 CEST 2011

Sorry, never mind. It turns out I had not loaded the zoo package, which is
where na.locf() comes from. That was the reason.
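For anyone who hits the same error: na.locf() is provided by zoo, so library(zoo) has to run before the aggregate() call. The .00 weeks are expected. "%W" numbers the days of a new year that fall before its first Monday as week 00, is.na(wk) <- wk %% 1 == 0 turns those whole numbers into NA, and na.locf() then carries the previous week (for example 2008.52) forward, so the week that straddles two years is summed as one. Below is a minimal sketch on a few made-up dates and values (not the thread's data). Two caveats, as far as I can tell: na.locf() runs over the whole vector and ignores group, so if a group's first remaining rows fall in week 00 they inherit the previous group's last week; and with its default na.rm = TRUE a leading NA is dropped, which would make the variable lengths in aggregate() disagree.

```r
library(zoo)  # na.locf() lives in zoo; without this, R reports: could not find function "na.locf"

# Made-up illustrative data: one week straddling 2008/2009, then the following week
myframe <- data.frame(
  dates = as.Date(c("2008-12-29", "2008-12-31", "2009-01-02",
                    "2009-01-05", "2009-01-07")),
  group = "group.1",
  value = c(1, 2, 3, 4, 5)
)

# "%Y.%W" gives e.g. 2008.52; days of 2009 before its first Monday give "2009.00", i.e. 2009
wk <- as.numeric(format(myframe$dates, "%Y.%W"))
# Whole numbers mark week 00: set them to NA ...
is.na(wk) <- wk %% 1 == 0
# ... and let na.locf() fill each NA with the last non-missing week before it
solution <- aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
solution
```

Here 2009-01-02 falls in week 00 of 2009, so it is added to the 2008.52 week together with 2008-12-29 and 2008-12-31.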

On Wed, Apr 6, 2011 at 12:14 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Guys, sorry to bother you again:
>
> I am running everything as before (see code below - before the line
> with a lot of ######). But now I am getting an error:
> Error in eval(expr, envir, enclos) : could not find function "na.locf"
> I also noticed that after I run the 3rd line from the bottom,
> "wk <- as.numeric(format(myframe$dates, "%Y.%W"))", there are some weeks
> that end with .00.
> And then, after I run the 2nd line from the bottom,
> "is.na(wk) <- wk %% 1 == 0", those weeks turn into NAs.
> Whether I run the second line or not - I get the same error about it
> not finding the function "na.locf".
> Do you know what might be going on?
> Thanks a lot!
> Dimitri
>
> ### Creating a longer example data set:
> mydates<-rep(seq(as.Date("2008-12-29"), length = 500, by = "day"),2)
> myfactor<-c(rep("group.1",500),rep("group.2",500))
> set.seed(123)
> myvalues<-runif(1000,0,1)
> myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
> (myframe)
> dim(myframe)
>
> ## Removing some rows (dates) unsystematically:
> set.seed(123)
> removed.group1<-sample(1:500,size=150,replace=F)
> set.seed(456)
> removed.group2<-sample(501:1000,size=150,replace=F)
> to.remove<-c(removed.group1,removed.group2);length(to.remove)
> to.remove<-to.remove[order(to.remove)]
> myframe<-myframe[-to.remove,]
> (myframe)
> dim(myframe)
> names(myframe)# write.csv(myframe,file="x.test.csv",row.names=F)
>
> wk <- as.numeric(format(myframe$dates, "%Y.%W"))
> is.na(wk) <- wk %% 1 == 0
> solution<-aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
>
>
>
>
> ###############################################################
>
> On Wed, Mar 30, 2011 at 5:25 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>> You're right:
>>
>> wk <- as.numeric(format(myframe$dates, "%Y.%W"))
>> is.na(wk) <- wk %% 1 == 0
>> solution<-aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
>>
>>
>> On Wed, Mar 30, 2011 at 6:10 PM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>>> Yes, zoo! That's what I forgot. It's great.
>>> Henrique, thanks a lot! One question:
>>>
>>> If the data are as I originally posted, then the week numbered 52 is
>>> actually the very first week (it straddles 2008-2009).
>>> What if the data are much longer (like in the code below: same as before,
>>> but with more dates), so that we have more than 1 year to deal with?
>>> It looks like this code is lumping everything into 52 weeks. And my
>>> goal is to keep each week independent. If I have 2 years, then it
>>> should be 100+ weeks. Makes sense?
>>> Thank you!
>>>
>>> ### Creating a longer example data set:
>>> mydates<-rep(seq(as.Date("2008-12-29"), length = 500, by = "day"),2)
>>> myfactor<-c(rep("group.1",500),rep("group.2",500))
>>> set.seed(123)
>>> myvalues<-runif(1000,0,1)
>>> myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
>>> (myframe)
>>> dim(myframe)
>>>
>>> ## Removing some rows (dates) unsystematically:
>>> set.seed(123)
>>> removed.group1<-sample(1:500,size=150,replace=F)
>>> set.seed(456)
>>> removed.group2<-sample(501:1000,size=150,replace=F)
>>> to.remove<-c(removed.group1,removed.group2);length(to.remove)
>>> to.remove<-to.remove[order(to.remove)]
>>> myframe<-myframe[-to.remove,]
>>> (myframe)
>>> dim(myframe)
>>> names(myframe)
>>>
>>> library(zoo)
>>> wk <- as.numeric(format(myframe$dates, '%W'))
>>> is.na(wk) <- wk == 0
>>> solution<-aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
>>> solution<-solution[order(solution$group),]
>>> write.csv(solution,file="test.csv",row.names=F)
>>>
>>>
>>>
>>> On Wed, Mar 30, 2011 at 4:45 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>>>> Try this:
>>>>
>>>> library(zoo)
>>>> wk <- as.numeric(format(myframe$dates, '%W'))
>>>> is.na(wk) <- wk == 0
>>>> aggregate(value ~ group + na.locf(wk), myframe, FUN = sum)
>>>>
>>>>
>>>>
>>>> On Wed, Mar 30, 2011 at 4:35 PM, Dimitri Liakhovitski
>>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>>> Henrique, this is great, thank you!
>>>>>
>>>>> It's almost what I was looking for! Only one small thing - it doesn't
>>>>> "merge" the results for weeks that "straddle" 2 years. In my example -
>>>>> last week of year 2008 and the very first week of 2009 are one week.
>>>>> Any way to "join them"?
>>>>> Asking because in reality I'll have many years and hundreds of groups
>>>>> - hence, it'll be hard to do it manually.
>>>>>
>>>>>
>>>>> BTW - does format(dates,"%Y.%W") always consider weeks as starting with Mondays?
>>>>>
>>>>> Thank you very much!
>>>>> Dimitri
>>>>>
>>>>>
>>>>> On Wed, Mar 30, 2011 at 2:55 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>>>>>> Try this:
>>>>>>
>>>>>> aggregate(value ~ group + format(dates, "%Y.%W"), myframe, FUN = sum)
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 30, 2011 at 11:23 AM, Dimitri Liakhovitski
>>>>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>>>>> Dear everybody,
>>>>>>>
>>>>>>> I have the following challenge. I have a data set with 2 subgroups,
>>>>>>> dates (days), and corresponding values (see example code below).
>>>>>>> Within each subgroup: I need to aggregate (sum) the values by week -
>>>>>>> for weeks that start on a Monday (for example, 2008-12-29 was a
>>>>>>> Monday).
>>>>>>> I find it difficult because I have missing dates in my data - so that
>>>>>>> sometimes I don't even have the date for some Mondays. So, I can't
>>>>>>> write a proper loop.
>>>>>>> I want my output to look something like this:
>>>>>>> group   dates   value
>>>>>>> group.1 2008-12-29  3.0937
>>>>>>> group.1 2009-01-05  3.8833
>>>>>>> group.1 2009-01-12  1.362
>>>>>>> ...
>>>>>>> group.2 2008-12-29  2.250
>>>>>>> group.2 2009-01-05  1.4057
>>>>>>> group.2 2009-01-12  3.4411
>>>>>>> ...
>>>>>>>
>>>>>>> Thanks a lot for your suggestions! The code is below:
>>>>>>> Dimitri
>>>>>>>
>>>>>>> ### Creating example data set:
>>>>>>> mydates<-rep(seq(as.Date("2008-12-29"), length = 43, by = "day"),2)
>>>>>>> myfactor<-c(rep("group.1",43),rep("group.2",43))
>>>>>>> set.seed(123)
>>>>>>> myvalues<-runif(86,0,1)
>>>>>>> myframe<-data.frame(dates=mydates,group=myfactor,value=myvalues)
>>>>>>> (myframe)
>>>>>>> dim(myframe)
>>>>>>>
>>>>>>> ## Removing some rows (dates) unsystematically:
>>>>>>> set.seed(123)
>>>>>>> removed.group1<-sample(1:43,size=11,replace=F)
>>>>>>> set.seed(456)
>>>>>>> removed.group2<-sample(44:86,size=11,replace=F)
>>>>>>> to.remove<-c(removed.group1,removed.group2);length(to.remove)
>>>>>>> to.remove<-to.remove[order(to.remove)]
>>>>>>> myframe<-myframe[-to.remove,]
>>>>>>> (myframe)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Dimitri Liakhovitski
>>>>>>> Ninah Consulting
>>>>>>> www.ninah.com
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Henrique Dallazuanna
>>>>>> Curitiba-Paraná-Brasil
>>>>>> 25° 25' 40" S 49° 16' 22" O
>>>>>>
>
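P.S. A zoo-free alternative, offered only as a sketch and not something suggested earlier in this thread: label each row with the Monday that starts its week and sum by that date, which matches the group / dates / value layout I originally asked for. cut() on a Date vector with breaks = "week" starts weeks on Monday by default (start.on.monday = TRUE), so a week straddling two years stays one week, and several years simply give more weeks instead of being folded into 52. Weeks with no rows at all just do not appear. (On the BTW question: in format(), "%W" counts weeks starting on Monday, while "%U" starts them on Sunday.) The data below are made up for illustration.

```r
# Made-up illustrative data with gaps; two groups
myframe <- data.frame(
  dates = as.Date(c("2008-12-29", "2009-01-02", "2009-01-04",
                    "2009-01-06", "2008-12-30", "2009-01-05")),
  group = c("group.1", "group.1", "group.1", "group.1", "group.2", "group.2"),
  value = c(1, 2, 3, 4, 10, 20)
)

# Monday of the week each date falls in; cut() works on actual calendar dates,
# so Mon 2008-12-29 .. Sun 2009-01-04 is a single week regardless of the year change
myframe$week <- as.Date(cut(myframe$dates, breaks = "week", start.on.monday = TRUE))

# Sum within each group and week, then sort by group and week start date
solution <- aggregate(value ~ group + week, data = myframe, FUN = sum)
solution <- solution[order(solution$group, solution$week), ]
solution
```

For group.1 this gives 6 for the week starting 2008-12-29 (1 + 2 + 3) and 4 for the week starting 2009-01-05.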

-- 
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com