[R] How to select max data according to week?

Eric Berger er|cjberger @end|ng |rom gm@||@com
Thu Jun 20 09:21:14 CEST 2019


Marc - very nice.

On Wed, Jun 19, 2019 at 6:30 PM Marc Schwartz <marc_schwartz using me.com> wrote:

> Hi All,
>
> Is there a reason that the following was not considered?
>
> > DF
>          Date O3_Conc
> 1  2000-01-01   0.033
> 2  2000-01-02   0.023
> 3  2000-01-03   0.025
> 4  2000-01-04   0.041
> 5  2000-01-05   0.063
> 6  2000-01-06   0.028
> 7  2000-01-07   0.068
> 8  2000-01-08   0.048
> 9  2000-01-09   0.037
> 10 2000-01-10   0.042
> 11 2000-01-11   0.027
> 12 2000-01-12   0.035
> 13 2000-01-13   0.063
> 14 2000-01-14   0.035
> 15 2000-01-15   0.042
> 16 2000-01-16   0.028
>
> > str(DF)
> 'data.frame':   16 obs. of  2 variables:
>  $ Date   : Date, format: "2000-01-01" ...
>  $ O3_Conc: num  0.033 0.023 0.025 0.041 0.063 0.028 0.068 0.048 0.037
> 0.042 ...
>
>
> > aggregate(O3_Conc ~ cut(Date, breaks = "14 days"), data = DF, max)
>   cut(Date, breaks = "14 days") O3_Conc
> 1                    2000-01-01   0.068
> 2                    2000-01-15   0.042
>
>
> See ?cut.Date
>
> Note that there is an error in Aisyah's original post, in that the value
> for the max concentration in the second 14 day window is 0.042, not 0.061.
>
> Also, if one wanted to use actual calendar weeks, with a week starting on
> Sunday, as opposed to 14 day windows starting from the minimum date, you
> could use:
>
> > aggregate(O3_Conc ~ cut(Date, breaks = "2 weeks", start.on.monday =
> FALSE), data = DF, max)
>   cut(Date, breaks = "2 weeks", start.on.monday = FALSE) O3_Conc
> 1                                             1999-12-26   0.068
> 2                                             2000-01-09   0.063
>
>
> If desired, you could pre-create a column in the DF with the result of
> cut():
>
> DF$TimeStartDays <- cut(DF$Date, breaks = "14 days")
> DF$TimeStartWks <- cut(DF$Date, breaks = "2 weeks", start.on.monday =
> FALSE)
>
> > DF
>          Date O3_Conc TimeStartDays TimeStartWks
> 1  2000-01-01   0.033    2000-01-01   1999-12-26
> 2  2000-01-02   0.023    2000-01-01   1999-12-26
> 3  2000-01-03   0.025    2000-01-01   1999-12-26
> 4  2000-01-04   0.041    2000-01-01   1999-12-26
> 5  2000-01-05   0.063    2000-01-01   1999-12-26
> 6  2000-01-06   0.028    2000-01-01   1999-12-26
> 7  2000-01-07   0.068    2000-01-01   1999-12-26
> 8  2000-01-08   0.048    2000-01-01   1999-12-26
> 9  2000-01-09   0.037    2000-01-01   2000-01-09
> 10 2000-01-10   0.042    2000-01-01   2000-01-09
> 11 2000-01-11   0.027    2000-01-01   2000-01-09
> 12 2000-01-12   0.035    2000-01-01   2000-01-09
> 13 2000-01-13   0.063    2000-01-01   2000-01-09
> 14 2000-01-14   0.035    2000-01-01   2000-01-09
> 15 2000-01-15   0.042    2000-01-15   2000-01-09
> 16 2000-01-16   0.028    2000-01-15   2000-01-09
>
> and then use:
>
> > aggregate(O3_Conc ~ TimeStartDays, data = DF, max)
>   TimeStartDays O3_Conc
> 1    2000-01-01   0.068
> 2    2000-01-15   0.042
>
> > aggregate(O3_Conc ~ TimeStartWks, data = DF, max)
>   TimeStartWks O3_Conc
> 1   1999-12-26   0.068
> 2   2000-01-09   0.063
>
>
> Regards,
>
> Marc Schwartz
>
>
>
> > On Jun 19, 2019, at 10:45 AM, Eric Berger <ericjberger using gmail.com> wrote:
> >
> > I just realized that my method as written only works if the entire date
> > range is within a single calendar year.
> > Here is a revised version that should handle the general case.
> >
> > In this version I assume that the conversion from the original numbers to
> > dates has already been done
> > e.g. it can be done using ymd() as I showed before
> >
> > library(dplyr)
> > library(lubridate)
> >
> > ## create some example data for 36 months from 2010
> > d <- seq(from=as.Date("2010-01-01"),to=as.Date("2012-12-31"),by="1 day")
> > conc <- runif(length(d)) # concentrations
> > df <- data.frame( dt=d, conc=conc)
> > df2 <- dplyr::mutate( df, wkpair=as.integer(floor((0:(nrow(df)-1))/14)) )
> > df3 <- dplyr::group_by(df2,wkpair) %>%
> >       dplyr::summarise(from=min(dt), to=max(dt), maxconc=max(conc)) %>%
> >       dplyr::select(from,to,maxconc)
> > df3
> >
> > # A tibble: 79 x 3
> >   from       to         maxconc
> >   <date>     <date>       <dbl>
> > 1 2010-01-01 2010-01-14   0.928
> > 2 2010-01-15 2010-01-28   0.834
> > 3 2010-01-29 2010-02-11   0.989
> > 4 2010-02-12 2010-02-25   0.983
> > 5 2010-02-26 2010-03-11   0.898
> > 6 2010-03-12 2010-03-25   0.864
> > 7 2010-03-26 2010-04-08   0.920
> > 8 2010-04-09 2010-04-22   0.925
> > 9 2010-04-23 2010-05-06   0.998
> > 10 2010-05-07 2010-05-20   0.761
> > # ... with 69 more rows
> >
> > On Wed, Jun 19, 2019 at 5:08 PM Eric Berger <ericjberger using gmail.com>
> wrote:
> >
> >> Hi Bert,
> >> I agree that our interpretation is slightly different, which is why I
> >> wrote:
> >> "If one can take the actual week of the year as an acceptable definition
> >> of week, then here's my approach."
> >>
> >> Regards,
> >> Eric
> >>
> >> On Wed, Jun 19, 2019 at 5:04 PM Bert Gunter <bgunter.4567 using gmail.com>
> >> wrote:
> >>
> >>> Eric:
> >>>
> >>> I believe you're doing something different than I did. I broke up each
> >>> month into biweekly periods, 2+ per month. You seem to be grouping the
> >>> overall entire period into biweekly intervals -- apologies if I'm
> wrong,
> >>> but if I understood correctly, that's not the same thing. I do not know
> >>> which of us -- if either -- has interpreted her query correctly.
> >>>
> >>> Cheers,
> >>> Bert
> >>>
> >>> On Wed, Jun 19, 2019 at 2:35 AM Eric Berger <ericjberger using gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Siti,
> >>>> I didn't test Bert's code but I assume it's fine. :-)
> >>>> I would take a different approach than Bert. I prefer to use a package
> >>>> such as lubridate to handle the date wrangling, and a package such as
> dplyr
> >>>> to handle the grouping and max extraction.
> >>>> It may be overkill for this problem, but these are great packages to
> >>>> become familiar with.
> >>>> If one can take the actual week of the year as an acceptable
> definition
> >>>> of week, then here's my approach.
> >>>>
> >>>> library(lubridate)
> >>>> library(dplyr)
> >>>>
> >>>> # Step 1: start with Bert's code to create sample data
> >>>> ## create some example data for 3 months in 2000
> >>>> d<- 2e7 +c(113:131,201:228, 301:330) ## dates
> >>>> conc <- runif(length(d)) # concentrations
> >>>>
> >>>> # Step 2: collect the data into a data frame
> >>>> df <- data.frame( dt=d, conc=conc)
> >>>>
> >>>> # Step 3: use lubridate's ymd() function to parse the dates, its
> week()
> >>>> function to identify the week of the year, and define the new column
> >>>> 'wkpair' that groups the weeks 2-at-a-time
> >>>> df2 <- dplyr::mutate( df,
> >>>> wkpair=as.integer(floor(lubridate::week(lubridate::ymd(dt) )/2)) )
> >>>>
> >>>> # Step 4: group by the wkpair and use dplyr's summarise to get the
> info
> >>>> you wanted
> >>>> df3 <- dplyr::group_by(df2,wkpair) %>%
> >>>>          dplyr::summarise( from=min(dt), to=max(dt),
> maxconc=max(conc))
> >>>> %>%
> >>>>          dplyr::select(from,to,maxconc)
> >>>>
> >>>> df3
> >>>>
> >>>> # A tibble: 6 x 3
> >>>>      from       to maxconc
> >>>>     <dbl>    <dbl>   <dbl>
> >>>> 1 20000113 20000121   0.963
> >>>> 2 20000122 20000204   0.988
> >>>> 3 20000205 20000218   0.939
> >>>> 4 20000219 20000303   0.883
> >>>> 5 20000304 20000317   0.863
> >>>> 6 20000318 20000330   0.765
> >>>>
> >>>> HTH,
> >>>> Eric
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Jun 18, 2019 at 9:39 PM Bert Gunter <bgunter.4567 using gmail.com>
> >>>> wrote:
> >>>>
> >>>>> My apologies. I negected to cc r-help. -- Bert
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 18, 2019 at 11:21 AM Bert Gunter <bgunter.4567 using gmail.com
> >
> >>>>> wrote:
> >>>>>
> >>>>>>
> >>>>>> I assume that 20000215 means year 2000, month 2, day 15.
> >>>>>> I also assume that you want maxes for the first 2 weeks of a month,
> >>>>> the
> >>>>>> second 2 weeks, and any remaining days.
> >>>>>> I also assume that this might be desired for arbitrary numbers of
> >>>>> years,
> >>>>>> months, and days.
> >>>>>>
> >>>>>> The following is one way to do this. As it's a slight pain to cut
> and
> >>>>>> paste email data as text into R (use ?dput or R code to run to
> provide
> >>>>>> example data instead), I just made up my own. You'll have to do the
> >>>>>> following within a data frame through extraction or by using with()
> of
> >>>>>> course.
> >>>>>>
> >>>>>> ## create some example data for 3 months in 2000
> >>>>>> d<- 2e7 +c(113:131,201:228, 301:330) ## dates
> >>>>>> conc <- runif(length(d)) # concentrations
> >>>>>>
> >>>>>> ## convert the date to character to extract year, month, and day
> >>>>>> cdate <- as.character(d)
> >>>>>> ## use substr to to the extraction
> >>>>>> year <- substr(cdate,1,4)
> >>>>>> mon <- substr(cdate,5,6)
> >>>>>> day <- substr(cdate, 7,8)
> >>>>>>
> >>>>>> ## convert day to numeric and use cut() to group into the biweekly
> >>>>> periods.
> >>>>>> d14 <- cut(as.numeric(day), c(0,14.5,28.5, 32))
> >>>>>>
> >>>>>> ## Use tapply() to create your desired table of results.
> >>>>>> tapply(conc, list(year, d14, mon), max, na.rm = TRUE)
> >>>>>>
> >>>>>> ## Results
> >>>>>>
> >>>>>> , , 01
> >>>>>>
> >>>>>>      (0,14.5] (14.5,28.5] (28.5,32]
> >>>>>> 2000 0.7357389   0.9655391 0.7962965
> >>>>>>
> >>>>>> , , 02
> >>>>>>
> >>>>>>      (0,14.5] (14.5,28.5] (28.5,32]
> >>>>>> 2000 0.8193979   0.9487207        NA
> >>>>>>
> >>>>>> , , 03
> >>>>>>
> >>>>>>      (0,14.5] (14.5,28.5] (28.5,32]
> >>>>>> 2000 0.9718919   0.9997093  0.168659
> >>>>>>
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Bert
> >>>>>>
> >>>>>> Bert Gunter
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jun 18, 2019 at 8:53 AM SITI AISYAH BINTI ZAKARIA <
> >>>>>> aisyahzakaria using unimap.edu.my> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I'm Aisyah..I have a problem to run my R coding. I want to select
> >>>>> maximum
> >>>>>>> value according to week.
> >>>>>>>
> >>>>>>> here is my data
> >>>>>>>
> >>>>>>> Date          O3_Conc
> >>>>>>> 20000101        0.033
> >>>>>>> 20000102        0.023
> >>>>>>> 20000103        0.025
> >>>>>>> 20000104        0.041
> >>>>>>> 20000105        0.063
> >>>>>>> 20000106        0.028
> >>>>>>> 20000107        0.068
> >>>>>>> 20000108        0.048
> >>>>>>> 20000109        0.037
> >>>>>>> 20000110        0.042
> >>>>>>> 20000111        0.027
> >>>>>>> 20000112        0.035
> >>>>>>> 20000113        0.063
> >>>>>>> 20000114        0.035
> >>>>>>> 20000115        0.042
> >>>>>>> 20000116        0.028
> >>>>>>>
> >>>>>>> I want to find the max value from column O3_Conc for only 14 days
> >>>>> that
> >>>>>>> refer to biweekly in month. And the next 14 days for the max value.
> >>>>>>>
> >>>>>>> I hope that I can get the result like this:
> >>>>>>>
> >>>>>>> Date                     Max O3_Conc
> >>>>>>> 20000101 - 20000114        0.068
> >>>>>>> 20000115 - 20000129        0.061
> >>>>>>>
> >>>>>>> I try many coding but still unavailable.
> >>>>>>>
> >>>>>>> this example my coding
> >>>>>>>
> >>>>>>> library(plyr)
> >>>>>>>      data.frame(CA0003)
> >>>>>>>
> >>>>>>>      # format weeks as per requirement (replace "00" with "52" and
> >>>>>>> adjust corresponding year)
> >>>>>>>      tmp <- list()
> >>>>>>>      tmp$y <- format(df$Date, format="%Y")
> >>>>>>>      tmp$w <- format(df$Date, format="%U")
> >>>>>>>      tmp$y[tmp$w=="00"] <-
> >>>>> as.character(as.numeric(tmp$y[tmp$w=="00"]) -
> >>>>>>> 14)
> >>>>>>>      tmp$w[tmp$w=="00"] <- "884"
> >>>>>>>      df$week <- paste(tmp$y, tmp$w, sep = "-")
> >>>>>>>
> >>>>>>>      # get summary
> >>>>>>>      df2 <- ddply(df, .(week),transform, O3_Conc=max(O3_Conc))
> >>>>>>>
> >>>>>>>      # include week ending date
> >>>>>>>      tmp$week.ending <- lapply(df2$week, function(x) rev(df[df$week
> >>>>> ==x,
> >>>>>>> "O3_Conc"])[[1]])
> >>>>>>>      df2$week.ending <- sapply(tmp$week.ending, max(O3_Conc, TRUE)
> >>>>>>>
> >>>>>>> output
> >>>>>>>        Site_Id Site_Location
> >>>>> Date
> >>>>>>>      Year    O3_Conc Month   Day     week
> >>>>>>> 1       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000101
> >>>>>>>      2000    0.033   1       1       NULL-NULL
> >>>>>>> 2       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000102
> >>>>>>>      2000    0.023   1       2       NULL-NULL
> >>>>>>> 3       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000103
> >>>>>>>      2000    0.025   1       3       NULL-NULL
> >>>>>>> 4       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000104
> >>>>>>>      2000    0.041   1       4       NULL-NULL
> >>>>>>> 5       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000105
> >>>>>>>      2000    0.063   1       5       NULL-NULL
> >>>>>>> 6       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000106
> >>>>>>>      2000    0.028   1       6       NULL-NULL
> >>>>>>> 7       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000107
> >>>>>>>      2000    0.068   1       7       NULL-NULL
> >>>>>>> 8       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000108
> >>>>>>>      2000    0.048   1       8       NULL-NULL
> >>>>>>> 9       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000109
> >>>>>>>      2000    0.037   1       9       NULL-NULL
> >>>>>>> 10      CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000110
> >>>>>>>      2000    0.042   1       10      NULL-NULL
> >>>>>>> 11      CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
> >>>>> 20000111
> >>>>>>>      2000    0.027   1       11      NULL-NULL
> >>>>>>>
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list