[R] How to select max data according to week?

Eric Berger er|cjberger @end|ng |rom gm@||@com
Wed Jun 19 16:45:08 CEST 2019


I just realized that my method as written only works if the entire date
range is within a single calendar year.
Here is a revised version that should handle the general case.

In this version I assume that the conversion from the original numbers to
dates has already been done
e.g. it can be done using ymd() as I showed before

library(dplyr)
library(lubridate)

## create some example data for 36 months from 2010
d <- seq(from=as.Date("2010-01-01"),to=as.Date("2012-12-31"),by="1 day")
conc <- runif(length(d)) # concentrations
df <- data.frame( dt=d, conc=conc)
df2 <- dplyr::mutate( df, wkpair=as.integer(floor((0:(nrow(df)-1))/14)) )
df3 <- dplyr::group_by(df2,wkpair) %>%
       dplyr::summarise(from=min(dt), to=max(dt), maxconc=max(conc)) %>%
       dplyr::select(from,to,maxconc)
df3

# A tibble: 79 x 3
   from       to         maxconc
   <date>     <date>       <dbl>
 1 2010-01-01 2010-01-14   0.928
 2 2010-01-15 2010-01-28   0.834
 3 2010-01-29 2010-02-11   0.989
 4 2010-02-12 2010-02-25   0.983
 5 2010-02-26 2010-03-11   0.898
 6 2010-03-12 2010-03-25   0.864
 7 2010-03-26 2010-04-08   0.920
 8 2010-04-09 2010-04-22   0.925
 9 2010-04-23 2010-05-06   0.998
10 2010-05-07 2010-05-20   0.761
# ... with 69 more rows

On Wed, Jun 19, 2019 at 5:08 PM Eric Berger <ericjberger using gmail.com> wrote:

> Hi Bert,
> I agree that our interpretation is slightly different, which is why I
> wrote:
> "If one can take the actual week of the year as an acceptable definition
> of week, then here's my approach."
>
> Regards,
> Eric
>
> On Wed, Jun 19, 2019 at 5:04 PM Bert Gunter <bgunter.4567 using gmail.com>
> wrote:
>
>> Eric:
>>
>> I believe you're doing something different than I did. I broke up each
>> month into biweekly periods, 2+ per month. You seem to be grouping the
>> overall entire period into biweekly intervals -- apologies if I'm wrong,
>> but if I understood correctly, that's not the same thing. I do not know
>> which of us -- if either -- has interpreted her query correctly.
>>
>> Cheers,
>> Bert
>>
>> On Wed, Jun 19, 2019 at 2:35 AM Eric Berger <ericjberger using gmail.com>
>> wrote:
>>
>>> Hi Siti,
>>> I didn't test Bert's code but I assume it's fine. :-)
>>> I would take a different approach than Bert. I prefer to use a package
>>> such as lubridate to handle the date wrangling, and a package such as dplyr
>>> to handle the grouping and max extraction.
>>> It may be overkill for this problem, but these are great packages to
>>> become familiar with.
>>> If one can take the actual week of the year as an acceptable definition
>>> of week, then here's my approach.
>>>
>>> library(lubridate)
>>> library(dplyr)
>>>
>>> # Step 1: start with Bert's code to create sample data
>>> ## create some example data for 3 months in 2000
>>> d<- 2e7 +c(113:131,201:228, 301:330) ## dates
>>> conc <- runif(length(d)) # concentrations
>>>
>>> # Step 2: collect the data into a data frame
>>> df <- data.frame( dt=d, conc=conc)
>>>
>>> # Step 3: use lubridate's ymd() function to parse the dates, its week()
>>> function to identify the week of the year, and define the new column
>>> 'wkpair' that groups the weeks 2-at-a-time
>>> df2 <- dplyr::mutate( df,
>>> wkpair=as.integer(floor(lubridate::week(lubridate::ymd(dt) )/2)) )
>>>
>>> # Step 4: group by the wkpair and use dplyr's summarise to get the info
>>> you wanted
>>> df3 <- dplyr::group_by(df2,wkpair) %>%
>>>           dplyr::summarise( from=min(dt), to=max(dt), maxconc=max(conc))
>>> %>%
>>>           dplyr::select(from,to,maxconc)
>>>
>>> df3
>>>
>>> # A tibble: 6 x 3
>>>       from       to maxconc
>>>      <dbl>    <dbl>   <dbl>
>>> 1 20000113 20000121   0.963
>>> 2 20000122 20000204   0.988
>>> 3 20000205 20000218   0.939
>>> 4 20000219 20000303   0.883
>>> 5 20000304 20000317   0.863
>>> 6 20000318 20000330   0.765
>>>
>>> HTH,
>>> Eric
>>>
>>>
>>>
>>> On Tue, Jun 18, 2019 at 9:39 PM Bert Gunter <bgunter.4567 using gmail.com>
>>> wrote:
>>>
>>>> My apologies. I negected to cc r-help. -- Bert
>>>>
>>>>
>>>>
>>>> On Tue, Jun 18, 2019 at 11:21 AM Bert Gunter <bgunter.4567 using gmail.com>
>>>> wrote:
>>>>
>>>> >
>>>> > I assume that 20000215 means year 2000, month 2, day 15.
>>>> > I also assume that you want maxes for the first 2 weeks of a month,
>>>> the
>>>> > second 2 weeks, and any remaining days.
>>>> > I also assume that this might be desired for arbitrary numbers of
>>>> years,
>>>> > months, and days.
>>>> >
>>>> > The following is one way to do this. As it's a slight pain to cut and
>>>> > paste email data as text into R (use ?dput or R code to run to provide
>>>> > example data instead), I just made up my own. You'll have to do the
>>>> > following within a data frame through extraction or by using with() of
>>>> > course.
>>>> >
>>>> > ## create some example data for 3 months in 2000
>>>> > d<- 2e7 +c(113:131,201:228, 301:330) ## dates
>>>> > conc <- runif(length(d)) # concentrations
>>>> >
>>>> > ## convert the date to character to extract year, month, and day
>>>> > cdate <- as.character(d)
>>>> > ## use substr to to the extraction
>>>> > year <- substr(cdate,1,4)
>>>> > mon <- substr(cdate,5,6)
>>>> > day <- substr(cdate, 7,8)
>>>> >
>>>> > ## convert day to numeric and use cut() to group into the biweekly
>>>> periods.
>>>> > d14 <- cut(as.numeric(day), c(0,14.5,28.5, 32))
>>>> >
>>>> > ## Use tapply() to create your desired table of results.
>>>> > tapply(conc, list(year, d14, mon), max, na.rm = TRUE)
>>>> >
>>>> > ## Results
>>>> >
>>>> > , , 01
>>>> >
>>>> >       (0,14.5] (14.5,28.5] (28.5,32]
>>>> > 2000 0.7357389   0.9655391 0.7962965
>>>> >
>>>> > , , 02
>>>> >
>>>> >       (0,14.5] (14.5,28.5] (28.5,32]
>>>> > 2000 0.8193979   0.9487207        NA
>>>> >
>>>> > , , 03
>>>> >
>>>> >       (0,14.5] (14.5,28.5] (28.5,32]
>>>> > 2000 0.9718919   0.9997093  0.168659
>>>> >
>>>> >
>>>> > Cheers,
>>>> > Bert
>>>> >
>>>> > Bert Gunter
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Jun 18, 2019 at 8:53 AM SITI AISYAH BINTI ZAKARIA <
>>>> > aisyahzakaria using unimap.edu.my> wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> I'm Aisyah..I have a problem to run my R coding. I want to select
>>>> maximum
>>>> >> value according to week.
>>>> >>
>>>> >> here is my data
>>>> >>
>>>> >> Date          O3_Conc
>>>> >> 20000101        0.033
>>>> >> 20000102        0.023
>>>> >> 20000103        0.025
>>>> >> 20000104        0.041
>>>> >> 20000105        0.063
>>>> >> 20000106        0.028
>>>> >> 20000107        0.068
>>>> >> 20000108        0.048
>>>> >> 20000109        0.037
>>>> >> 20000110        0.042
>>>> >> 20000111        0.027
>>>> >> 20000112        0.035
>>>> >> 20000113        0.063
>>>> >> 20000114        0.035
>>>> >> 20000115        0.042
>>>> >> 20000116        0.028
>>>> >>
>>>> >> I want to find the max value from column O3_Conc for only 14 days
>>>> that
>>>> >> refer to biweekly in month. And the next 14 days for the max value.
>>>> >>
>>>> >> I hope that I can get the result like this:
>>>> >>
>>>> >> Date                     Max O3_Conc
>>>> >> 20000101 - 20000114        0.068
>>>> >> 20000115 - 20000129        0.061
>>>> >>
>>>> >> I try many coding but still unavailable.
>>>> >>
>>>> >> this example my coding
>>>> >>
>>>> >> library(plyr)
>>>> >>       data.frame(CA0003)
>>>> >>
>>>> >>       # format weeks as per requirement (replace "00" with "52" and
>>>> >> adjust corresponding year)
>>>> >>       tmp <- list()
>>>> >>       tmp$y <- format(df$Date, format="%Y")
>>>> >>       tmp$w <- format(df$Date, format="%U")
>>>> >>       tmp$y[tmp$w=="00"] <-
>>>> as.character(as.numeric(tmp$y[tmp$w=="00"]) -
>>>> >> 14)
>>>> >>       tmp$w[tmp$w=="00"] <- "884"
>>>> >>       df$week <- paste(tmp$y, tmp$w, sep = "-")
>>>> >>
>>>> >>       # get summary
>>>> >>       df2 <- ddply(df, .(week),transform, O3_Conc=max(O3_Conc))
>>>> >>
>>>> >>       # include week ending date
>>>> >>       tmp$week.ending <- lapply(df2$week, function(x) rev(df[df$week
>>>> ==x,
>>>> >> "O3_Conc"])[[1]])
>>>> >>       df2$week.ending <- sapply(tmp$week.ending, max(O3_Conc, TRUE)
>>>> >>
>>>> >> output
>>>> >>         Site_Id Site_Location
>>>>  Date
>>>> >>       Year    O3_Conc Month   Day     week
>>>> >> 1       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000101
>>>> >>       2000    0.033   1       1       NULL-NULL
>>>> >> 2       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000102
>>>> >>       2000    0.023   1       2       NULL-NULL
>>>> >> 3       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000103
>>>> >>       2000    0.025   1       3       NULL-NULL
>>>> >> 4       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000104
>>>> >>       2000    0.041   1       4       NULL-NULL
>>>> >> 5       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000105
>>>> >>       2000    0.063   1       5       NULL-NULL
>>>> >> 6       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000106
>>>> >>       2000    0.028   1       6       NULL-NULL
>>>> >> 7       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000107
>>>> >>       2000    0.068   1       7       NULL-NULL
>>>> >> 8       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000108
>>>> >>       2000    0.048   1       8       NULL-NULL
>>>> >> 9       CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000109
>>>> >>       2000    0.037   1       9       NULL-NULL
>>>> >> 10      CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000110
>>>> >>       2000    0.042   1       10      NULL-NULL
>>>> >> 11      CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai
>>>> 20000111
>>>> >>       2000    0.027   1       11      NULL-NULL
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> This message has been scanned by E.F.A. Project and is believed to be
>>>> >> clean.
>>>> >>
>>>> >> ______________________________________________
>>>> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> >> PLEASE do read the posting guide
>>>> >> http://www.R-project.org/posting-guide.html
>>>> >> and provide commented, minimal, self-contained, reproducible code.
>>>> >>
>>>> >
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>

	[[alternative HTML version deleted]]



More information about the R-help mailing list