[R] How to select max data according to week?

Eric Berger er|cjberger @end|ng |rom gm@||@com
Wed Jun 19 11:35:19 CEST 2019


Hi Siti,
I didn't test Bert's code but I assume it's fine. :-)
I would take a different approach than Bert. I prefer to use a package such
as lubridate to handle the date wrangling, and a package such as dplyr to
handle the grouping and max extraction.
It may be overkill for this problem, but these are great packages to become
familiar with.
If the week of the year, as numbered by lubridate's week(), is an acceptable
definition of "week", then here's my approach.
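To make that definition concrete: lubridate's week() counts the complete seven-day periods since 1 January, plus one, so week 1 is always 1 to 7 January whatever the weekday. The short sketch below takes a few dates from the sample data and shows how halving the week number (as Step 3 does) produces the group boundaries in the output. The base-R line using format(, "%j") is only an illustrative cross-check and is not part of the approach itself.

```r
library(lubridate)

# A few dates from the sample data, stored as yyyymmdd numbers
d <- c(20000113, 20000121, 20000122, 20000204, 20000205)
dates <- ymd(d)

# week(): complete seven-day periods since 1 January, plus one
wk <- week(dates)

# Illustrative base-R cross-check: same numbering from the day of the year
wk_base <- (as.integer(format(dates, "%j")) - 1) %/% 7 + 1

# Pair the weeks two at a time by integer-dividing the week number by 2
wkpair <- wk %/% 2

data.frame(date = dates, week = wk, week_base = wk_base, wkpair = wkpair)
```

Dates 20000113 and 20000121 share a pair, while 20000122 starts the next one, which matches the first two rows of the tibble shown further down.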

library(lubridate)
library(dplyr)

# Step 1: start with Bert's code to create sample data
## create some example data for 3 months in 2000
d<- 2e7 +c(113:131,201:228, 301:330) ## dates
conc <- runif(length(d)) # concentrations

# Step 2: collect the data into a data frame
df <- data.frame( dt=d, conc=conc)

# Step 3: use lubridate's ymd() function to parse the dates, its week()
# function to identify the week of the year, and define the new column
# 'wkpair' that groups the weeks 2-at-a-time
df2 <- dplyr::mutate( df,
          wkpair=as.integer(floor(lubridate::week(lubridate::ymd(dt) )/2)) )

# Step 4: group by the wkpair and use dplyr's summarise to get the info you
# wanted
df3 <- dplyr::group_by(df2,wkpair) %>%
          dplyr::summarise( from=min(dt), to=max(dt), maxconc=max(conc)) %>%
          dplyr::select(from,to,maxconc)

df3

# A tibble: 6 x 3
      from       to maxconc
     <dbl>    <dbl>   <dbl>
1 20000113 20000121   0.963
2 20000122 20000204   0.988
3 20000205 20000218   0.939
4 20000219 20000303   0.883
5 20000304 20000317   0.863
6 20000318 20000330   0.765
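One caveat: week-of-year pairs do not restart at month boundaries (the second group above runs from 20000122 to 20000204). If the periods should instead follow the calendar month, as in the original request and in Bert's answer, a sketch with the same two packages could look like the following. It assumes Bert's day ranges (days 1-14, 15-28, and 29 onward); set.seed(1) is added only to make the run reproducible, and the column names yr, mon and half are illustrative.

```r
library(lubridate)
library(dplyr)

# Same sample data as above (3 months of 2000); seed added for reproducibility
set.seed(1)
d <- 2e7 + c(113:131, 201:228, 301:330)
df <- data.frame(dt = d, conc = runif(length(d)))

res <- df %>%
  mutate(date = ymd(dt),
         yr   = year(date),
         mon  = month(date),
         # half = 1 for days 1-14, 2 for days 15-28, 3 for days 29-31
         half = cut(mday(date), breaks = c(0, 14, 28, 31), labels = FALSE)) %>%
  group_by(yr, mon, half) %>%
  summarise(from = min(dt), to = max(dt), maxconc = max(conc),
            .groups = "drop") %>%
  select(from, to, maxconc)

res
```

Unlike the tapply() table, a period with no data (here 29-31 February) simply produces no row instead of an NA.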

HTH,
Eric



On Tue, Jun 18, 2019 at 9:39 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> My apologies. I neglected to cc r-help. -- Bert
>
>
>
> On Tue, Jun 18, 2019 at 11:21 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> >
> > I assume that 20000215 means year 2000, month 2, day 15.
> > I also assume that you want maxes for the first 2 weeks of a month, the
> > second 2 weeks, and any remaining days.
> > I also assume that this might be desired for arbitrary numbers of years,
> > months, and days.
> >
> > The following is one way to do this. As it's a slight pain to cut and
> > paste email data as text into R (use ?dput, or R code that generates the
> > data, to provide example data instead), I just made up my own. Of course,
> > you'll have to do the following within a data frame, either by extracting
> > the columns or by using with().
> >
> > ## create some example data for 3 months in 2000
> > d<- 2e7 +c(113:131,201:228, 301:330) ## dates
> > conc <- runif(length(d)) # concentrations
> >
> > ## convert the date to character to extract year, month, and day
> > cdate <- as.character(d)
> > ## use substr() to do the extraction
> > year <- substr(cdate,1,4)
> > mon <- substr(cdate,5,6)
> > day <- substr(cdate, 7,8)
> >
> > ## convert day to numeric and use cut() to group into the biweekly periods.
> > d14 <- cut(as.numeric(day), c(0,14.5,28.5, 32))
> >
> > ## Use tapply() to create your desired table of results.
> > tapply(conc, list(year, d14, mon), max, na.rm = TRUE)
> >
> > ## Results
> >
> > , , 01
> >
> >       (0,14.5] (14.5,28.5] (28.5,32]
> > 2000 0.7357389   0.9655391 0.7962965
> >
> > , , 02
> >
> >       (0,14.5] (14.5,28.5] (28.5,32]
> > 2000 0.8193979   0.9487207        NA
> >
> > , , 03
> >
> >       (0,14.5] (14.5,28.5] (28.5,32]
> > 2000 0.9718919   0.9997093  0.168659
> >
> >
> > Cheers,
> > Bert
> >
> > Bert Gunter
> >
> >
> >
> >
> > On Tue, Jun 18, 2019 at 8:53 AM SITI AISYAH BINTI ZAKARIA <aisyahzakaria using unimap.edu.my> wrote:
> >
> >> Hi,
> >>
> >> I'm Aisyah. I have a problem running my R code. I want to select the
> >> maximum value according to week.
> >>
> >> Here is my data:
> >>
> >> Date          O3_Conc
> >> 20000101        0.033
> >> 20000102        0.023
> >> 20000103        0.025
> >> 20000104        0.041
> >> 20000105        0.063
> >> 20000106        0.028
> >> 20000107        0.068
> >> 20000108        0.048
> >> 20000109        0.037
> >> 20000110        0.042
> >> 20000111        0.027
> >> 20000112        0.035
> >> 20000113        0.063
> >> 20000114        0.035
> >> 20000115        0.042
> >> 20000116        0.028
> >>
> >> I want to find the max value of column O3_Conc for each 14-day
> >> (biweekly) period in a month: the first 14 days, then the next 14 days.
> >>
> >> I hope to get a result like this:
> >>
> >> Date                     Max O3_Conc
> >> 20000101 - 20000114        0.068
> >> 20000115 - 20000129        0.061
> >>
> >> I have tried many versions of the code, but none of them work.
> >>
> >> Here is an example of my code:
> >>
> >> library(plyr)
> >>       data.frame(CA0003)
> >>
> >>       # format weeks as per requirement (replace "00" with "52" and
> >>       # adjust corresponding year)
> >>       tmp <- list()
> >>       tmp$y <- format(df$Date, format="%Y")
> >>       tmp$w <- format(df$Date, format="%U")
> >>       tmp$y[tmp$w=="00"] <- as.character(as.numeric(tmp$y[tmp$w=="00"]) - 14)
> >>       tmp$w[tmp$w=="00"] <- "884"
> >>       df$week <- paste(tmp$y, tmp$w, sep = "-")
> >>
> >>       # get summary
> >>       df2 <- ddply(df, .(week),transform, O3_Conc=max(O3_Conc))
> >>
> >>       # include week ending date
> >>       tmp$week.ending <- lapply(df2$week, function(x) rev(df[df$week == x, "O3_Conc"])[[1]])
> >>       df2$week.ending <- sapply(tmp$week.ending, max(O3_Conc, TRUE)
> >>
> >> output
> >>    Site_Id Site_Location                                   Date     Year O3_Conc Month Day week
> >> 1  CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000101 2000 0.033   1     1   NULL-NULL
> >> 2  CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000102 2000 0.023   1     2   NULL-NULL
> >> 3  CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000103 2000 0.025   1     3   NULL-NULL
> >> 4  CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000104 2000 0.041   1     4   NULL-NULL
> >> 5  CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000105 2000 0.063   1     5   NULL-NULL
> >> 6  CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000106 2000 0.028   1     6   NULL-NULL
> >> 7  CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000107 2000 0.068   1     7   NULL-NULL
> >> 8  CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000108 2000 0.048   1     8   NULL-NULL
> >> 9  CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000109 2000 0.037   1     9   NULL-NULL
> >> 10 CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000110 2000 0.042   1     10  NULL-NULL
> >> 11 CA0003  Sek. Keb. Cederawasih, Taman Inderawasih, Perai 20000111 2000 0.027   1     11  NULL-NULL
> >>
> >>
> >>
> >>
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
>
>
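For completeness, here is a base-R sketch that starts from the sample rows in the original post and returns the from-to layout that was asked for. Reading the rows with read.table(text = ...) sidesteps the copy-and-paste problem Bert mentions. It assumes the same calendar-month periods as Bert's answer (days 1-14, 15-28, 29 onward); the object names txt, ozone, period and res are illustrative. Because the posted rows stop at 20000116, the second period here covers only two days.

```r
# The sample rows from the original post, read in directly
txt <- "Date     O3_Conc
20000101 0.033
20000102 0.023
20000103 0.025
20000104 0.041
20000105 0.063
20000106 0.028
20000107 0.068
20000108 0.048
20000109 0.037
20000110 0.042
20000111 0.027
20000112 0.035
20000113 0.063
20000114 0.035
20000115 0.042
20000116 0.028"
ozone <- read.table(text = txt, header = TRUE)

# Parse the yyyymmdd numbers into Date objects and get the day of the month
dates <- as.Date(as.character(ozone$Date), format = "%Y%m%d")
day   <- as.integer(format(dates, "%d"))

# Period label per row: year-month plus 1 (days 1-14), 2 (15-28) or 3 (29-31)
period <- paste(format(dates, "%Y-%m"), cut(day, c(0, 14, 28, 31), labels = FALSE))

# One output row per period: "first - last" date range and the maximum
res <- do.call(rbind, lapply(split(ozone, period), function(g) {
  data.frame(Date = paste(min(g$Date), "-", max(g$Date)),
             Max_O3_Conc = max(g$O3_Conc),
             stringsAsFactors = FALSE)
}))
rownames(res) <- NULL
res
```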



