[R] average at specific hour "endpoints" of the day

Massimo Bressan massimo.bressan at arpa.veneto.it
Fri Apr 7 16:02:22 CEST 2017


hi jeff

thank you for your code, there is a lot to think about...

In the meantime I've managed to work out a (sort of) solution, but I'm still not completely satisfied with it

I would like to make it more elegant and, if possible, more general

here it is, so far...

####

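# 48 hourly timestamps starting at midnight, in the session time zone
mydate<-seq(ISOdatetime(2017,1, 1, 0, 0, 0), by="hour", length.out = 48)
v1<-1:48
mydf<-data.frame(mydate,v1)

library(zoo)

# zoo series: the data columns indexed by the timestamps in the first column
z<-zoo(mydf[,-1], mydf[,1])

# trailing mean over 8 records, kept only where the window ends at 06:00
z8<-rollapply(z, width=8, FUN=mean, align="right")
iz8<-which(as.numeric(strftime(index(z8), '%H'))==6)
z8<-z8[iz8]

# trailing mean over 16 records, kept only where the window ends at 22:00
z16<-rollapply(z, width=16, FUN=mean, align="right")
iz16<-which(as.numeric(strftime(index(z16), '%H'))==22)
z16<-z16[iz16]

# convert back to data frames
fortify.zoo(z16)
fortify.zoo(z8)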

#and then any sort of manipulation with dataframes

####
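
One possible way to generalise the two blocks above, as a rough sketch only: the helper name endpoint_mean() is hypothetical (it is not part of zoo), it uses base R instead of rollapply, and it assumes a regular hourly series with no gaps and no NA values. The time zone is fixed to "UTC" here just to make the example reproducible.

```r
# Hypothetical helper: trailing mean over `width` records, reported only
# at the rows whose clock hour equals `hour`.
# Assumes a regular hourly series without gaps or NA values.
endpoint_mean <- function(datetime, x, hour, width) {
  n   <- length(x)
  cs  <- c(0, cumsum(x))            # cs[k + 1] is the sum of x[1..k]
  idx <- seq_len(n)
  ok  <- idx >= width               # rows with a complete trailing window
  roll <- rep(NA_real_, n)
  roll[ok] <- (cs[idx[ok] + 1] - cs[idx[ok] - width + 1]) / width
  keep <- ok & as.integer(format(datetime, "%H")) == hour
  data.frame(datetime = datetime[keep], v1mean = roll[keep])
}

mydate <- seq(ISOdatetime(2017, 1, 1, 0, 0, 0, tz = "UTC"),
              by = "hour", length.out = 48)
v1 <- 1:48

# one row per endpoint: the hour it ends at and how many records it covers
slots <- data.frame(hour = c(6, 22), width = c(8, 16))

res <- do.call(rbind,
               Map(function(h, w) endpoint_mean(mydate, v1, h, w),
                   slots$hour, slots$width))
res <- res[order(res$datetime), ]
res$v1mean   # 15.5 27.5 39.5
```

Adding a further endpoint would then only mean adding a row to slots.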

bye

----- Original message -----
From: "Jeff Newmiller" <jdnewmil at dcn.davis.ca.us>
To: "Massimo Bressan" <massimo.bressan at arpa.veneto.it>
Cc: "r-help" <r-help at r-project.org>
Sent: Thursday, 6 April 2017 18:19:29
Subject: Re: [R] average at specific hour "endpoints" of the day

On Thu, 6 Apr 2017, Massimo Bressan wrote:

> hello
>
> given my reproducible example
>
> #---
> date<-seq(ISOdate(2017,1, 1, 0), by="hour", length.out = 48)
> v1<-1:48
> df<-data.frame(date,v1)
>
> #--

"date" and "df" are functions in base R... best to avoid hiding them by 
re-using those names in the global environment

ISOdate forces GMT, which many data sets that you might work with do NOT 
use. It is better to use ISOdatetime to avoid letting hidden code 
determine the timezone that is applied to (or compared with) your data.
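
A small illustration of the difference, as a sketch: the zone "Etc/GMT+5" is an arbitrary choice for the example, and it assumes that zone name is available on your system.

```r
# ISOdate() defaults to tz = "GMT"; ISOdatetime() takes whatever tz you give it
a <- ISOdate(2017, 1, 1, 0)                               # midnight GMT
b <- ISOdatetime(2017, 1, 1, 0, 0, 0, tz = "Etc/GMT+5")   # midnight at UTC-5

attr(a, "tzone")                               # "GMT"
as.numeric(difftime(b, a, units = "hours"))    # 5: the same clock time, 5 hours apart
```

Both print as midnight on 1 January, but they are different instants, so comparing or merging them silently shifts the data by five hours.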

>
> I need to calculate the average of variable v1 at specific hour "endpoints" of the day: i.e. at hours 6.00 and 22.00 respectively
>
> the desired result is
>
> date v1
> 01/01/17 22:00 15.5
> 02/01/17 06:00 27.5
> 02/01/17 22:00 39.5
>
> at hour 06:00 of each day the average is calculated by considering the 8 previous records (hours from 23:00 to 6:00)
> at hour 22:00 of each day the average is calculated by considering the 16 previous records (hours from 7:00 to 22:00)
>
> any hint please?
>
> I've been trying with some functions within the "xts" package but without much result...
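
As a quick sanity check on the desired result quoted above (a sketch, assuming the series starts at 00:00 so that v1[i] is the record for hour i - 1 counted from the start):

```r
v1 <- 1:48

# day 1, 22:00 is record 23; the 16 records 07:00..22:00 are 8..23
mean(v1[8:23])    # 15.5
# day 2, 06:00 is record 31; the 8 records 23:00..06:00 are 24..31
mean(v1[24:31])   # 27.5
# day 2, 22:00 is record 47; the 16 records 07:00..22:00 are 32..47
mean(v1[32:47])   # 39.5
```

There is no value for day 1 at 06:00 because only 7 of the 8 records needed are present.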

I am not sure how I would do this with xts, but the below code is one 
fairly literal approach (implemented two ways) to translate your 
requirements that is also potentially extensible if the data or 
requirements change.

### Base R....

Sys.setenv( TZ = "Etc/GMT+5" ) # selected arbitrarily here but not left to
                                # the system to decide
dta <- data.frame( datetime = seq( ISOdatetime( 2017,1, 1, 0, 0, 0 )
                                  , by="hour"
                                  , length.out = 48
                                  )
                  , v1 = 1:48
                  )
dta$nrec <- 1
dta$date <- as.POSIXct( trunc.POSIXt( dta$datetime, units="days" ) )
dta$tod <- as.numeric( dta$datetime - dta$date, units = "hours" )
dta$timeslot <- factor( ifelse( 6 < dta$tod & dta$tod <= 22
                               , "Day"
                               , "Night"
                               )
                       , levels = c( "Night", "Day" )
                       )
dta$slotdatetime <- dta$date + as.difftime( ifelse( "Day" == dta$timeslot
                                                   , 22
                                                   , ifelse( 22 < dta$tod
                                                           , 24+6
                                                           , 6
                                                           )
                                                   )
                                           , units="hours"
                                           )
dta2 <- aggregate( dta[ , c( "v1", "nrec" ) ]
                  , dta[ , c( "timeslot", "slotdatetime" ), drop=FALSE ]
                  , FUN = sum
                  )
dta2 <- subset( dta2, nrec == ifelse( "Day"==timeslot, 16, 8 ) )
dta2$v1mean <- dta2$v1 / dta2$nrec
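
To see what the slot labelling in the code above does, here is the same logic applied to the 24 hours of a single day (a standalone sketch; the name end_offset is introduced only for this illustration):

```r
tod <- 0:23   # hour of day

# hours 7..22 belong to the "Day" slot, everything else to "Night"
timeslot <- ifelse(6 < tod & tod <= 22, "Day", "Night")

# hours after midnight at which the slot containing each record ends:
# Day ends at 22, early-morning Night at 6, and 23:00 rolls over to 06:00
# of the next day (24 + 6)
end_offset <- ifelse(timeslot == "Day", 22, ifelse(22 < tod, 24 + 6, 6))

table(timeslot)        # Day 16, Night 8
end_offset[tod == 23]  # 30
```

A complete Night slot collects 23:00 from one day plus 00:00 to 06:00 from the next, which is why the subset step keeps only slots with 8 (Night) or 16 (Day) records.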

#### or if you don't mind the tidyverse....

library(dplyr) # wonderland of non-standard evaluation... beware, Alice!
Sys.setenv( TZ = "Etc/GMT+5" ) # selected arbitrarily here but not left to
                                # the system to decide
dta <- data.frame( datetime = seq( ISOdatetime( 2017,1, 1, 0, 0, 0 )
                                  , by="hour"
                                  , length.out = 48
                                  )
                  , v1 = 1:48
                  )
dta2 <- (   dta
         %>% mutate( date = as.POSIXct( trunc.POSIXt( datetime
                                                    , units="days"
                                                    )
                                      )
                   , tod = as.numeric( datetime - date, units = "hours" )
                   , timeslot = factor( ifelse( 6 < tod & tod <= 22
                                              , "Day"
                                              , "Night"
                                              )
                                      , levels = c( "Night", "Day" )
                                      )
                   , slotdatetime = date +
                            as.difftime( ifelse( "Day" == timeslot
                                               , 22
                                               , ifelse( 22 < tod
                                                       , 24+6
                                                       , 6
                                                       )
                                               )
                                       , units="hours"
                                       )
                   )
         %>% group_by( slotdatetime, timeslot )
         %>% summarise( v1mean = mean( v1 )
                      , nrec = n()
                      )
         %>% filter( nrec == ifelse( "Day"==timeslot, 16, 8 ) )
         )




> thanks for the help
> 	[[alternative HTML version deleted]]

This is a plain-text mailing list. Your chances of communicating 
successfully when you post HTML format email are much worse than if you 
post plain text using the "plain text" option in your mail program.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


