[R] R - Aggregate 3-Hourly Block Data into Weekly (Melt)

Sun Dec 21 16:26:19 CET 2014

Apologies for re-posting. Any suggestions for clipping/subsetting panel
data (ID=US counties, Time=Week)? I would like start the data to start on
the first Monday and end on the last Sunday for the time period 2026-2045.
Thanks and apologies again.

Sincerely,

Shouro

On Fri, Dec 19, 2014 at 11:01 AM, Shouro Dasgupta <shouro at gmail.com> wrote:

> Thank you very much for your reply. I really appreciate it. I apologize
> for the HTML version, I have made modifications and replied to your
> questions/comments below. Thanks again
>
>
>
> tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L), X2026.01.01.1 = c(
> 285.5533142,
>
>   285.5533142, 286.2481079), X2026.01.01.2 = c(283.4977112, 283.4977112,
>
>   285.0860291), X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767
>
>   ), X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
>
>       X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
>
>       X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071),
> X2026.01.01.7 = c(282.3938, 282.3938, 283.1096), X2026.01.01.8 = c(
> 285.9133, 285.9133, 286.1951) .Names = c("FIPS", "X2026.01.01.1",
> "X2026.01.01.2", "X2026.01.01.3", "X2026.01.01.4",
>
>   "X2026.01.01.5", "X2026.01.01.6", "X2026.01.01.7", "X2026.01.01.8" ),
> class = "data.frame", row.names = c(NA, -3L))
>
>
>
> *Looks like this*
>
> FIPS
>
> X2026.01.01.1
>
> X2026.01.01.2
>
> X2026.01.01.3
>
> X2026.01.01.4
>
> X2026.01.01.5
>
> X2026.01.01.6
>
> X2026.01.01.7
>
> X2026.01.01.8
>
> 1001
>
> 285.5533
>
> 283.4977
>
> 281.9734
>
> 280.0235
>
> 278.7126
>
> 278.5205
>
> 282.3938
>
> 285.9133
>
> 1003
>
> 285.5533
>
> 283.4977
>
> 281.9734
>
> 280.0235
>
> 278.7126
>
> 278.5205
>
> 282.3938
>
> 285.9133
>
> 1005
>
> 286.2481
>
> 285.086
>
> 284.1549
>
> 282.6076
>
> 281.2554
>
> 280.6148
>
> 283.1096
>
> 286.1951
>
>
>
> For X2026.01.01.1 represents Year=2026, Month=01, Day=01, Hour block=1.
>
> I have extracted the data by FIPS code and reshaped the yearly data files
> using melt();
>
>
>
> for (i in filelist) {
>
>   tmp1 <- as.data.table(read.csv(i,header=T, sep=","))
>
>   tmp2 <- melt(tmp1, id="FIPS")
>
>   tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
>
>   tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
>
>   tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
>
> }
>
>
> I have added date string and weekdays using the following codes:
>
> *Date Variable*
>
>         tmp2$date <- with(tmp2, ymd(sprintf('%04d%02d%02d', year, month,
> day)))
>
>
>
> *Day Variable*
>
>         tmp2$day <- weekdays(as.Date(tmp2$date))
>
>
>
> *Question 1: *Apologies for clipping the data and not showing all the
> hour blocks. I have included a full 8-hour block now. For each year, I have
> 3-hour blocks for every day.
>
>
>
> *Question 2: *I have two time periods for each GCM; 2026-2045 and
> 2081-2100. There are occasions when days would have 7 hour blocks instead
> of 8, it could be a data reporting issue from the models.
>
> *Reply to Comment 1: *The data has been downscaled to the US from gridded
> data; Resolution: T42 in atm. 1/3~1ºlat. x 1ºlon. tripolar grids in ocn. So
> Daylight Savings Time could well be an issue. I will look into it.
>
>
>
> *Reply to Comment 2: * Thank you for the suggestion. I realize that using
> rep() is risky, however, I have assign week numbers to each FIPS code (ID)
> for each year. I was thinking of something similar to this:
>
>
>
> weeks<-rep(seq(1,52,1),each=(unique(tmp2$FIPS)**8*)
>
>
>
> Any alternative code will be highly appreciated. My first goal is to
> subset/clip the data to begin on the first Monday and end on the last
> Sunday of each year.
>
> On Fri, Dec 19, 2014 at 9:37 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
> wrote:
>>
>> Thank you for attempting to convey your problem clearly using example
>> code... but your use of HTML email has very nearly undone all your
>> efforts.  Also, use of "dput" to make an R-readable block of data is more
>> reliable than read.table to get the data into our R sessions quickly.
>>
>> First question: you say these are three-hour results, but there are only
>> six per day in your example.
>>
>> Second question: you say the "number of blocks vary between 7 and 8", but
>> your example data does not illustrate that problem. (If there were 8 blocks
>> per day the three-hour statement would make more sense.)
>>
>> Comment: You have not mentioned timezone information... this information
>> looks ripe for GMT, but if that is a bad assumption then daylight savings
>> might account for some variations in blocks per day.
>>
>> Comment: I think your plan of using rep to identify the week numbers is
>> risky. I would recommend using a Date or POSIXt type to make the timestamps
>> computable, and then find the date corresponding to the beginning of the
>> week that the timestamp falls into. Then aggregate grouping on those time
>> values. Unfortunately, the specific way you go about identifying the
>> beginning of week may depend on the timezone information.
>>
>>
>> On Thu, 18 Dec 2014, Shouro Dasgupta wrote:
>>
>>  I am trying to compute max, min, and mean from Global Circulation Models
>>> (GCM) for the US. The data is in 3-hour blocks for 2026-2045 and
>>> 2081-2100.
>>> Sample Data:
>>>
>>> tmp1 <- structure(list(FIPS = c(1001L, 1003L, 1005L), X2026.01.01.1 =
>>> c(285.5533142,
>>>  285.5533142, 286.2481079), X2026.01.01.2 = c(283.4977112, 283.4977112,
>>>  285.0860291), X2026.01.01.3 = c(281.9733887, 281.9733887, 284.1548767
>>>  ), X2026.01.01.4 = c(280.0234985, 280.0234985, 282.6075745),
>>>      X2026.01.01.5 = c(278.7125854, 278.7125854, 281.2553711),
>>>      X2026.01.01.6 = c(278.5204773, 278.5204773, 280.6148071)),
>>> .Names = c("FIPS",
>>>  "X2026.01.01.1", "X2026.01.01.2", "X2026.01.01.3", "X2026.01.01.4",
>>>  "X2026.01.01.5", "X2026.01.01.6"), class = "data.frame", row.names =
>>> c(NA,
>>>  -3L))
>>>
>>> I have extracted the data by FIPS code and reshaped the yearly data files
>>> using melt();
>>>
>>> for (i in filelist) {
>>>  tmp1 <- as.data.table(read.csv(i,header=T, sep=","))
>>>  tmp2 <- melt(tmp1, id="FIPS")
>>>  tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
>>>  tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
>>>  tmp2$day <- as.numeric(substr(tmp2$variable,10,11))}
>>>
>>> I have added datestring and weekdays using the following code:
>>> Inserting Date Variable
>>>
>>> tmp2$date <- with(tmp2, ymd(sprintf('%04d%02d%02d', year, month, day)))
>>>
>>> Inserting Day Variable
>>>
>>> tmp2$day <- weekdays(as.Date(tmp2$date))
>>>
>>> sample.tmp2 <- "FIPS         xdate     temp year month      day
>>> date      dates weekdays
>>> + 5599311  1003 X2045.08.14.2 304.5995 2045     8   Monday 2045-08-14
>>> 2036-01-29        2
>>> + 468406 39093 X2045.01.19.7 267.8483 2045     1 Thursday 2045-01-19
>>> 2028-06-04        0
>>> + 5022078 21167 X2045.07.21.8 314.6772 2045     7   Friday 2045-07-21
>>> 2035-09-13        4
>>> + 186822   9005 X2045.01.08.5 269.0803 2045     1   Sunday 2045-01-08
>>> 2037-06-28        0
>>> + 3998678 13295 X2045.06.10.7 307.2408 2045     6 Saturday 2045-06-10
>>> 2033-10-13        4"
>>>
>>> Data <- read.table(text=sample.tmp2, header = TRUE)
>>>
>>> My goal is to aggregate these 3-hourly blocks into weekly data, however,
>>> GCM data is not consistent and the blocks vary between 7 and 8. I want to
>>> clip the data to start on the first Monday of 2026 and end on the last
>>> Sunday of 2045 and then use rep() to assign week numbers for the whole
>>> epoch.
>>>
>>> I know I can count the number of each day using something like this;
>>>
>>> length(which(weekdays == '0'))
>>>
>>> Where 0, 1, 2..., 6 represent Sunday, Monday,...
>>>
>>> My question is am I doing anything wrong in trying to aggregate the data
>>> to
>>> begin with? But importantly, I would be grateful for any help to clip the
>>> dataset to begin on the first Monday and end on the last Sunday. Thank
>>> you
>>> very much!
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/
>>> posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> ------------------------------------------------------------
>> ---------------
>> Jeff Newmiller                        The     .....       .....  Go
>> Live...
>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live
>> Go...
>>                                       Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.
>> rocks...1k
>> ------------------------------------------------------------
>> ---------------
>>
>

	[[alternative HTML version deleted]]