[R] Sum data according to date in sequence

avi.e.gross at gmail.com
Sat Nov 4 18:47:53 CET 2023


There may be a point to consider about the field containing dates in the request below. Yes, much code will "work" just fine if the column is treated as text, since you can group by that too. The rows may not come out in the order you expected, but the re-sorting can be done, perhaps even more efficiently, after your summarise(): either convert the far fewer remaining rows to a proper date, or transform the text dates into a year/month/day form that sorts correctly in forward or reverse order as needed.

Converting lots of rows to dates is not a cheap process, and grouping by that more complex date data structure may be harder. Heck, it may even make sense to keep the text form of the dates organized as a factor, since the grouping then comes sort of pre-done.

The above comments are not saying that any of the other solutions offered are wrong; they simply discuss whether, especially for larger data sets, there are ways that could be more efficient.
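
For concreteness, here is a minimal sketch of that idea, assuming the dt1 data frame from the dput() further down in this thread: group on the character dates as they are, then convert only the handful of summary rows to Date for the final ordering.

suppressPackageStartupMessages(library(dplyr))

## group on the character dates as-is; only the few summary rows
## are converted to Date afterwards, purely for sorting
sums <- dt1 %>%
   group_by(date) %>%
   summarise(EnergykWh = sum(EnergykWh)) %>%
   mutate(date = as.Date(date, "%m/%d/%Y")) %>%
   arrange(date)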

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rui Barradas
Sent: Saturday, November 4, 2023 12:56 PM
To: roslinazairimah zakaria <roslinaump using gmail.com>; jim holtman <jholtman using gmail.com>
Cc: r-help mailing list <r-help using r-project.org>
Subject: Re: [R] Sum data according to date in sequence

At 01:49 on 03/11/2023, roslinazairimah zakaria wrote:
> Hi all,
> 
> This is the data:
> 
>> dput(head(dt1,20))
> structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
> "1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
> "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
> "1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
> "1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
> "20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
> "14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
> "13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
> 1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
> 2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
> 3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
> 20L), class = "data.frame")
> 
> 
> I would like to sum the EnergykWh data by date, e.g. all values for
> EnergykWh on 1/14/2016.
> 
> 
> On Fri, Nov 3, 2023 at 8:10 AM jim holtman <jholtman using gmail.com> wrote:
> 
>> How about sending a 'dput' of some sample data.  My guess is that your date
>> is 'character' and not 'Date'.
>>
>> Thanks
>>
>> Jim Holtman
>> *Data Munger Guru*
>>
>>
>> *What is the problem that you are trying to solve? Tell me what you want
>> to do, not how you want to do it.*
>>
>>
>> On Thu, Nov 2, 2023 at 4:24 PM roslinazairimah zakaria <
>> roslinaump using gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> I have this set of data. I would like to sum the EnergykWh according to
>>> date sequence.
>>>
>>>> head(dt1,20)
>>>                    StationName      date  time EnergykWh
>>> 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
>>> 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
>>> 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
>>> 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
>>> 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
>>> 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
>>> 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
>>> 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
>>> 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
>>> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
>>> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
>>> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
>>> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
>>> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
>>> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
>>> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
>>> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
>>> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
>>> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
>>> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500
>>>
>>> I have tried this:
>>> library(dplyr)
>>> sums <- dt1 %>%
>>>    group_by(date) %>%
>>>    summarise(EnergykWh = sum(EnergykWh))
>>>
>>> head(sums,20)
>>>
>>> The date is not by daily sequence but by year sequence.
>>>
>>>> head(sums,20)
>>> # A tibble: 20 × 2
>>>    date      EnergykWh
>>>    <chr>         <dbl>
>>>  1 1/1/2017     25.3
>>>  2 1/1/2018     61.0
>>>  3 1/1/2019      0.627
>>>  4 1/1/2020     10.7
>>>  5 1/10/2017    69.4
>>>  6 1/10/2018    54.5
>>>  7 1/10/2019    49.1
>>>  8 1/10/2020    45.9
>>>  9 1/11/2017    73.9
>>> 10 1/11/2018    53.3
>>> 11 1/11/2019    93.5
>>> 12 1/11/2020    66.7
>>> 13 1/12/2017    78.6
>>> 14 1/12/2018    42.2
>>> 15 1/12/2019    22.7
>>> 16 1/12/2020    80.9
>>> 17 1/13/2017    85.6
>>> 18 1/13/2018    46.4
>>> 19 1/13/2019    40.0
>>> 20 1/13/2020   121.
>>>
>>>
>>>
>>> Thank you very much for any help given.
>>>
>>>
>>> --
>>> *Roslinazairimah Zakaria*
>>> *Tel: +609-5492370; Fax. No.+609-5492766*
>>>
>>> *Email: roslinazairimah using ump.edu.my <roslinazairimah using ump.edu.my>;
>>> roslinaump using gmail.com <roslinaump using gmail.com>*
>>> Faculty of Industrial Sciences & Technology
>>> University Malaysia Pahang
>>> Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
>>>
>>>          [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
> 
Hello,

Here are two solutions.

1. Base R

Though I don't coerce the date column to class "Date", it seems to work.


aggregate(EnergykWh ~ date, dt1, sum)
#>        date EnergykWh
#> 1 1/14/2016  11.98569
#> 2 1/15/2016  32.56938
#> 3 1/16/2016  21.29181
#> 4 1/17/2016  22.88083
#> 5 1/18/2016   9.05750
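
Note that aggregate() here groups on the character dates, so over a longer span of data the rows would come out in lexicographic rather than calendar order. A minimal base R sketch (one option, not the only one) to reorder the aggregated result by actual date:

sums <- aggregate(EnergykWh ~ date, dt1, sum)
## convert only the aggregated rows to Date and sort chronologically
sums$date <- as.Date(sums$date, "%m/%d/%Y")
sums <- sums[order(sums$date), ]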


2. Package dplyr.
First, the column date is coerced from class "character" to class "Date";
then the grouped sums are computed.


suppressPackageStartupMessages(
   library(dplyr)
)

dt1 %>%
   mutate(date = as.Date(date, "%m/%d/%Y")) %>%
   summarise(EnergykWh = sum(EnergykWh), .by = date)
#>         date EnergykWh
#> 1 2016-01-14  11.98569
#> 2 2016-01-15  32.56938
#> 3 2016-01-16  21.29181
#> 4 2016-01-17  22.88083
#> 5 2016-01-18   9.05750


As you can see, the results are the same.
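
If row order matters, note that summarise() with .by returns the groups in order of first appearance rather than sorted; an explicit arrange() after the summary (a small, optional variation on the code above) guarantees chronological order:

dt1 %>%
   mutate(date = as.Date(date, "%m/%d/%Y")) %>%
   summarise(EnergykWh = sum(EnergykWh), .by = date) %>%
   arrange(date)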

Also, this exact problem is one of the most frequently asked on 
StackOverflow; maybe you could try searching there for a solution. My 
code above is also exactly the code in [1], though I had already written 
this answer and only checked afterwards :(.


[1] 
https://stackoverflow.com/questions/61548758/r-how-sum-values-by-group-by-date


Hope this helps,

Rui Barradas




______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


