[R] weighted average grouped by variables
Massimo Bressan
massimo.bressan at arpa.veneto.it
Thu Nov 9 14:16:33 CET 2017
Hello,
an update on my question: I worked out the following solution with the "dplyr" package:
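library(dplyr)
mydf %>%
  # per-row product: number of vehicles times their average speed
  mutate(speed_vehicles = n_vehicles * speed) %>%
  group_by(date_time, type) %>%
  summarise(
    sum_n_times_speed = sum(speed_vehicles),
    n_vehicles = sum(n_vehicles),
    # weighted mean: sum(n * speed) / sum(n)
    vel = sum(speed_vehicles) / sum(n_vehicles)
  )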
In fact, I was hoping to do everything in one go, i.e. without creating the intermediate variable "speed_vehicles", by using the function weighted.mean() instead.
Any hints for a different approach would be much appreciated.
Thanks
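One possible way to do this in a single step is sketched below. It is only an illustration, not a definitive answer: it assumes dplyr 1.0.0 or later (for the `.groups` argument) and rebuilds a compact copy of the example data so that it runs on its own. The names `result` and `weighted_avg_speed` are chosen here for illustration. The weighted mean is computed before `n_vehicles` is overwritten by its sum, because `weighted.mean()` needs the per-row weights.

```r
library(dplyr)

# Compact copy of the example data, rebuilt so the sketch is self-contained
mydf <- data.frame(
  date_time  = as.POSIXct(1508238000, origin = "1970-01-01", tz = "UTC"),
  direction  = c("A", "A", "A", "A", "B", "B", "B"),
  type       = factor(
    c("car", "light_duty", "heavy_duty", "motorcycle",
      "car", "light_duty", "heavy_duty"),
    levels = c("car", "light_duty", "heavy_duty", "motorcycle")
  ),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

result <- mydf %>%
  group_by(date_time, type) %>%
  summarise(
    # must come first: uses the per-row n_vehicles as weights
    weighted_avg_speed = weighted.mean(speed, n_vehicles),
    # only afterwards replace n_vehicles by the group total
    n_vehicles = sum(n_vehicles),
    .groups = "drop"
  )

print(result)
```

No intermediate column is needed, and a group with a single row (here "motorcycle") simply returns that row's speed.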
From: "Massimo Bressan" <massimo.bressan at arpa.veneto.it>
To: "r-help" <r-help at r-project.org>
Sent: Thursday, 9 November 2017 12:20:52
Subject: weighted average grouped by variables
Hi all,
I have this data frame (created as a reproducible example):
mydf<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""),
direction = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
type = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"),
speed = c(41.1029082774049, 40.3333333333333, 40.3157894736842, 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)),
.Names = c("date_time", "direction", "type", "speed", "n_vehicles"),
row.names = c(NA, -7L),
class = "data.frame")
mydf
and I need to arrive at this final result:
mydf_final<-structure(list(date_time = structure(c(1508238000, 1508238000, 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""),
type = structure(c(1L, 2L, 3L, 4L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class = "factor"),
weighted_avg_speed = c(36.39029, 38.56521, 37.53333, 36.08696),
n_vehicles = c(1153L,69L,45L,23L)),
.Names = c("date_time", "type", "weighted_avg_speed", "n_vehicles"),
row.names = c(NA, -4L),
class = "data.frame")
mydf_final
My question:
how can I compute a weighted mean, i.e. "weighted_avg_speed",
from "speed" (the values to be averaged) and "n_vehicles" (the weights),
grouped by "date_time" and "type"?
Note the complication of the "motorcycle" case, which is present in only one direction.
Any help with that?
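As a worked illustration of the quantity being asked for, the base R snippet below computes the weighted mean for the "car" group both with weighted.mean() and by hand as sum(speed * n) / sum(n), and then for the single-row "motorcycle" group. The variable names are made up for this sketch; the numbers are taken from the example data above.

```r
# "car" appears in both directions: two speeds, weighted by vehicle counts
car_speed   <- c(41.1029082774049, 33.4065155807365)
car_weights <- c(447, 706)

car_avg    <- weighted.mean(car_speed, car_weights)
car_manual <- sum(car_speed * car_weights) / sum(car_weights)

# "motorcycle" appears in one direction only: a single value and weight,
# so the weighted mean is just that value
moto_avg <- weighted.mean(36.0869565217391, 23)

print(c(car = car_avg, car_manual = car_manual, motorcycle = moto_avg))
# approximately 36.39029, 36.39029, 36.08696
```

A group with only one observation therefore needs no special handling: the weight cancels out.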
thank you
max
--
------------------------------------------------------------
Massimo Bressan
ARPAV
Agenzia Regionale per la Prevenzione e
Protezione Ambientale del Veneto
Dipartimento Provinciale di Treviso
Via Santa Barbara, 5/a
31100 Treviso, Italy
tel: +39 0422 558545
fax: +39 0422 558516
e-mail: massimo.bressan at arpa.veneto.it
------------------------------------------------------------