[R] weighted average grouped by variables

On 9 Nov 2017, at 14:58, PIKAL Petr wrote:
>
> Hi
>
> Thanks for the working example.
>
> You could use a split/lapply approach; however, it is probably not much better than the dplyr method.
>
> sapply(split(mydf, mydf$type), function(speed, n_vehicles) sum(mydf$speed*mydf$n_vehicles)/sum(mydf$n_vehicles))
> gives you averages
>

The result of this calculation is:

     car light_duty heavy_duty motorcycle
36.54109   36.54109   36.54109   36.54109

All four values are identical (each is the overall weighted mean), so this does not match the result of the dplyr method, which is:

            date_time       type      vel
               <dttm>     <fctr>    <dbl>
1 2017-10-17 13:00:00        car 36.39029
2 2017-10-17 13:00:00 light_duty 38.56522
3 2017-10-17 13:00:00 heavy_duty 37.53333
4 2017-10-17 13:00:00 motorcycle 36.08696

The base R approach needs a slight modification:

sapply(split(mydf, mydf$type), function(Z) sum(Z$speed*Z$n_vehicles)/sum(Z$n_vehicles))

The calculations are now done on each element Z of the list returned by split,
rather than on the full mydf, which the original function used for every group.
The result now is:

     car light_duty heavy_duty motorcycle
36.39029   38.56522   37.53333   36.08696

This is now the same as the result of the dplyr method.
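
Since the original question asked for weighted.mean(), here is a minimal sketch of the same split approach using it, extended to group by both date_time and type. The data frame is rebuilt in shortened form with the same values; fixing the time zone to UTC is an assumption made only for reproducibility, and the names wavg, groups and result are purely illustrative.

```r
# Example data from the original question, rebuilt in shortened form.
# Assumption: the time zone is fixed to UTC (the original used the local zone).
mydf <- data.frame(
  date_time  = rep(as.POSIXct(1508238000, origin = "1970-01-01", tz = "UTC"), 7),
  direction  = factor(c("A", "A", "A", "A", "B", "B", "B")),
  type       = factor(c("car", "light_duty", "heavy_duty", "motorcycle",
                        "car", "light_duty", "heavy_duty"),
                      levels = c("car", "light_duty", "heavy_duty", "motorcycle")),
  speed      = c(41.1029082774049, 40.3333333333333, 40.3157894736842,
                 36.0869565217391, 33.4065155807365, 37.6222222222222, 35.5),
  n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)
)

# One weighted mean per vehicle type; Z is the sub-data-frame for one type.
wavg <- sapply(split(mydf, mydf$type),
               function(Z) weighted.mean(Z$speed, Z$n_vehicles))
print(round(wavg, 5))
# car 36.39029, light_duty 38.56522, heavy_duty 37.53333, motorcycle 36.08696

# Grouping by date_time and type; drop = TRUE discards empty combinations.
groups <- split(mydf, list(mydf$date_time, mydf$type), drop = TRUE)
result <- do.call(rbind, lapply(groups, function(Z) {
  data.frame(date_time          = Z$date_time[1],
             type               = Z$type[1],
             weighted_avg_speed = weighted.mean(Z$speed, Z$n_vehicles),
             n_vehicles         = sum(Z$n_vehicles))
}))
rownames(result) <- NULL
print(result)
```

A group with a single row, such as motorcycle, needs no special handling: its weighted mean is just its own speed. The same weighted.mean(speed, n_vehicles) call should also work inside dplyr's summarise() after group_by(date_time, type), which would avoid the intermediate speed_vehicles column, provided it is computed before n_vehicles is overwritten by its sum.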

Berend Hasselman

> aggregate(mydf$n_vehicles, list(mydf$type), sum)$x
> gives you sums
>
> Cheers
> Petr
>
>>
>> Hello
>>
>> an update about my question: I worked out the following solution (with the
>> package "dplyr")
>>
>> library(dplyr)
>>
>> mydf%>%
>> mutate(speed_vehicles=n_vehicles*mydf$speed) %>%
>> group_by(date_time,type) %>%
>> summarise(
>> sum_n_times_speed=sum(speed_vehicles),
>> n_vehicles=sum(n_vehicles),
>> vel=sum(speed_vehicles)/sum(n_vehicles)
>> )
>>
>>
>> In fact I was hoping to manage everything in a "one-go": i.e. without the need
>> to create the "intermediate" variable called "speed_vehicles" and with the use
>> of the function weighted.mean()
>>
>> any hints for a different approach much appreciated
>>
>> thanks
>>
>>
>>
>>
>> hi all
>>
>> I have this dataframe (created as a reproducible example)
>>
>> mydf<-structure(list(date_time = structure(c(1508238000, 1508238000,
>> 1508238000, 1508238000, 1508238000, 1508238000, 1508238000), class =
>> c("POSIXct", "POSIXt"), tzone = ""), direction = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
>> 2L), .Label = c("A", "B"), class = "factor"), type = structure(c(1L, 2L, 3L, 4L, 1L,
>> 2L, 3L), .Label = c("car", "light_duty", "heavy_duty", "motorcycle"), class =
>> "factor"), avg_speed = c(41.1029082774049, 40.3333333333333,
>> 40.3157894736842, 36.0869565217391, 33.4065155807365,
>> 37.6222222222222, 35.5), n_vehicles = c(447L, 24L, 19L, 23L, 706L, 45L, 26L)),
>> .Names = c("date_time", "direction", "type", "speed", "n_vehicles"), row.names
>> = c(NA, -7L), class = "data.frame")
>>
>> mydf
>>
>> and I need to get to this final result
>>
>> mydf_final<-structure(list(date_time = structure(c(1508238000, 1508238000,
>> 1508238000, 1508238000), class = c("POSIXct", "POSIXt"), tzone = ""), type =
>> structure(c(1L, 2L, 3L, 4L), .Label = c("car", "light_duty", "heavy_duty",
>> "motorcycle"), class = "factor"), weighted_avg_speed = c(36.39029, 38.56521,
>> 37.53333, 36.08696), n_vehicles = c(1153L,69L,45L,23L)), .Names =
>> c("date_time", "type", "weighted_avg_speed", "n_vehicles"), row.names =
>> c(NA, -4L), class = "data.frame")
>>
>> mydf_final
>>
>>
>> my question:
>> how to compute a weighted mean i.e. "weighted_avg_speed"
>> from "speed" (the values whose weighted mean is to be computed) and
>> "n_vehicles" (the weights) grouped by "date_time" and "type"?
>>
>> to be noted the complication of the case "motorcycle" (not present in both
>> directions)
>>
>> any help for that?
>>
>> thank you
>>
>> max
>>
>
