---
title: "A real world example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{A real world example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{dplyr, tidyr, stringr, lubridate}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This is practically the same code you can find on this blog post of mine:
https://www.brodrigues.co/blog/2018-11-14-luxairport/
but with some minor updates to reflect the current state of the `{tidyverse}` packages as well
as logging using `{chronicler}`.

Let's first load the required packages, and the `avia` dataset included in the `{chronicler}` package:

```{r setup}
library(chronicler)
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# Ensure chronicler version of `pick()` is being used
pick <- chronicler::pick

data("avia")
```

Now I need to define the needed functions for the analysis. To improve logging, I add the `dim()`
function as the `.g` argument of each function below. This will make it possible to see how the 
dimensions of the data change inside the pipeline:

```{r}
# Define required functions 
# You can use `record_many()` to avoid having to write everything

r_select <- record(select, .g = dim)
r_pivot_longer <- record(pivot_longer, .g = dim)
r_filter <- record(filter, .g = dim)
r_mutate <- record(mutate, .g = dim)
r_separate <- record(separate, .g = dim)
r_group_by <- record(group_by, .g = dim)
r_summarise <- record(summarise, .g = dim)

```

```{r}
avia_clean <- avia %>%
  r_select(1, contains("20")) %>% # select the first column and every column starting with 20
  bind_record(r_pivot_longer, -starts_with("unit"), names_to = "date", values_to = "passengers") %>%
  bind_record(r_separate,
              col = 1,
              into = c("unit", "tra_meas", "air_pr\\time"),
              sep = ",")

```

Let’s focus on monthly data:

```{r}
avia_monthly <- avia_clean %>%
  bind_record(r_filter,
              tra_meas == "PAS_BRD_ARR",
              !is.na(passengers),
              str_detect(date, "M")) %>%
  bind_record(r_mutate,
              date = paste0(date, "01"),
              date = ymd(date)) %>%
  bind_record(r_select,
              destination = "air_pr\\time", date, passengers)

```

`avia_monthly` is an object of class `chronicle`, but in essence, it is just a list, with its own
print method:

```{r}
avia_monthly
```

Now that the data is clean, we can read the log:

```{r}
read_log(avia_monthly)
```

This is especially useful if the object `avia_monthly` gets saved using `saveRDS()`. People that
then read this object, can read the log to know what happened and reproduce the steps if necessary.

Let's take a look at the final data set:

```{r}
avia_monthly %>%
  pick("value")
```

It is also possible to take a look at the underlying `.log_df` object that contains more details,
and see the output of the `.g` argument (which was defined in the beginning as the `dim()` function):

```{r}
check_g(avia_monthly)
```

After `select()` the data has 509 rows and 231 columns, after the call to `pivot_longer()`
117070 rows and 3 columns, `separate()` adds two columns, after `filter()` only 7632 rows
remain (`mutate()` does not change the dimensions) and then `select()` is used to remove 2 columns.