--- title: "Introduction to the vital package" author: Rob J Hyndman bibliography: refs.bib output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to the vital package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( eval = nzchar(Sys.getenv("VIGNETTES")), # Only compile locally collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 5, out.width = "100%" ) # Okabe-Ito colours for discrete scales options( ggplot2.discrete.colour = c("#D55E00", "#0072B2", "#009E73", "#CC79A7", "#E69F00", "#56B4E9", "#F0E442"), ggplot2.discrete.fill = c("#D55E00", "#0072B2", "#009E73", "#CC79A7", "#E69F00", "#56B4E9", "#F0E442") ) ``` The goal of vital is to allow analysis of demographic data using tidy tools. It works with other [tidyverse packages](https://www.tidyverse.org) such as [dplyr](https://dplyr.tidyverse.org) and [ggplot2](https://ggplot2.tidyverse.org). It also works with the [tidyverts](https://tidyverts.org) packages, [tsibble](https://tsibble.tidyverts.org) and [fable](https://fable.tidyverts.org). ```{r packages, message = FALSE, eval = TRUE} library(vital) library(tsibble) library(dplyr) library(ggplot2) ``` ## vital objects The basic data object is a `vital`, which is a time-indexed tibble that contains vital statistics such as births, deaths, population counts, and mortality and fertility rates. We will use Norwegian data in the following examples. First, let's remove the "Total" Sex category and collapse the upper ages into a final age group of 100+. ```{r example, eval = TRUE} nor <- norway_mortality |> filter(Sex != "Total") |> collapse_ages(max_age = 100) nor ``` This example contains data from `r min(nor$Year)` to `r max(nor$Year)`. There are 101 age groups and 2 Sex categories. A vital must have a time "index" variable, and optionally other categorical variables known as "key" variables. Each row must have a unique combination of the index and key variables. Some columns are "vital" variables, such as "Age" and "Sex". We can use functions to see which variables are index, key or vital: ```{r vars} index_var(nor) key_vars(nor) vital_vars(nor) ``` ## Plots There are `autoplot()` functions for plotting `vital` objects. These produce rainbow plots [@rainbow] where each line represents data for one year, and the variable is plotted against age. ```{r autoplot, warning = FALSE, fig.cap="", fig.alt="Rainbow plot of mortality rates in Norway."} nor |> autoplot(Mortality) + scale_y_log10() ``` We can use standard ggplot functions to modify the plot as desired. For example, here are population pyramids for all years. ```{r pyramid, fig.cap="", fig.alt="Population pyramids for Norway."} nor |> mutate(Population = if_else(Sex == "Female", -Population, Population)) |> autoplot(Population) + coord_flip() + facet_grid(. ~ Sex, scales = "free_x") ``` ## Life tables and life expectancy Life tables [@chiang] can be produced using the `life_table()` function. It will produce life tables for each unique combination of the index and key variables other than age. ```{r lifetable} # Life tables for males and females in Norway in 2000 nor |> filter(Year == 2000) |> life_table() ``` Life expectancy ($e_x$ with $x=0$ by default) is computed using `life_expectancy()`: ```{r e0, fig.cap="", fig.alt="Life expectancy at birth in Norway."} # Life expectancy for males and females in Norway nor |> life_expectancy() |> ggplot(aes(x = Year, y = ex, color = Sex)) + geom_line() ``` ## Smoothing Several smoothing functions are provided: `smooth_spline()`, `smooth_mortality()`, `smooth_fertility()`, and `smooth_loess()`, each smoothing across the age variable for each year. The methods used in `smooth_mortality()` and `smooth_fertility()` are described in @hu. ```{r smoothing, fig.cap="", fig.alt="Smoothed mortality rates in Norway for 1967."} # Smoothed data nor |> filter(Year == 1967) |> smooth_mortality(Mortality) |> autoplot(Mortality) + geom_line(aes(y = .smooth), col = "#0072B2") + ylab("Mortality rate") + scale_y_log10() ``` ## Lee-Carter models Lee-Carter models [@lc] are estimated using the `LC()` function which must be called within a `model()` function: ```{r lc} # Lee-Carter model lc <- nor |> model(lee_carter = LC(log(Mortality))) lc ``` Models are fitted for all combinations of key variables excluding age. To see the details for a specific model, use the `report()` function. ```{r lc2} lc |> filter(Sex == "Female") |> report() ``` The results can be plotted. ```{r lc3, fig.cap="", fig.alt="Components from Lee-Carter model."} autoplot(lc) ``` The components can be extracted. ```{r lccomponents} age_components(lc) time_components(lc) ``` Forecasts are obtained using the `forecast()` function ```{r lc5} # Forecasts from Lee-Carter model lc |> forecast(h = 20) ``` The forecasts are returned as a distribution column (here transformed normal because of the log transformation used in the model). The `.mean` column gives the point forecasts equal to the mean of the distribution column. ## Functional data models Functional data models [@hu] can be estimated in a similar way to Lee-Carter models, but with an additional smoothing step, then modelling with `LC` replaced by `FDM`. ```{r fdm, fig.cap="", fig.alt="First few components from functional data model for mortality in Norway."} # FDM model fdm <- nor |> smooth_mortality(Mortality) |> model(hu = FDM(log(.smooth))) fc_fdm <- fdm |> forecast(h = 20) autoplot(fc_fdm) + scale_y_log10() ``` Functional data models have multiple principal components, rather than the single factor used in Lee-Carter models. ```{r fdmplot, fig.cap="", fig.alt="First three components from functional data model for mortality in Norway."} fdm |> autoplot(show_order = 3) ``` By default, six factors are estimated using `FDM()`. Here we have chosen to plot only the first three. The components can be extracted. ```{r fdmcomponents} age_components(fdm) time_components(fdm) ``` ## Coherent functional data models A coherent functional data model [@hby], is obtained by first computing the sex-products and sex-ratios of the smoothed mortality data. Then a functional data model is fitted to the smoothed data, forecasts are obtained, and the product/ratio transformation is reversed. The following code shows the steps. ```{r coherent} fdm_coherent <- nor |> smooth_mortality(Mortality) |> make_pr(.smooth) |> model(hby = FDM(log(.smooth), coherent = TRUE)) fc_coherent <- fdm_coherent |> forecast(h = 20) |> undo_pr(.smooth) fc_coherent ``` Here, `make_pr()` makes the product-ratios, while `undo_pr()` undoes them. The argument `coherent = TRUE` in `FDM()` ensures that the ARIMA models fitted to the coefficients are stationary when applied to the sex-ratios. ## References