| Title: | Helpful Functions for Cleaning Surveillance Data | 
| Version: | 2025.10.27 | 
| Description: | Helpful functions for cleaning and manipulating surveillance data, especially with regard to the creation and validation of panel data from individual-level surveillance data. | 
| Depends: | R (≥ 3.5.0) | 
| License: | MIT + file LICENSE | 
| URL: | https://www.csids.no/cstidy/, https://github.com/csids/cstidy | 
| BugReports: | https://github.com/csids/cstidy/issues | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Imports: | data.table, magrittr, ggplot2, csdata, cstime, crayon, digest, stringr, methods | 
| Suggests: | testthat, knitr, rmarkdown, rstudioapi, glue, gt, dplyr, purrr | 
| RoxygenNote: | 7.2.3 | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2025-10-27 10:47:14 UTC; raw996 | 
| Author: | Richard Aubrey White | 
| Maintainer: | Richard Aubrey White <hello@rwhite.no> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-10-27 11:30:02 UTC | 
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
| lhs | A value or the magrittr placeholder. | 
| rhs | A function call using the magrittr semantics. | 
Value
The result of calling rhs(lhs).
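A minimal illustration of these semantics, assuming magrittr is installed (it is listed under Imports): the piped form evaluates to the same value as the nested call `rhs(lhs)`, and the `.` placeholder passes lhs to an argument other than the first.

```r
library(magrittr)

x <- c(1, 4, 9)

# Piped form and nested form give the same result
piped  <- x %>% sqrt()
nested <- sqrt(x)

# With "." as a top-level argument, lhs is placed there instead of first:
# this is seq(from = 1, to = 9, by = 2)
stepped <- 2 %>% seq(from = 1, to = 9, by = .)
```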
Expand time to
Description
Attempts to expand the dataset so that each time series covers a longer time period, up to the specified maximum.
A time series is defined as a unique combination of:
- granularity_time 
- granularity_geo 
- country_iso3 
- location_code 
- border 
- age 
- sex 
- *_id 
- *_tag 
Usage
expand_time_to(
  x,
  max_isoyear = NULL,
  max_isoyearweek = NULL,
  max_date = NULL,
  ...
)
Arguments
| x | An object of type  | 
| max_isoyear | Maximum isoyear | 
| max_isoyearweek | Maximum isoyearweek | 
| max_date | Maximum date | 
| ... | Not used. | 
Value
csfmt_rts_data_v2, a larger dataset with additional rows covering the extended time period.
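The idea of expanding each time series can be pictured with data.table. This is a minimal sketch, not the package's implementation: `expand_dates_sketch()` is a hypothetical helper that only handles daily data with a `date` column, builds the time-series key from the columns listed above (plus any `*_id` and `*_tag` columns), and appends one row per missing day up to `max_date`. Value columns in the new rows are left as NA, and the other time columns (isoyear, isoweek, and so on) are not filled in.

```r
library(data.table)

# Hypothetical helper: extend every daily time series in x up to max_date
expand_dates_sketch <- function(x, max_date) {
  fixed <- c("granularity_time", "granularity_geo", "country_iso3",
             "location_code", "border", "age", "sex")
  # Key columns that define a time series: fixed columns plus *_id / *_tag
  keys <- c(intersect(fixed, names(x)),
            grep("_(id|tag)$", names(x), value = TRUE))
  # One row per time series with its last observed date
  last <- x[, .(last_date = max(date)), by = keys]
  # New dates after each series' last date, up to max_date
  extra <- last[last_date < max_date,
                .(date = seq(last_date + 1, max_date, by = "day")),
                by = keys]
  # Append; columns missing from the new rows are filled with NA
  rbindlist(list(x, extra), use.names = TRUE, fill = TRUE)
}

d <- data.table(
  location_code = c("a", "a", "b"),
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01")),
  cases_n = c(1L, 2L, 3L)
)
out <- expand_dates_sketch(d, max_date = as.Date("2020-01-03"))
```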
See Also
Other csfmt_rts_data: 
identify_data_structure(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v2(),
unique_time_series()
Generate test data
Description
Generates fake data for testing.
Usage
generate_test_data(fmt = "csfmt_rts_data_v2")
Arguments
| fmt | Data format ( | 
Value
csfmt_rts_data_v2, a dataset containing fake data.
Examples
cstidy::generate_test_data("csfmt_rts_data_v2")
Provides corresponding healed times (deprecated)
Description
Provides corresponding healed times (deprecated)
Usage
heal_time_csfmt_rts_data_v1(x, cols, granularity_time = "date")
Arguments
| x | A vector containing either dates, isoyearweek, or isoyear. | 
| cols | Columns to restrict the output to. | 
| granularity_time | date, isoyearweek, or isoyear, depending on the values contained in x. | 
Value
data.table, a dataset with time columns corresponding to the values given in x.
Provides corresponding healed times
Description
Provides corresponding healed times
Usage
heal_time_csfmt_rts_data_v2(x, cols, granularity_time = "date")
Arguments
| x | A vector containing either dates, isoyearweek, or isoyear. | 
| cols | Columns to restrict the output to. | 
| granularity_time | date, isoyearweek, or isoyear, depending on the values contained in x. | 
Value
data.table, a dataset with time columns corresponding to the values given in x.
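To show what "corresponding time columns" means for dates, here is a minimal base R sketch. `derive_time_columns()` is a hypothetical helper, not part of cstidy, and it covers only a subset of the columns; the isoyearweek string format `"2021-01"` follows the examples in this manual. It uses the ISO 8601 `%G` (week-based year) and `%V` (week number) format codes.

```r
# Hypothetical helper: derive ISO-week and calendar columns from dates
derive_time_columns <- function(dates) {
  dates <- as.Date(dates)
  data.frame(
    date        = dates,
    isoyear     = as.integer(format(dates, "%G")),  # ISO week-based year
    isoweek     = as.integer(format(dates, "%V")),  # ISO week number
    isoyearweek = format(dates, "%G-%V"),
    calyear     = as.integer(format(dates, "%Y")),
    calmonth    = as.integer(format(dates, "%m")),
    stringsAsFactors = FALSE
  )
}

# 2021-01-01 is a Friday, so it belongs to ISO week 53 of 2020
out <- derive_time_columns(c("2021-01-01", "2021-01-04"))
```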
Hash the data structure of a dataset for a given column
Description
Reduces the data structure of a column inside a dataset into something that describes
Usage
identify_data_structure(x, col, ...)
## S3 method for class 'csfmt_rts_data_v2'
identify_data_structure(x, col, ...)
## S3 method for class 'tbl_Microsoft SQL Server'
identify_data_structure(x, col, ...)
Arguments
| x | An object | 
| col | Column name to hash | 
| ... | Arguments passed to or from other methods | 
Value
csfmt_rts_data_structure_hash_v2, a summary object.
See Also
Other csfmt_rts_data: 
expand_time_to(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v2(),
unique_time_series()
Examples
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::identify_data_structure("deaths_n") %>%
  plot()
Covid-19 data for PCR-confirmed cases in Norway (nation and county)
Description
This data comes from the Norwegian Surveillance System for Communicable Diseases (MSIS). The date corresponds to when the PCR test was taken.
Usage
nor_covid19_cases_by_time_location_csfmt_rts_v1
Format
A csfmt_rts_data_v1 with 11028 rows and 18 variables:
- granularity_time: day/isoweek
- granularity_geo: nation, county
- country_iso3: nor
- location_code: norge, 11 counties
- border: 2020
- age: total
- isoyear: Isoyear of event
- isoweek: Isoweek of event
- isoyearweek: Isoyearweek of event
- season: Season of event
- seasonweek: Seasonweek of event
- calyear: Calyear of event
- calmonth: Calmonth of event
- calyearmonth: Calyearmonth of event
- date: Date of event
- covid19_cases_testdate_n: Number of confirmed Covid-19 cases
- covid19_cases_testdate_pr100000: Number of confirmed Covid-19 cases per 100,000 population
Details
Both the raw number of cases and the number of cases per 100,000 population are recorded.
This data was extracted on 2022-05-04.
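Assuming the usual definition of a rate per 100,000 population (count divided by population, multiplied by 100,000), a `*_pr100000` column relates to its `*_n` column as in this sketch. The counts and population figures are placeholders, not values from this dataset.

```r
# Placeholder counts and populations (not taken from the dataset)
cases_n    <- c(120, 45)
population <- c(600000, 90000)

# Rate per 100,000 population
cases_pr100000 <- cases_n / population * 100000
```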
Source
Norwegian Covid-19 data for ICU and hospitalization
Description
This data was extracted on 2022-05-04.
Usage
nor_covid19_icu_and_hospitalization_csfmt_rts_v1
Format
A csfmt_rts_data_v1 with 919 rows and 18 variables:
- granularity_time: day/isoweek
- granularity_geo: nation
- country_iso3: nor
- location_code: norge
- border: 2020
- age: total
- isoyear: Isoyear of event
- isoweek: Isoweek of event
- isoyearweek: Isoyearweek of event
- season: Season of event
- seasonweek: Seasonweek of event
- calyear: Calyear of event
- calmonth: Calmonth of event
- calyearmonth: Calyearmonth of event
- date: Date of event
- icu_with_positive_pcr_n: Number of new admissions to the ICU with a positive PCR test
- hospitalization_with_covid19_as_primary_cause_n: Number of new hospitalizations with Covid-19 as the primary cause
Source
Remove class csfmt_rts_data_*
Description
Remove class csfmt_rts_data_*
Usage
remove_class_csfmt_rts_data(x)
Arguments
| x | data.table | 
Value
No return value, called for the side effect of removing the csfmt_rts_data class from x.
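Why `class(x)` changes without reassigning `x` can be seen in this sketch of a by-reference class removal with `data.table::setattr()`. `remove_class_sketch()` is a hypothetical helper written in the spirit of this function; it is not the package's implementation.

```r
library(data.table)

# Hypothetical helper: drop csfmt_rts_data* classes from x by reference
remove_class_sketch <- function(x, pattern = "^csfmt_rts_data") {
  # setattr() modifies the attribute in place, so the caller's object changes
  setattr(x, "class", grep(pattern, class(x), value = TRUE, invert = TRUE))
  invisible(x)
}

x <- data.table(a = 1:3)
setattr(x, "class", c("csfmt_rts_data_v2", class(x)))
remove_class_sketch(x)  # no reassignment needed
class(x)
```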
See Also
Other csfmt_rts_data: 
expand_time_to(),
identify_data_structure(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v2(),
unique_time_series()
Examples
x <- cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2()
class(x)
cstidy::remove_class_csfmt_rts_data(x)
class(x)
Convert data.table to csfmt_rts_data_v1 (deprecated)
Description
set_csfmt_rts_data_v1 converts a data.table to csfmt_rts_data_v1 by reference.
csfmt_rts_data_v1 creates a new csfmt_rts_data_v1 (not by reference) from either a data.table or data.frame.
Usage
set_csfmt_rts_data_v1(x, create_unified_columns = TRUE, heal = TRUE)
csfmt_rts_data_v1(x, create_unified_columns = TRUE, heal = TRUE)
Arguments
| x | The data.table to be converted to csfmt_rts_data_v1 | 
| create_unified_columns | Logical. If TRUE, the unified columns are created. | 
| heal | Logical. If TRUE, missing values are imputed on creation. | 
Value
set_csfmt_rts_data_v1 converts x to a csfmt_rts_data_v1 in place (by reference) and returns it invisibly.
csfmt_rts_data_v1 returns a new, copied csfmt_rts_data_v1 and leaves x unchanged.
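The difference between the by-reference and the copying variants mirrors a familiar data.table pair. The sketch below is an analogy only: it uses `data.table::setDT()` (converts in place) and `data.table::as.data.table()` (returns a converted copy), not the cstidy functions themselves.

```r
library(data.table)

# In place: df1 itself becomes a data.table (like set_csfmt_rts_data_v1)
df1 <- data.frame(a = 1:3)
setDT(df1)

# Copy: dt2 is a new data.table, df2 stays a plain data.frame
# (like csfmt_rts_data_v1)
df2 <- data.frame(a = 1:3)
dt2 <- as.data.table(df2)
```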
Smart assignment
csfmt_rts_data_v1 contains the smart assignment feature for time and geography.
When one of the following variables is assigned using :=, the variables listed beneath it are automatically imputed.
location_code:
- granularity_geo 
- country_iso3 
isoyear:
- granularity_time 
- isoweek 
- isoyearweek 
- season 
- seasonweek 
- calyear 
- calmonth 
- calyearmonth 
- date 
isoyearweek:
- granularity_time 
- isoyear 
- isoweek 
- season 
- seasonweek 
- calyear 
- calmonth 
- calyearmonth 
- date 
date:
- granularity_time 
- isoyear 
- isoweek 
- isoyearweek 
- season 
- seasonweek 
- calyear 
- calmonth 
- calyearmonth 
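The location_code rule (assign location_code, get granularity_geo and country_iso3) can be imitated with a data.table update join. This is a sketch under stated assumptions: `location_lookup` is a hypothetical lookup table and `"county_example"` a hypothetical code; only `"norge"`, the values `nation`/`county`, and `nor` appear elsewhere in this manual. The package presumably uses its own reference data for this mapping.

```r
library(data.table)

# Hypothetical lookup table (placeholder codes, not the package's reference data)
location_lookup <- data.table(
  location_code   = c("norge", "county_example"),
  granularity_geo = c("nation", "county"),
  country_iso3    = c("nor", "nor")
)

d <- data.table(location_code = c("county_example", "norge"),
                deaths_n = c(2L, 5L))

# Update join: fill the geography columns from the lookup, by reference
d[location_lookup, on = "location_code",
  `:=`(granularity_geo = i.granularity_geo, country_iso3 = i.country_iso3)]
```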
Unified columns
csfmt_rts_data_v1 contains 16 unified columns:
- granularity_time 
- granularity_geo 
- country_iso3 
- location_code 
- border 
- age 
- sex 
- isoyear 
- isoweek 
- isoyearweek 
- season 
- seasonweek 
- calyear 
- calmonth 
- calyearmonth 
- date 
See Also
Other csfmt_rts_data: 
expand_time_to(),
identify_data_structure(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v2(),
unique_time_series()
Convert data.table to csfmt_rts_data_v2
Description
set_csfmt_rts_data_v2 converts a data.table to csfmt_rts_data_v2 by reference.
csfmt_rts_data_v2 creates a new csfmt_rts_data_v2 (not by reference) from either a data.table or data.frame.
Usage
set_csfmt_rts_data_v2(x, create_unified_columns = TRUE, heal = TRUE)
csfmt_rts_data_v2(x, create_unified_columns = TRUE, heal = TRUE)
Arguments
| x | The data.table to be converted to csfmt_rts_data_v2 | 
| create_unified_columns | Logical. If TRUE, the unified columns are created. | 
| heal | Logical. If TRUE, missing values are imputed on creation. | 
Details
For more details see the vignette:
vignette("csfmt_rts_data_v2", package = "cstidy")
Value
set_csfmt_rts_data_v2 converts x to a csfmt_rts_data_v2 in place (by reference) and returns it invisibly.
csfmt_rts_data_v2 returns a new, copied csfmt_rts_data_v2 and leaves x unchanged.
Smart assignment
csfmt_rts_data_v2 contains the smart assignment feature for time and geography.
When one of the following variables is assigned using :=, the variables listed beneath it are automatically imputed.
location_code:
- granularity_geo 
- country_iso3 
isoyear:
- granularity_time 
- isoweek 
- isoyearweek 
- isoquarter 
- isoyearquarter 
- season 
- seasonweek 
- calyear 
- calmonth 
- calyearmonth 
- date 
isoyearweek:
- granularity_time 
- isoyear 
- isoweek 
- isoquarter 
- isoyearquarter 
- season 
- seasonweek 
- calyear 
- calmonth 
- calyearmonth 
- date 
date:
- granularity_time 
- isoyear 
- isoweek 
- isoyearweek 
- isoquarter 
- isoyearquarter 
- season 
- seasonweek 
- calyear 
- calmonth 
- calyearmonth 
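Part of the isoyearweek rule is plain string parsing: an isoyearweek such as `"2021-01"` (the format used in the examples) already determines isoyear and isoweek. `split_isoyearweek()` below is a hypothetical base R helper illustrating only that step; it does not derive the remaining columns.

```r
# Hypothetical helper: split "YYYY-WW" strings into isoyear and isoweek
split_isoyearweek <- function(isoyearweek) {
  parts <- do.call(rbind, strsplit(isoyearweek, "-", fixed = TRUE))
  data.frame(
    isoyearweek = isoyearweek,
    isoyear = as.integer(parts[, 1]),
    isoweek = as.integer(parts[, 2]),
    stringsAsFactors = FALSE
  )
}

out <- split_isoyearweek(c("2021-01", "2020-53"))
```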
Unified columns
csfmt_rts_data_v2 contains 18 unified columns:
- granularity_time 
- granularity_geo 
- country_iso3 
- location_code 
- border 
- age 
- sex 
- isoyear 
- isoweek 
- isoyearweek 
- isoquarter 
- isoyearquarter 
- season 
- seasonweek 
- calyear 
- calmonth 
- calyearmonth 
- date 
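A quick way to see which unified columns a dataset still lacks is to compare its names against the list above. `missing_unified_columns()` is a hypothetical base R helper, not part of cstidy; the column vector is copied from the list.

```r
# Unified columns of csfmt_rts_data_v2, as listed above
unified_columns_v2 <- c(
  "granularity_time", "granularity_geo", "country_iso3", "location_code",
  "border", "age", "sex",
  "isoyear", "isoweek", "isoyearweek", "isoquarter", "isoyearquarter",
  "season", "seasonweek", "calyear", "calmonth", "calyearmonth", "date"
)

# Hypothetical helper: unified columns not present in x
missing_unified_columns <- function(x) setdiff(unified_columns_v2, names(x))

d <- data.frame(location_code = "norge", date = as.Date("2020-01-01"),
                deaths_n = 1L)
missing <- missing_unified_columns(d)
```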
See Also
Other csfmt_rts_data: 
expand_time_to(),
identify_data_structure(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v1(),
unique_time_series()
Examples
# Create some fake data as data.table
d <- cstidy::generate_test_data(fmt = "csfmt_rts_data_v2")
d <- d[1:5]
# convert to csfmt_rts_data_v2 by reference
cstidy::set_csfmt_rts_data_v2(d, create_unified_columns = TRUE)
# Smart assignment: assigning a time or geography column imputes the related columns
d[1, isoyearweek := "2021-01"]
d
d[2, isoyear := 2019]
d
d[3, date := as.Date("2020-01-01")]
d
d[4, c("isoyear", "isoyearweek") := .(2021, "2021-01")]
d
d[5, c("location_code") := .("norge")]
d
# Investigating the data structure of one column inside a dataset
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::identify_data_structure("deaths_n") %>%
  plot()
# Investigating the data structure via summary
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  summary()
Unique time series
Description
Attempts to identify the unique time series that exist in this dataset.
A time series is defined as a unique combination of:
- granularity_time 
- granularity_geo 
- country_iso3 
- location_code 
- border 
- age 
- sex 
- *_id 
- *_tag 
Usage
unique_time_series(x, set_time_series_id = FALSE, ...)
Arguments
| x | An object of type  | 
| set_time_series_id | If TRUE, then  | 
| ... | Not used. | 
Value
data.table, a dataset that lists all the unique time series in x.
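The definition of a time series above translates directly into a `unique()` over the key columns. This is a minimal data.table sketch, not the package's implementation: `unique_time_series_sketch()` is hypothetical, ignores `set_time_series_id`, and uses placeholder location codes.

```r
library(data.table)

# Hypothetical helper: list unique time series using the definition above
unique_time_series_sketch <- function(x) {
  fixed <- c("granularity_time", "granularity_geo", "country_iso3",
             "location_code", "border", "age", "sex")
  # Fixed key columns present in x, plus any *_id / *_tag columns
  keys <- c(intersect(fixed, names(x)),
            grep("_(id|tag)$", names(x), value = TRUE))
  unique(x[, ..keys])
}

d <- data.table(
  location_code = c("norge", "norge", "county_example"),
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01")),
  deaths_n = c(1L, 2L, 3L)
)
out <- unique_time_series_sketch(d)
```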
See Also
Other csfmt_rts_data: 
expand_time_to(),
identify_data_structure(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v2()