--- title: "ARDECO R package" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{ARDECO R package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ``` r library(ARDECO) ``` ## Content {#content} ### [What is ARDECO](#whatis) ### [Variables and indicators](#varandind) ### [How to use ARDECO R package](#howto) ### [Filtering data](#filter) ## What is ARDECO - [back](#content) {#whatis} The Annual Regional Database of the European Commission (ARDECO) is maintained by the Joint Research Centre in close coordination with the Directorate General for Regional and Urban Policy. Its purpose is to provide consistent and harmonised time-series data on demographic and socio-economic variables at the regional and sub-regional levels. ARDECO primarily relies on official data from sources like Eurostat’s ‘Regional Accounts’ ( https://ec.europa.eu/eurostat/web/national-accounts/methodology/european-accounts/regional-accounts ) and national or regional statistical offices. However, it also incorporates data from supplementary sources (such as the European Regional Database from Cambridge Econometrics - httpS://www.camecon.com/wp-content/uploads/2019/01/ERD-manual.pdf) and estimates generated using various methodologies (such as data interpolation, regional shares of closest year, and proxy variables). In cases where official data exhibits inconsistencies across the time-series due to breaks or provisional values, ARDECO replaces these values with estimates based on the most suitable methodology. Ensuring consistency across variables and over time, therefore, sometimes requires adjustments of official values For most variables and countries, ARDECO extends its time-series back to 1980 (1960 for population data). Additionally, short-term projections based on AMECO (https://economy-finance.ec.europa.eu/economic-research-and-databases/economic-databases/ameco-database_en) forecasts are included whenever possible. More detailed information about ARDECO can be found in the methodological note: **European Commission, Joint Research Centre, Auteri, D., Attardo, C., Berzi, M., Dorati, C., Albinola, F., Baggio, L., Bucciarelli, G., Bussolari, I. and Dijkstra, L., [The Annual Regional Database of the European Commission (ARDECO) - Methodological Note](https://publications.jrc.ec.europa.eu/repository/handle/JRC138212), European Commission, Ispra, 2024, JRC138212)** ## Variables and indicators - [back](#content) {#varandind} * [Coding convention](#codconv) * [The thematic content](#themcont) * [The geographical coverage (NUTS)](#geocov) * [Territorial tipologies: aggregated data by tercet](#tercetdef) * [The temporal coverage](#tempcov) * [Variable as a set of datasets - variable dimensions](#vardim) ### Coding convention - [back](#varandind) {#codconv} Within ARDECO, both variables and indicators are provided. Variables represent core data expressed as ‘volumes’ and are computed from primary sources. Indicators, on the other hand, are expressed as ‘ratios’ and result from dividing one variable (e.g., GDP) by another (e.g., average population). For a convention adopted in ARDECO, variables have a code composed by 4 o 5 characters (for example: "GDP at current market prices" is coded SUVGD), while the indicators are coded with 6 or more characters (for example: "GDP per capita at current prices" is coded SUVGDP) ### The thematic content - [back](#varandind) {#themcont} Currenlty ARDECO provides a set of variables and indicators related to the following domains: * Population and Vital Statistics * Domestic Product * EMployment * Labour Cost and Productivity * Education * Households * Capital Formation and capital stock * GHG Emissions The complete set of available variables and indicators is provided by the function ardeco_get_variable_list(): ``` r print(ardeco_get_variable_list(), n=1000) #> # A tibble: 68 × 2 #> code description #> #> 1 RNLHTP "Hours worked per capita" #> 2 RNLHTE "Hours worked per employed person" #> 3 SNPCN "Total population change" #> 4 PVGD "GDP price index (implicit deflator, national, 2015=100, euro)" #> 5 PVGE "GVA price index (implicit deflator, national, 2015=100, euro)" #> 6 SNMTN "Net migration" #> 7 SNETD "Total Employment (workplace based, employed persons)" #> 8 SNWTD "Wage and salary earners (workplace based, employees)" #> 9 SNETZ "Employment by industry (10 NACE sectors)" #> 10 RNECN "Residence-based employment" #> 11 RNUTN "Unemployment (LFS)" #> 12 RNLHT "Hours Worked (employed persons)" #> 13 RUWCDW "Nominal compensation per employee" #> 14 RUWCZ "Compensation of employees at current prices by industry (10 NACE sectors)" #> 15 ROWCD "Compensation of Employees at constant prices" #> 16 ROWCDH "Real compensation per hour worked" #> 17 ROWCDW "Real compensation per employee" #> 18 RUWCDHH "Nominal unit labour cost based on hours worked" #> 19 SNPBN "Live births" #> 20 SNPDN "Deaths" #> 21 SNPNN "Natural change of population" #> 22 SUVGDP "GDP per capita at current prices" #> 23 RUWCDWE " Nominal unit labour cost based on persons" #> 24 RUIGT "Gross Fixed Capital Formation at current prices" #> 25 RUIGZ "Gross Fixed Capital Formation by industry (10 NACE sectors) at current prices" #> 26 ROIGZ "Gross Fixed Capital Formation by industry (10 NACE sectors) at constant prices" #> 27 ROKND "Capital Stock at constant prices" #> 28 SUVGDH "Nominal labour productivity per hour worked" #> 29 SNPTN "Population on 1st January by broad age group and sex" #> 30 SNPTD "Average annual population" #> 31 SUVGDE "Nominal labour productivity per person employed" #> 32 SUVGD "GDP at current market prices" #> 33 SOVGDH " Real labour productivity per hour worked" #> 34 SOVGDE "Real labour productivity per person employed" #> 35 ROWCZ "Compensation of employees at constant prices by industry (10 NACE sectors)" #> 36 RPDTN "Population by educational attainment level" #> 37 RPDEN "Early leavers from education and training (18-24 years)" #> 38 RPDNN "Young people neither in employment nor in education and training (NEET, 15-29 years… #> 39 SUVGE "GVA at basic prices" #> 40 SUVGZ "GVA at basic prices by industry (10 NACE sectors)" #> 41 SOVGD "GDP at constant prices" #> 42 SOVGDP "GDP per capita at constant prices" #> 43 SOVGE "GVA at constant prices" #> 44 SOVGZ "GVA at constant prices by industry (10 NACE sectors)" #> 45 SPVGD "GDP price index (chain-type index, 2015=100)" #> 46 SPVGE "GVA price index (chain-type index, 2015=100)" #> 47 PVGZ "GVA price indices by industry (implicit deflators, national, 2015=100, euro)" #> 48 PPP "Purchasing power parities (PPPs) - EU=100" #> 49 RNLHW "Hours Worked (employees)" #> 50 RNLHZ "Hours Worked by industry (10 NACE sectors)" #> 51 RNLCN "Civilian Labour Force" #> 52 RNLTN "Total Labour Force" #> 53 SNETDP "Employment per capita" #> 54 RUWCD "Compensation of employees at current prices" #> 55 RUWCDH "Nominal compensation per hour worked" #> 56 RUVNH "Households net disposable income" #> 57 RUYNH "Net property income" #> 58 RUONH "Net operating Surplus and Mixed Income" #> 59 RUTYH "Current taxes on income and wealth" #> 60 PIGT "GFCF price index (implicit deflator, national, 2015=100, euro)" #> 61 SUKCT "Consumption of fixed capital at current prices" #> 62 SUKCZ "Consumption of fixed capital by industry (10 NACE sectors) at current prices" #> 63 ROIGT "Gross Fixed Capital Formation at constant prices" #> 64 SOKCT "Consumption of fixed capital at constant prices" #> 65 SOKCZ "Consumption of fixed capital by industry (10 NACE sectors) at constant prices" #> 66 PIGZ "GFCF price indices by industry (implicit deflators, national, 2015=100, euro)" #> 67 EDGAR "Emissions Database for Global Atmospheric Research" #> 68 EDGARP "GHG emissions per capita" ``` ### The geographical coverage (NUTS) [back](#varandind) {#geocov} Variables and indicators provide data for each EU country and its regions, according to the territorial classification provided by the Nomenclature of Territorial Units for Statistics as defined by Eurostat (NUTS nomenclature: https://ec.europa.eu/eurostat/web/nuts/overview). The NUTS classification is composed of territorial units organised in four hierarchical and nested levels, i.e. where the upper level corresponds to aggregates of the lower level – from NUTS0 (corresponding to national level) to NUTS3 (corresponding to sub-regional level). Each three years a revision of the NUTS codes is performed, replacing some NUTS codes with new ones due to updating of statistical reference territory. Each review generates a new __NUTS version__ NUTS coding follows these rules: * Nuts code at level 0 (country code) is composed by 2 letter (the country code according to ISO-3133-1 apart Greece which has been coded with EL in place of GR) * Each nuts code at following levels (1, 2 and 3) add an additional char to the code. For example IT1A is a region belonging to great region IT1 in Italy (IT) * Each NUTS code represent a specific territorial unit in any NUTS version. When a reference territorial unit change, also its NUTS code change and it's unique in all NUTS version. More details about NUTS codeing are available in https://ec.europa.eu/eurostat/web/nuts/overview #### Territorial tipologies: aggregated data by tercet [back](#geocov) {#tercetdef} According to [EUROSTAT definition](https://ec.europa.eu/eurostat/web/nuts/territorial-typologies), territorial typologies (__tercet__) are classification systems that categorize regions and areas within the European Union based on specific geographical, socio-economic, and administrative criteria. These typologies are used to facilitate regional analysis and policy-making by providing a standardized framework for comparing different territories. Key territorial typologies include classifications such as urban-rural typologies, coastal and non-coastal regions, mountain areas, and metropolitan regions. By using these typologies, Eurostat aims to enhance the understanding of regional disparities and development patterns across the EU, supporting more effective regional policy interventions. Currently, in ARDECO, for the variables which have data at NUTS3 level, it's possible to require aggregated data at NUTS0 for Urban-rural typologies (URT). URT classifies each NUTS3 into one of three URT classes: * Predominantly urban region * Intermediate regions * Predominantly rural regions A more detailed URT classification is also available: Urban-rural typologies with remoteness (URT with remoteness). URT with remoteness classifies each NUTS3 in 5 classes: * Predominantly urban region * Intermediate regions, close to a city * Intermediate, remote regions * Predominantly rural regions, close to a city * Predominantly rural, remote regions The computation of the tercet classes for each NUTS0 is done by the sum of its NUTS3 belonging in each class. ### The temporal coverage [back](#varandind) {#tempcov} For each variable and for each territorial unit ARDECO provides the yearly value. In general, for each variable values from 2000 to the current year are provided. Depending by the variable, data before 2000 are provided according to the available information recovered from different sources of data. For example, Total population data are provided from 1960, for many variables data start from 1980, for some specific and more detailed data (economic sectorial data, age class for population) data starts from 1990 or 1995. In addition, provisional data are provided for 2/3 years from the current year according to the forecasted yearly data provided by AMECO (Annual Macro-economic database of the European COmmission's Directorate General for Economic and Financial Affairs, https://economy-finance.ec.europa.eu/economic-research-and-databases/economic-databases/ameco-database_en). ### Variable as a set of datasets - variable dimensions [back](#varandind) {#vardim} In ARDECO, each variable have at least two dimension: 'Unit' and 'Version'. 'Unit' is the unit of measure related to the value associated to a nutscode in a specific year. _Example_: the unit for variable 'Total Population' is 'Persons'. 'Version' refers to the NUTS version for which the values of a variables are provided. Some variables can have more unit of measure. For example, GDP is provided in 'EUR' and in 'PPS'. A variable can also have additional dimensions. _Example_: In 'Total Population' it has been defined two additional dimension: 'sex' and 'age group'. Each of this dimensions have a set of possible values (domain). The domain for dimension 'sex' is: 'Total', 'Male', 'Female'. The domain for dimension 'age group' is: 'Total', 'Less than 15', 'from 15 to 64', '65 and over'. The variable 'Total Population' can collect values for each nutcode, for each year and for each combination of values of its dimensions. Each combination of values of the dimensions of a variable identifies a dataset for that variable. _Example_: the variable 'Total Population' as defined above, with dimension 'Unit', 'sex' and 'age group' have the following datasets: unit='Persons' version= 2021 sex='Total' 'age group'='Total' represent the total population at NUTS version 2021 unit='Persons' version= 2021 sex='Male' 'age group'='Total' represent the total male population at NUTS version 2021 unit='Persons' version= 2021 sex='Female' 'age group'='Total' represent the total female population at NUTS version 2021 unit='Persons' version= 2021 sex='Total' 'age group'='Less than 15' represent the total population have less than 15 years at NUTS version 2021 ...an so on... To retrieve the number of datasets defined for a variable, use the function ardeco_get_dataset_list(): ``` r print(ardeco_get_dataset_list('SNPTN'), n=100) #> # A tibble: 12 × 5 #> var sex age unit vers #> #> 1 SNPTN Females 65 years and over Persons 2016, 2021, 2024 #> 2 SNPTN Males 65 years and over Persons 2016, 2021, 2024 #> 3 SNPTN Total 65 years and over Persons 2016, 2021, 2024 #> 4 SNPTN Females From 15 to 64 years Persons 2016, 2021, 2024 #> 5 SNPTN Males From 15 to 64 years Persons 2016, 2021, 2024 #> 6 SNPTN Total From 15 to 64 years Persons 2016, 2021, 2024 #> 7 SNPTN Females Less than 15 years Persons 2016, 2021, 2024 #> 8 SNPTN Males Less than 15 years Persons 2016, 2021, 2024 #> 9 SNPTN Total Less than 15 years Persons 2016, 2021, 2024 #> 10 SNPTN Females Total Persons 2016, 2021, 2024 #> 11 SNPTN Males Total Persons 2016, 2021, 2024 #> 12 SNPTN Total Total Persons 2016, 2021, 2024 ``` ## How to use ARDECO R package - [back](#content) {#howto} * [Discover the available variable: ***ardeco_get_variable_list***](#varlist) * [List of Variable datasets: ***ardeco_get_dataset_list***](#datasetlist) * [List of tercets: ***ardeco_get_tercet_list***](#tercetlist) * [How to retrieve variable data: ***ardeco_get_dataset_data***](#vardatawkf) ARDECO R package provide a set of functions permitting to retrive the data of the ARDECO variables into a R dataframe. The data retrival is performed through the exploitation of the ARDECO APIs. These functions are an interface to the ARDECO APIs to simplify R users in data retrieval. The core function is ***ardeco_get_dataset_data*** which call ARDECO APIs to retrive data. This functions need the variable code and, optionally, a set of parameters enabling the user to filter data at the request level, minimizing the time for the In order to retrieve data for a variable, if the user doesn't know the structure of the desired variable, the package provides a set fo functions in order to: * Discover the available variables in ARDECO database * Retrieve the list of datasets and dimensions defined for a specific variable * Understand if its possible also to retrive tercet data for a specific variable * build the correct request to retrieve the needed data ### Discover the available variable: ***ardeco_get_variable_list*** - [back](#howto) {#varlist} The unique mandatory information to retrieve data is the variable code. To retrieve the available ARDECO variables code, use > **ardeco_get_variable_list()** The function returns a tibble with the following columns: * ***code*** is the mandatory code requested by the other ARDECO R package functions (_ardeco_get_dataset_list_, _ardeco_get_dataset_data_, _ardeco_get_tercet_list_) to retrieve information and data. * ***description*** explains the variable represented by the variable code Example: ``` r mytb <- ardeco_get_variable_list() print(mytb, max=50) #> # A tibble: 68 × 2 #> code description #> #> 1 RNLHTP Hours worked per capita #> 2 RNLHTE Hours worked per employed person #> 3 SNPCN Total population change #> 4 PVGD GDP price index (implicit deflator, national, 2015=100, euro) #> 5 PVGE GVA price index (implicit deflator, national, 2015=100, euro) #> 6 SNMTN Net migration #> 7 SNETD Total Employment (workplace based, employed persons) #> 8 SNWTD Wage and salary earners (workplace based, employees) #> 9 SNETZ Employment by industry (10 NACE sectors) #> 10 RNECN Residence-based employment #> # ℹ 58 more rows ``` ### List of Variable datasets: ***ardeco_get_dataset_list*** - [back](#howto) {#datasetlist} As explained in ['Variable as a set of datasets'](#vardim) each variable have one or more datasets depending by its dimensions. To retrieve the list of the datasets defined for a variable, use > **ardeco_get_dataset_list('_varcode_')** * ***varcode*** is one of the ARDECO variable code The function returns a tibble with the following columns: * ***var***: variable code * ***unit***: available unit of measure for the variable * ***vers***: available NUTS version for the variable dataset (see ['The geographical coverage (NUTS)'](#geocov)) * ***additional dimensions***: additional column for each additional dimension (if any) and related values (see ['Variable as a set of datasets'](#vardim)) Example: ``` r mytb <- ardeco_get_dataset_list('SNETZ') print(mytb) #> # A tibble: 13 × 4 #> var unit sector vers #> #> 1 SNETZ Thousands Persons A 2016, 2021, 2024 #> 2 SNETZ Thousands Persons B-E 2016, 2021, 2024 #> 3 SNETZ Thousands Persons F 2016, 2021, 2024 #> 4 SNETZ Thousands Persons G-I 2016, 2021, 2024 #> 5 SNETZ Thousands Persons G-J 2016, 2021, 2024 #> 6 SNETZ Thousands Persons J 2016, 2021, 2024 #> 7 SNETZ Thousands Persons K 2016, 2021, 2024 #> 8 SNETZ Thousands Persons K-N 2016, 2021, 2024 #> 9 SNETZ Thousands Persons L 2016, 2021, 2024 #> 10 SNETZ Thousands Persons M-N 2016, 2021, 2024 #> 11 SNETZ Thousands Persons O-Q 2016, 2021, 2024 #> 12 SNETZ Thousands Persons O-U 2016, 2021, 2024 #> 13 SNETZ Thousands Persons R-U 2016, 2021, 2024 ``` ### List of tercets: ***ardeco_get_tercet_list*** - [back](#howto) {#tercetlist} ARDECO R package provides the possibility to require data for a variable at available tercet (see ['Territorial typologies: aggregated data by tercet'](#tercetdef)). To check which tercet are currently available in ARDECO database, use: > **ardeco_get_tercet_list()** This function return the available tercet and related tercet class. Each tercet and related tercet classes) are identified by codes: tercet_code and tercet_class_code. The function returns a tibble with the following columns: * ***tercet_code***: the tercet code needed to request data at tercet level * ***tercet_name***: the name of the tercet represented by the tercet_code * ***tercet_class_code***: the code of the tercet class needed to request data for a tercet class * ***tercet_class_name***: the name of the tercet class represented by the tercet_class_code Example: ``` r mytb <- ardeco_get_tercet_list() print(mytb) #> tercet_code tercet_name tercet_class_code #> 1 1 by Urban-Rural Typology 0 #> 2 1 by Urban-Rural Typology 2 #> 3 1 by Urban-Rural Typology 1 #> 4 2 by Urban-Rural Typology with Remoteness 0 #> 5 2 by Urban-Rural Typology with Remoteness 3 #> 6 2 by Urban-Rural Typology with Remoteness 4 #> 7 2 by Urban-Rural Typology with Remoteness 5 #> 8 2 by Urban-Rural Typology with Remoteness 6 #> tercet_class_name #> 1 Predominantly urban #> 2 Predominantly rural #> 3 Intermediate #> 4 Predominantly urban #> 5 Intermediate, close to a city #> 6 Intermediate, remote #> 7 Predominantly rural, close to a city #> 8 Predominantly rural, remote ``` It's no possible require tercet data for all variable but for only those variables for which exist data at NUTS3 level. To check if it's possible to require tercet data for a variable, use > **ardeco_get_tercet_list('_varcode_')** * ***varcode*** is the variable code to check. If the variable haven't the NUTS3 level, the function return a warning highlighting the impossibility to aggregate data at tercet classes. Example: ``` r mytb <- ardeco_get_tercet_list('RPDTN') #> [1] "The variable RPDTN has no data at level 3 and it's no possible to aggregate data at tercet classes" ``` ### How to retrieve variable data: ***ardeco_get_dataset_data*** - [back](#howto) {#vardatawkf} The core function to retrieve data is > **ardeco_get_dataset_data('_varcode_')** * ***varcode*** is the code of one of the available variables exposed in ARDECO database. This function return a dataframe where any row expose the value for a specific nuts, year, nutsversion, unit, dimensions. The dataframe include the following columns: * ***VARIABLE***: the code of the requested variable * ***VERSIONS***: the nuts version of the nuts code * ***LEVEL***: the territorial level of the nuts code * ***NUTSCODE***: the nuts code of the value * ***YEAR***: the year of the value * ***UNIT***: the unit of measure of the value * ***VALUE***: the value of the variable Following an example to retrieve the complete set of 'Average annual population' data: ``` r mydf <- ardeco_get_dataset_data('SNPTD') print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SNPTD 2024 0 AL 1960 Persons 1608801 #> 2 SNPTD 2024 0 AT 1960 Persons 7047440 #> 3 SNPTD 2024 0 BE 1960 Persons 9153499 #> 4 SNPTD 2024 0 BG 1960 Persons 7867374 #> 5 SNPTD 2024 0 CH 1960 Persons 5328000 #> 6 SNPTD 2024 0 CY 1960 Persons 441788 #> 7 SNPTD 2024 0 CZ 1960 Persons 9602011 #> [ reached 'max' / getOption("max.print") -- omitted 1249603 rows ] ``` Additional columns will be added if for the variable it has been defined additional dimensions. In order to avoid problems due to the big amount of data which can be requested by user, ***ardeco_get_dataset_data*** works sending a set of requests, one for any dataset defined into the variable, joining the results of every call into the resulting dataframe. In this way each call ask a relative small set of data, avoiding timeout constrain imposed by the network infrastructure. If a variable have a huge number of datasets, it's possible that the retrieval process of the entire variable needs long time (minutes) before receiving the final result. To check the status and the progress of the data retrieval, it's possible to use an optional parameter which display the progress of the data retrieval process. This parameter is ***verbose*** which by default is set to ***FALSE*** > ardeco_get_dataset_data('_varcode_', verbose=TRUE) * ***varcode*** is the code of one of the available variables exposed in ARDECO database. * ***verbose*** is a flag and it can be TRUE or FALSE (FALSE by default). If TRUE, print the current dataset requested by the function. Example: ``` r mydf <- ardeco_get_dataset_data('SNETZ', verbose=TRUE) #> [1] "Fetching... [unit]=Thousands Persons [sector]=A" #> [1] "Fetching... [unit]=Thousands Persons [sector]=B-E" #> [1] "Fetching... [unit]=Thousands Persons [sector]=F" #> [1] "Fetching... [unit]=Thousands Persons [sector]=G-I" #> [1] "Fetching... [unit]=Thousands Persons [sector]=G-J" #> [1] "Fetching... [unit]=Thousands Persons [sector]=J" #> [1] "Fetching... [unit]=Thousands Persons [sector]=K" #> [1] "Fetching... [unit]=Thousands Persons [sector]=K-N" #> [1] "Fetching... [unit]=Thousands Persons [sector]=L" #> [1] "Fetching... [unit]=Thousands Persons [sector]=M-N" #> [1] "Fetching... [unit]=Thousands Persons [sector]=O-Q" #> [1] "Fetching... [unit]=Thousands Persons [sector]=O-U" #> [1] "Fetching... [unit]=Thousands Persons [sector]=R-U" ``` ## Filtering data - [back](#content) {#filter} The function **ardeco_get_dataset_data** permits to send a request for a subset of data filtered according to the following parameters: * [Nuts version](#nutsver) * [Nutscode and nutslevel](#varnuts) * [Unit](#varunit) * [Year](#varyear) * [Dimensions](#dimens) * [Tercets](#tercet) The filter is composed by a set of additional parameters to add to the call of the function *ardeco_get_dataset_data* according to the following syntax: > **ardeco_get_dataset_data('_varcode_', parameter-1 = _par-value-1_, ..., parameter-n = _par-value-n_)** Where * ***parameter-x***: one of the above listed parameters * ***par-value-x***: the value of the parameter Example: ``` r # Request data for Italy at level 1, at nuts version=2021, # with unit of measure = 'Million EUR' and related to year 2015 mydf <- ardeco_get_dataset_data('SUVGD', version=2021, unit='Million PPS', year=2015, nutscode='IT', level=1) print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SUVGD 2021 1 ITC 2015 Million PPS 529145.62 #> 2 SUVGD 2021 1 ITF 2015 Million PPS 252896.52 #> 3 SUVGD 2021 1 ITG 2015 Million PPS 117255.89 #> 4 SUVGD 2021 1 ITH 2015 Million PPS 369350.57 #> 5 SUVGD 2021 1 ITI 2015 Million PPS 349089.92 #> 6 SUVGD 2021 1 ITZ 2015 Million PPS 1252.18 ``` ### Nuts version - [back](#filter) {#nutsver} To request data at a specific nuts version, use: > **ardeco_get_dataset_data('_varcode_', version=_nutsversion_)** where * ***varcode*** is the code of one of the available variables exposed in ARDECO database * ***nutsversion*** is one of the versions available for the requested variable To check the available versions for the requested variable, use: > **ardeco_get_dataset_list('_varcode_')** (see ['List of the available datasets'](#datasetlist)) Example: ``` r # Check the available versions for variable SNPTD ardeco_get_dataset_list('SNPTD') #> # A tibble: 1 × 3 #> var unit vers #> #> 1 SNPTD Persons 2016, 2021, 2024 ``` ``` r mydf <- ardeco_get_dataset_data('SNPTD', version=2021) print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SNPTD 2021 0 AL 1960 Persons 1608801 #> 2 SNPTD 2021 0 AT 1960 Persons 7047440 #> 3 SNPTD 2021 0 BE 1960 Persons 9153499 #> 4 SNPTD 2021 0 BG 1960 Persons 7867374 #> 5 SNPTD 2021 0 CH 1960 Persons 5328000 #> 6 SNPTD 2021 0 CY 1960 Persons 441788 #> 7 SNPTD 2021 0 CZ 1960 Persons 9602011 #> [ reached 'max' / getOption("max.print") -- omitted 135276 rows ] ``` ### Nutscode and nutslevel - [back](#filter) {#varnuts} Filtering by nuts-code returns data for all territories coded with nutscode starting with the value passed to the function. The filter syntax is the following: > **ardeco_get_dataset_data('_varcode_', nutscode=_val-nutscode_)** where * ***varcode*** is the code of one of the available variables exposed in ARDECO database * ***val-nutscode*** is a string representing the initial characters of the nutscode returned by the filter. The effect of this filter is: return all nutscodes starting with 'val-nutscode'. Example: ``` r # Retrive data with nutscode starting with 'IT' mydf <- ardeco_get_dataset_data('SUVGD', nutscode='IT') print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SUVGD 2024 0 IT 1980 Million EUR 342242 #> 2 SUVGD 2024 0 IT 1981 Million EUR 386207 #> 3 SUVGD 2024 0 IT 1982 Million EUR 435132 #> 4 SUVGD 2024 0 IT 1983 Million EUR 497304 #> 5 SUVGD 2024 0 IT 1984 Million EUR 555942 #> 6 SUVGD 2024 0 IT 1985 Million EUR 595650 #> 7 SUVGD 2024 0 IT 1986 Million EUR 653497 #> [ reached 'max' / getOption("max.print") -- omitted 150097 rows ] ``` This filter can be refined using the parameter 'level'. The parameter level filter the data by nuts level. The filter syntax is the following: > **ardeco_get_dataset_data('_varcode_', level=_val-nutslevel_)** where * ***varcode*** is the code of one of the available variables exposed in ARDECO database * ***val-nutslevel*** is the nuts level of the nutscodes returned by the filter. This parameter can be: * a single number representing the requested nuts level (example: **level=0**): _Note: the val-nutslevel can be with or without the quote **'** (example: **level=0** or **level='0'**)_ * a quoted string with a series of nuts level separated by comma. In this case, it will be returned the values for each nutscodes belonging in the listed nuts levels (example: **level='0,2'** return values for nuts codes in nuts level 0 and 2). _Note: in this case, the quotes are mandatory_. * a quoted string with a couple of values divided by '-'. In this case, the two numbers represent the nuts level between the two values. Example: **level='0-2'** return values for nuts codes in nuts levels 0, 1 and 2). _Note: in this case, the quotes are mandatory_. Example: ``` r # Retrive SUVGD data for Italy at country level (level 0) for the year 2015 mydf <- ardeco_get_dataset_data('SUVGD', level=0, nutscode='IT', year=2015, version=2021) print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SUVGD 2021 0 IT 2015 Million EUR 1663278 #> 2 SUVGD 2021 0 IT 2015 Million PPS 1618991 ``` ``` r # Retrive SUVGD data for Italy at country and regional level (level 0 and 2) for the year 2015 mydf <- ardeco_get_dataset_data('SUVGD', level='0,2', nutscode='IT', year=2015, version=2021) print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SUVGD 2021 0 IT 2015 Million EUR 1663277.70 #> 2 SUVGD 2021 2 ITC1 2015 Million EUR 127475.53 #> 3 SUVGD 2021 2 ITC2 2015 Million EUR 4675.69 #> 4 SUVGD 2021 2 ITC3 2015 Million EUR 47405.32 #> 5 SUVGD 2021 2 ITC4 2015 Million EUR 364063.70 #> 6 SUVGD 2021 2 ITF1 2015 Million EUR 32016.27 #> 7 SUVGD 2021 2 ITF2 2015 Million EUR 6157.72 #> [ reached 'max' / getOption("max.print") -- omitted 39 rows ] ``` ``` r # Retrive SUVGD data for Italy at level from 0 to 2 for the year 2015 mydf <- ardeco_get_dataset_data('SUVGD', level='0-2', nutscode='IT', year=2015, version=2021) print(mydf, max=100) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SUVGD 2021 0 IT 2015 Million EUR 1663277.70 #> 2 SUVGD 2021 1 ITC 2015 Million EUR 543620.24 #> 3 SUVGD 2021 1 ITF 2015 Million EUR 259814.43 #> 4 SUVGD 2021 1 ITG 2015 Million EUR 120463.39 #> 5 SUVGD 2021 1 ITH 2015 Million EUR 379454.04 #> 6 SUVGD 2021 1 ITI 2015 Million EUR 358639.17 #> 7 SUVGD 2021 1 ITZ 2015 Million EUR 1286.43 #> 8 SUVGD 2021 2 ITC1 2015 Million EUR 127475.53 #> 9 SUVGD 2021 2 ITC2 2015 Million EUR 4675.69 #> 10 SUVGD 2021 2 ITC3 2015 Million EUR 47405.32 #> 11 SUVGD 2021 2 ITC4 2015 Million EUR 364063.70 #> 12 SUVGD 2021 2 ITF1 2015 Million EUR 32016.27 #> 13 SUVGD 2021 2 ITF2 2015 Million EUR 6157.72 #> 14 SUVGD 2021 2 ITF3 2015 Million EUR 104260.39 #> [ reached 'max' / getOption("max.print") -- omitted 44 rows ] ``` ### Unit - [back](#filter) {#varunit} To request data for a specific unit of measure, use: > **ardeco_get_dataset_data('_varcode_', unit='_unit_')** where * ***varcode*** is the code of one of the available variables exposed in ARDECO database * ***unit*** is one of the unit of measures available for the requested variable To check the available unit of measures for the requested variable, use: > **ardeco_get_dataset_list('_varcode_')** (see ['List of the available datasets'](#datasetlist)) Example: ``` r # Check the available units for variable SUVGD ardeco_get_dataset_list('SUVGD') #> # A tibble: 2 × 3 #> var unit vers #> #> 1 SUVGD Million EUR 2016, 2021, 2024 #> 2 SUVGD Million PPS 2016, 2021, 2024 ``` ``` r # Retrive data only for unit = 'Million PPS' mydf <- ardeco_get_dataset_data('SUVGD', unit='Million PPS') print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SUVGD 2024 0 AT 1980 Million PPS 63886.92 #> 2 SUVGD 2024 0 BE 1980 Million PPS 81472.09 #> 3 SUVGD 2024 0 DE 1980 Million PPS NA #> 4 SUVGD 2024 0 DK 1980 Million PPS 40713.81 #> 5 SUVGD 2024 0 EL 1980 Million PPS 67653.78 #> 6 SUVGD 2024 0 ES 1980 Million PPS 212592.99 #> 7 SUVGD 2024 0 FI 1980 Million PPS 35398.79 #> [ reached 'max' / getOption("max.print") -- omitted 928228 rows ] ``` ### Year - [back](#filter) {#varyear} To request data for a year, use: > **ardeco_get_dataset_data('_varcode_', year='_val-year_')** where * ***varcode*** is the code of one of the available variables exposed in ARDECO database * ***val-year*** is the year (or years) for which the data are requested. This parameter can be: * a single number representing the requested year (example: **year=2020**): _Note: the val-year can be with or without the quote **'** (example: **year=2020** or **year='2020'**)_ * a quoted string with a series of years separated by comma. In this case, it will be returned the values for each of the listed year (example: **year='2011, 2015, 2016'** return values for year 2011, 2015 and 2016). _Note: in this case, the quotes are mandatory_. * a quoted string with a couple of values divided by '-'. In this case, the two numbers represent the temporal windows for which will be returned the values. Example: **year='2015-2018'** return the for years 2015, 2016, 2017, 2018). _Note: in this case, the quotes are mandatory_. Example: ``` r # Retrive data only for year = 2015 mydf <- ardeco_get_dataset_data('SNPTD', year=2015, version=2021, nutscode='ITF11') print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SNPTD 2021 3 ITF11 2015 Persons 303200 ``` ``` r # Retrive data for years 2015 and 2018 mydf <- ardeco_get_dataset_data('SNPTD', year='2015,2018', version=2021, nutscode='ITF11') print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SNPTD 2021 3 ITF11 2015 Persons 303200 #> 2 SNPTD 2021 3 ITF11 2018 Persons 298200 ``` ``` r # Retrive data for years from 2015 to 2018 mydf <- ardeco_get_dataset_data('SNPTD', year='2015-2018', version=2021, nutscode='ITF11') print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR UNIT VALUE #> 1 SNPTD 2021 3 ITF11 2015 Persons 303200 #> 2 SNPTD 2021 3 ITF11 2016 Persons 301500 #> 3 SNPTD 2021 3 ITF11 2017 Persons 299900 #> 4 SNPTD 2021 3 ITF11 2018 Persons 298200 ``` ### Dimensions - [back](#filter) {#dimens} If a variable have also additional dimensions, it is possible to filter for these additional dimensions, using the following syntax (like the other dimensions): > **ardeco_get_dataset_data('_varcode_', dim-1 = _dim-value-1_, ..., dim-n = _dim-value-n_)** Where * ***dim-x***: one of the additional dimension defined in the variable * ***dim-value-x***: of the possible values for the selected dimension To check the existance of additional dimensions for the requested variable, use: > **ardeco_get_dataset_list('_varcode_')** (see ['List of the available datasets'](#datasetlist)) Example: ``` r # Check the available dimensions for variable SNPTN ardeco_get_dataset_list('SNPTN') #> # A tibble: 12 × 5 #> var sex age unit vers #> #> 1 SNPTN Females 65 years and over Persons 2016, 2021, 2024 #> 2 SNPTN Males 65 years and over Persons 2016, 2021, 2024 #> 3 SNPTN Total 65 years and over Persons 2016, 2021, 2024 #> 4 SNPTN Females From 15 to 64 years Persons 2016, 2021, 2024 #> 5 SNPTN Males From 15 to 64 years Persons 2016, 2021, 2024 #> 6 SNPTN Total From 15 to 64 years Persons 2016, 2021, 2024 #> 7 SNPTN Females Less than 15 years Persons 2016, 2021, 2024 #> 8 SNPTN Males Less than 15 years Persons 2016, 2021, 2024 #> 9 SNPTN Total Less than 15 years Persons 2016, 2021, 2024 #> 10 SNPTN Females Total Persons 2016, 2021, 2024 #> 11 SNPTN Males Total Persons 2016, 2021, 2024 #> 12 SNPTN Total Total Persons 2016, 2021, 2024 ``` ``` r # Retrive data only for sex = 'Males' and age = 'Total' mydf <- ardeco_get_dataset_data('SNPTN', sex='Males', age = 'Total') print(mydf, max=50) #> VARIABLE VERSIONS LEVEL NUTSCODE YEAR SEX AGE UNIT VALUE #> 1 SNPTN 2024 0 AL 1960 Males Total Persons 804267 #> 2 SNPTN 2024 0 AT 1960 Males Total Persons 3357007 #> 3 SNPTN 2024 0 BE 1960 Males Total Persons 4448001 #> 4 SNPTN 2024 0 BG 1960 Males Total Persons 3815790 #> 5 SNPTN 2024 0 CH 1960 Males Total Persons 2553621 #> [ reached 'max' / getOption("max.print") -- omitted 940831 rows ] ``` ### Tercets - [back](#filter) {#tercet} It's possible to retrieve the data for a variable at tercel level or a single tercet class (see ['List of tercet'](#tercetlist)). This function computes the tercet values at level 0 for any requested country and tercet class. The function permit to retrieve data at tercet level in two main ways: * by ***tercet***: return a value (absolute or percentage) for any requested country and for any tercet class defined into the requested tercet * by ***tercet class***: return a value (absolute or percentage) for any requested country for the requested tercet class The correct syntax to request data by ***tercet*** is the following: > **ardeco_get_dataset_data('_varcode_', tercet_code=_val-tercet-code_[, show_perc=TRUE])** where * ***varcode*** is the code of one of the available variables exposed in ARDECO database * ***val-tercet-code*** is one of the code of the available tercet for the requested variable (see ['List of tercet'](#tercetlist))) * ***show_perc*** (optional parameter) is a flag and it can be TRUE or FALSE (FALSE by default). If **FALSE** (default), the functions return the absolute value of the requested tercet classes. If **TRUE**, the functions return the share (pecentage) of the requested tercet classes. Example: ``` r # Check available tercet for variable SNPTD ardeco_get_tercet_list('SNPTD') #> tercet_code tercet_name tercet_class_code #> 1 1 by Urban-Rural Typology 0 #> 2 1 by Urban-Rural Typology 2 #> 3 1 by Urban-Rural Typology 1 #> 4 2 by Urban-Rural Typology with Remoteness 0 #> 5 2 by Urban-Rural Typology with Remoteness 3 #> 6 2 by Urban-Rural Typology with Remoteness 4 #> 7 2 by Urban-Rural Typology with Remoteness 5 #> 8 2 by Urban-Rural Typology with Remoteness 6 #> tercet_class_name #> 1 Predominantly urban #> 2 Predominantly rural #> 3 Intermediate #> 4 Predominantly urban #> 5 Intermediate, close to a city #> 6 Intermediate, remote #> 7 Predominantly rural, close to a city #> 8 Predominantly rural, remote ``` ``` r # Retrieve absolute values for tercet 'Urban-Rural Typology' for variable 'SNPTD' for country 'IT', year 2020, nuts version 2021 mydf <- ardeco_get_dataset_data('SNPTD', tercet_code=1, nutscode='IT', year=2020, version=2021) print(mydf) #> VARIABLE VERSIONS LEVEL NUTSCODE TERCET_CLASS_CODE TERCET_CODE YEAR UNIT #> 1 SNPTD 2021 0 IT 0 1 2020 Persons #> 2 SNPTD 2021 0 IT 1 1 2020 Persons #> 3 SNPTD 2021 0 IT 2 1 2020 Persons #> TERCET_NAME TERCET_CLASS_NAME VALUE #> 1 by Urban-Rural Typology Predominantly urban 28475100 #> 2 by Urban-Rural Typology Intermediate 24767700 #> 3 by Urban-Rural Typology Predominantly rural 6196100 ``` ``` r # Retrieve share for tercet 'Urban-Rural Typology' for variable 'SNPTD' for country 'IT', year 2020, nuts version 2021 mydf <- ardeco_get_dataset_data('SNPTD', tercet_code=1, nutscode='IT', year=2020, version=2021, show_perc=TRUE) print(mydf) #> VARIABLE VERSIONS LEVEL NUTSCODE TERCET_CLASS_CODE TERCET_CODE YEAR UNIT #> 1 SNPTD 2021 0 IT 0 1 2020 Persons #> 2 SNPTD 2021 0 IT 1 1 2020 Persons #> 3 SNPTD 2021 0 IT 2 1 2020 Persons #> TERCET_NAME TERCET_CLASS_NAME VALUE #> 1 by Urban-Rural Typology Predominantly urban 12.16 #> 2 by Urban-Rural Typology Intermediate 10.56 #> 3 by Urban-Rural Typology Predominantly rural 2.64 ``` The correct syntax to request data by ***tercet_class*** is the following: > **ardeco_get_dataset_data('_varcode_', tercet_class_code=_val-tercet-calss-code_[, show_perc=TRUE])** where * ***varcode*** is the code of one of the available variables exposed in ARDECO database * ***val-tercet-code*** is one of the code of the available tercet class for the requested variable (see ['List of tercet'](#tercetlist))) * ***show_perc*** (optional parameter) is a flag and it can be **TRUE** or **FALSE** (**FALSE** by default). If **FALSE** (default), the functions return the absolute value of the requested tercet class. If **TRUE**, the functions return the share (pecentage) of the requested tercet class. Example: ``` r # Check available tercet classes for variable SNPTD ardeco_get_tercet_list('SNPTD') #> tercet_code tercet_name tercet_class_code #> 1 1 by Urban-Rural Typology 0 #> 2 1 by Urban-Rural Typology 2 #> 3 1 by Urban-Rural Typology 1 #> 4 2 by Urban-Rural Typology with Remoteness 0 #> 5 2 by Urban-Rural Typology with Remoteness 3 #> 6 2 by Urban-Rural Typology with Remoteness 4 #> 7 2 by Urban-Rural Typology with Remoteness 5 #> 8 2 by Urban-Rural Typology with Remoteness 6 #> tercet_class_name #> 1 Predominantly urban #> 2 Predominantly rural #> 3 Intermediate #> 4 Predominantly urban #> 5 Intermediate, close to a city #> 6 Intermediate, remote #> 7 Predominantly rural, close to a city #> 8 Predominantly rural, remote ``` ``` r # Retrieve absolute value for tercet class 'Predominantly urban' for variable 'SNPTD' for country 'IT', year 2020, nuts version 2021 mydf <- ardeco_get_dataset_data('SNPTD', tercet_class_code=0, nutscode='IT', year=2020, version=2021) print(mydf) #> VARIABLE VERSIONS LEVEL NUTSCODE TERCET_CLASS_CODE YEAR UNIT TERCET_CLASS_NAME VALUE #> 1 SNPTD 2021 0 IT 0 2020 Persons Predominantly urban 28475100 ``` ``` r # Retrieve share for tercet class 'Predominantly urban' for variable 'SNPTD' for country 'IT', year 2020, nuts version 2021 mydf <- ardeco_get_dataset_data('SNPTD', tercet_class_code=0, nutscode='IT', year=2020, version=2021, show_perc=TRUE) print(mydf) #> VARIABLE VERSIONS LEVEL NUTSCODE TERCET_CLASS_CODE YEAR UNIT TERCET_CLASS_NAME VALUE #> 1 SNPTD 2021 0 IT 0 2020 Persons Predominantly urban 12.16 ```