Help for package ivdoctr

Title:

Ensures Mutually Consistent Beliefs When Using IVs

Version:

1.0.1

Description:

Uses data and researcher's beliefs on measurement error and instrumental variable (IV) endogeneity to generate the space of consistent beliefs across measurement error, instrument endogeneity, and instrumental relevance for IV regressions. Package based on DiTraglia and Garcia-Jimeno (2020) <doi:10.1080/07350015.2020.1753528>.

License:

CC0

LazyData:

TRUE

Depends:

R (≥ 2.10)

Imports:

AER, coda, data.table, graphics, MASS, Rcpp (≥ 0.11.6), rgl, sandwich, stats

LinkingTo:

Rcpp, RcppArmadillo

Suggests:

testthat, haven, MCMCpack, knitr, rmarkdown

RoxygenNote:

7.1.2

Encoding:

UTF-8

NeedsCompilation:

yes

BugReports:

https://github.com/emallickhossain/ivdoctr/issues

VignetteBuilder:

knitr

Packaged:

2021-12-05 11:35:47 UTC; mallick

Author:

Frank DiTraglia [aut], Mallick Hossain [aut, cre]

Maintainer:

Mallick Hossain <emallickhossain@gmail.com>

Repository:

CRAN

Date/Publication:

2021-12-05 16:00:02 UTC

Burde and Linden (2013, AEJ Applied) Dataset

Description

Replicates IV using controls from Table 2

Usage

afghan

Format

A data frame with 687 rows and 17 variables:

enrolled: Indicator if child is enrolled in formal school. Outcome.
testscore: Normalized test score
buildschool: Indicator if village is treated. Instrument.
headchild: Indicator if child is child of head of household
nhh: Number of household members
female: Female indicator
age: Child's age
yrsvill: Time family has lived in village
farsi: Indicator for speaking Farsi
tajik: Indicator for speaking Tajik
farmers: Indicator for if head of household is a farmer
land: Number of jeribs of land owned
agehead: Head of household age
educhead: Years of education for head of household
sheep: Number of sheep and goats owned
chagcharan: Indicator if village is in Chagcharan district
distschool: Distance to nearest non-community based school

Source

Provided by author.

References

https://www.jstor.org/stable/3083335

B function from Proposition A3

Description

B function from Proposition A3

Usage

b_functionA3(obs_draws, g, psi)

Arguments

obs_draws

Row of the data.frame of observable draws

g

Value from g function

psi

Psi value

Value

A min and a max of the B function

Evaluates the corners given user bounds. Vectorized with respect to multiple draws of obs.

Description

Evaluates the corners given user bounds. Vectorized with respect to multiple draws of obs.

Usage

candidate1(r_TstarU_lower, r_TstarU_upper, k_lower, k_upper, obs)

Arguments

r_TstarU_lower

Vector of lower bounds of endogeneity

r_TstarU_upper

Vector of upper bounds of endogeneity

k_lower

Vector of lower bounds on measurement error

k_upper

Vector of upper bounds on measurement error

obs

Observables generated by get_observables

Value

List containing vector of lower bounds and vector of upper bounds of r_uz
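
The corner evaluation can be pictured with a small sketch. It evaluates a function at the four corners of each (r_TstarU, kappa) box and keeps the smallest and largest corner values. Both toy_r_uz and corner_bounds_sketch are hypothetical names, and the toy formula is invented for illustration; it is not the package's expression for r_uz.

```r
# Toy stand-in for the mapping (r_TstarU, kappa) -> r_uz.
# ASSUMPTION: this formula is made up purely for illustration.
toy_r_uz <- function(r_TstarU, k) r_TstarU * sqrt(k) - 0.2

# Evaluate f at the four corners of each box and return the
# smallest and largest corner value per draw (one row per draw).
corner_bounds_sketch <- function(r_lower, r_upper, k_lower, k_upper, f) {
  corners <- cbind(
    f(r_lower, k_lower),
    f(r_lower, k_upper),
    f(r_upper, k_lower),
    f(r_upper, k_upper)
  )
  list(min_corner = apply(corners, 1, min),
       max_corner = apply(corners, 1, max))
}

# Two draws, each with its own box of user bounds.
res <- corner_bounds_sketch(
  r_lower = c(0, -0.5), r_upper = c(0.9, 0.5),
  k_lower = c(0.6, 0.25), k_upper = c(1, 1),
  f = toy_r_uz
)
```

Corner values alone give the extrema only when the function has no extremum along an edge or in the interior of the box, which is presumably why the edges are evaluated separately.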

Evaluates the edge where k is on the boundary. Vectorized with respect to multiple draws of obs.

Description

Evaluates the edge where k is on the boundary. Vectorized with respect to multiple draws of obs.

Usage

candidate2(r_TstarU_lower, r_TstarU_upper, k_lower, k_upper, obs)

Arguments

r_TstarU_lower

Vector of lower bounds of endogeneity

r_TstarU_upper

Vector of upper bounds of endogeneity

k_lower

Vector of lower bounds on measurement error

k_upper

Vector of upper bounds on measurement error

obs

Observables generated by get_observables

Value

List containing vector of lower bounds and vector of upper bounds of r_uz

Evaluates the edge where r_TstarU is on the boundary.

Description

Evaluates the edge where r_TstarU is on the boundary.

Usage

candidate3(r_TstarU_lower, r_TstarU_upper, k_lower, k_upper, obs)

Arguments

r_TstarU_lower

Vector of lower bounds of endogeneity

r_TstarU_upper

Vector of upper bounds of endogeneity

k_lower

Vector of lower bounds on measurement error

k_upper

Vector of upper bounds on measurement error

obs

Observables generated by get_observables

Value

List containing vector of lower bounds and vector of upper bounds of r_uz

Collapse 3-d array to matrix

Description

Collapse 3-d array to matrix

Usage

collapse_3d_array(myarray)

Arguments

myarray

A three-dimensional array.

Value

Matrix with the 3rd dimension appended as rows to the matrix
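
One reading of "appended as rows" is that the slices along the third dimension are stacked on top of each other. A minimal base R sketch under that assumption follows; collapse_sketch is a hypothetical name, not the package's own code.

```r
# Stack the slices myarray[, , 1], myarray[, , 2], ... row-wise.
# ASSUMPTION: this is one plausible interpretation of the documented value.
collapse_sketch <- function(myarray) {
  stopifnot(length(dim(myarray)) == 3)
  slice_dim <- dim(myarray)[1:2]
  slices <- lapply(seq_len(dim(myarray)[3]), function(i) {
    # Re-wrap in array() so a slice with a single row or column stays a matrix.
    array(myarray[, , i], dim = slice_dim)
  })
  do.call(rbind, slices)
}

a <- array(1:24, dim = c(2, 3, 4))
m <- collapse_sketch(a)  # 8 x 3 matrix: four 2 x 3 slices stacked
```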

Acemoglu, Johnson, and Robinson (2001) Dataset

Description

Cross-country dataset used to construct Table 4 of Acemoglu, Johnson & Robinson (2001).

Usage

colonial

Format

A data frame with 64 rows and 9 variables:

shortnam: three letter country abbreviation, e.g. AUS for Australia
africa: dummy variable =1 if country is in Africa
lat_abst: absolute distance to equator (scaled between 0 and 1)
rich4: dummy variable, =1 for "Neo-Europes" (AUS, CAN, NZL, USA)
avexpr: Average protection against expropriation risk. Measures protection against government expropriation of foreign private investment on a scale from 0 to 10, where a higher score means less risk. Averaged over all years from 1985 to 1995.
logpgp95: Natural logarithm of per capita GDP in 1995 at purchasing power parity
logem4: Natural logarithm of European settler mortality
asia: dummy variable, =1 if country is in Asia
loghjypl: Natural logarithm of output per worker in 1988

Source

http://economics.mit.edu/faculty/acemoglu/data/ajr2001

References

https://www.aeaweb.org/articles.php?doi=10.1257/aer.91.5.1369

Computes bounds for simulated data

Description

This function takes data and user restrictions on measurement error and endogeneity and simulates data and the resulting bounds on instrument validity.

Usage

draw_bounds(
  y_name,
  T_name,
  z_name,
  data,
  controls = NULL,
  r_TstarU_restriction = NULL,
  k_restriction = NULL,
  n_draws = 5000
)

Arguments

y_name

Character vector of the name of the dependent variable

T_name

Character vector of the names of the preferred regressors

z_name

Character vector of the names of the instrumental variables

data

Data to be analyzed

controls

Character vector containing the names of the exogenous regressors

r_TstarU_restriction

2-element vector of bounds on r_TstarU

k_restriction

2-element vector of bounds on kappa

n_draws

Integer number of simulations to draw

Value

List containing simulated data observables (covariances, correlations, and R-squares), indications of whether the identified set is empty, the unrestricted and restricted bounds on instrumental relevance, instrumental validity, and measurement error.

Simulates different data draws

Description

This function takes the data and simulates potential draws of data from the properties of the observed data.

Usage

draw_observables(y_name, T_name, z_name, data, controls = NULL, n_draws = 5000)

Arguments

y_name

Character vector of the name of the dependent variable

T_name

Character vector of the names of the preferred regressors

z_name

Character vector of the names of the instrumental variables

data

Data to be analyzed

controls

Character vector containing the names of the exogenous regressors

n_draws

Integer number of simulations to draw

Value

Data frame containing covariances, correlations, and R-squares for each data simulation

Draws covariance matrix using the Jeffreys prior

Description

Draws covariance matrix using the Jeffreys prior

Usage

draw_sigma_jeffreys(y, Tobs, z, k, n_draws)

Arguments

y

Vector of dependent variable

Tobs

Matrix containing data for the preferred regressor

z

Matrix containing data for the instrumental variable

k

Number of covariates, including the intercept

n_draws

Integer number of draws to perform

Value

Array of covariance matrix draws

Creates LaTeX code for the HPDI

Description

Creates LaTeX code for the HPDI

Usage

format_HPDI(bounds)

Arguments

bounds

2-element vector of the upper and lower HPDI bounds

Value

LaTeX string of the HPDI

Creates LaTeX code for parameter estimates

Description

Creates LaTeX code for parameter estimates

Usage

format_est(est)

Arguments

est

Number

Value

LaTeX string for the number

Creates LaTeX code for the standard error

Description

Creates LaTeX code for the standard error

Usage

format_se(se)

Arguments

se

Standard error

Value

LaTeX string for the standard error

G function from Proposition A.2

Description

G function from Proposition A.2

Usage

g_functionA2(kappa, r_TstarU, obs_draws)

Arguments

kappa

Kappa value

r_TstarU

r_TstarU value

obs_draws

a row of the data.frame of observable draws

Value

G value

Computes coverage of list of intervals

Description

Computes coverage of list of intervals

Usage

getCoverage(data, guess)

Arguments

data

2-column data frame of confidence intervals

guess

2-element vector of confidence interval

Value

Coverage percentage
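
The help page does not spell out how coverage is defined. The sketch below assumes it is the share of intervals in data that lie entirely inside guess; coverage_sketch is a hypothetical name and the definition itself is an assumption.

```r
# Share of intervals (rows of `data`) fully contained in `guess`.
# ASSUMPTION: "coverage" means containment of the whole interval.
coverage_sketch <- function(data, guess) {
  inside <- data[[1]] >= guess[1] & data[[2]] <= guess[2]
  mean(inside)
}

intervals <- data.frame(lower = c(0, 0.2, -1), upper = c(1, 0.5, 2))
cov_share <- coverage_sketch(intervals, guess = c(0, 1))
```

Here two of the three intervals fit inside [0, 1], so the share is 2/3.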

Generates smallest covering interval

Description

Generates smallest covering interval

Usage

getInterval(data, center, conf = 0.9, tol = 1e-06)

Arguments

data

2-column data frame of confidence intervals

center

2-element vector to center coverage interval

conf

Confidence level

tol

Tolerance level for convergence

Value

2-element vector of confidence interval

Computes L, lower bound for kappa_tilde in paper

Description

Computes L, lower bound for kappa_tilde in paper

Usage

get_L(draws)

Arguments

draws

data.frame of observables of simulated data

Value

Vector of L values

Solves for the magnification factor

Description

This function solves for the magnification factor given r_TstarU and kappa. It handles 3 potential cases when the magnification factor must be evaluated:

1. Across multiple simulations, but given the same r_TstarU and k
2. For multiple simulations, each with a value of r_TstarU and k
3. For one simulation across a grid of r_TstarU and k

Usage

get_M(r_TstarU, k, obs)

Arguments

r_TstarU

Vector of r_TstarU values

k

Vector of kappa values

obs

Observables generated by get_observables

Value

Vector of magnification factors

Computes a0 and a1 bounds

Description

Computes a0 and a1 bounds

Usage

get_alpha_bounds(draws, p)

Arguments

draws

data.frame of observables of simulated data

p

Treatment probability from binary data

Value

List of alpha bounds

Solves for beta

Description

This function solves for beta given r_TstarU and kappa. It handles 3 potential cases when beta must be evaluated:

1. Across multiple simulations, but given the same r_TstarU and k
2. For multiple simulations, each with a value of r_TstarU and k
3. For one simulation across a grid of r_TstarU and k

Usage

get_beta(r_TstarU, k, obs)

Arguments

r_TstarU

Vector of r_TstarU values

k

Vector of kappa values

obs

Observables generated by get_observables

Value

Vector of betas

Returns beta bounds in binary case using grid search

Description

Returns beta bounds in binary case using grid search

Usage

get_beta_bounds_binary(obs_draws, p, r_TstarU_restriction)

Arguments

obs_draws

Row of the data.frame of observable draws

p

Treatment probability from data

r_TstarU_restriction

2-element vector of restrictions on r_TstarU

Value

Min and max values for beta

Generates beta bounds off of beta draws

Description

Generates beta bounds off of beta draws

Usage

get_beta_bounds_binary_post(draws, n_observables)

Arguments

draws

Posterior draws

n_observables

Number of observable draws

Value

Upper and lower bounds of beta based on posterior draws

Wrapper function that combines all unrestricted bounds. Vectorized.

Description

Wrapper function that combines all unrestricted bounds. Vectorized.

Usage

get_bounds_unrest(obs)

Arguments

obs

Observables generated by get_observables

Value

List of unrestricted bounds for r_TstarU, r_uz, and kappa

Computes OLS and IV estimates

Description

Computes OLS and IV estimates

Usage

get_estimates(y_name, T_name, z_name, data, controls = NULL, robust = FALSE)

Arguments

y_name

Character vector of the name of the dependent variable

T_name

Character vector of the names of the preferred regressors

z_name

Character vector of the names of the instrumental variables

data

Data to be analyzed

controls

Character vector containing the names of the exogenous regressors

robust

Boolean of whether to compute heteroskedasticity-robust standard errors

Value

List of beta estimates and associated standard errors for OLS and IV estimation
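
For intuition, with one regressor, one instrument, and no controls, the OLS and IV slopes reduce to ratios of sample covariances. The sketch below uses simulated data (all variable names and coefficients are made up) and cross-checks the ratios against lm() and a manual two-stage fit. It does not compute standard errors and is not the package's implementation.

```r
set.seed(123)
n <- 500
z <- rnorm(n)                          # instrument
u <- rnorm(n)                          # structural error
Tobs <- 0.7 * z + 0.5 * u + rnorm(n)   # regressor, correlated with u
y <- 2 * Tobs + u                      # outcome; the slope 2 is arbitrary

# Point estimates as covariance ratios.
b_OLS <- cov(Tobs, y) / var(Tobs)
b_IV <- cov(z, y) / cov(z, Tobs)

# Cross-checks: lm() slope, and a manual two-stage least squares fit.
b_OLS_lm <- unname(coef(lm(y ~ Tobs))[2])
first_stage <- fitted(lm(Tobs ~ z))
b_IV_2sls <- unname(coef(lm(y ~ first_stage))[2])
```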

Given observables from the data, generates unrestricted bounds for kappa. Vectorized

Description

Given observables from the data, generates unrestricted bounds for kappa. Vectorized

Usage

get_k_bounds_unrest(obs, tilde)

Arguments

obs

Observables generated by get_observables

tilde

Boolean of whether or not kappa_tilde or kappa is desired

Value

List of upper bounds and lower bounds for kappa

Computes beliefs that support valid instrument

Description

Computes beliefs that support valid instrument

Usage

get_new_draws(obs_draws, post_draws)

Arguments

obs_draws

data.frame of draws of reduced form parameters

post_draws

data.frame of posterior draws

Value

data.frame of new draws

Given data and function specification, returns the relevant correlations and covariances with any exogenous controls projected out.

Description

Given data and function specification, returns the relevant correlations and covariances with any exogenous controls projected out.

Usage

get_observables(y_name, T_name, z_name, data, controls = NULL)

Arguments

y_name

Name of the dependent variable

T_name

Name(s) of the preferred regressor(s)

z_name

Name(s) of the instrumental variable(s)

data

Data to be analyzed

controls

Exogenous regressors to be included

Value

List of correlations, covariances, and R^2 of first and second stage regressions after projecting out any exogenous control regressors
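
"Projecting out" the controls can be sketched in base R by regressing each variable on the controls and working with the residuals. The data, the variable names, and the helper resid_on_controls below are all hypothetical; this only illustrates the idea and may differ from how the package computes its observables.

```r
set.seed(42)
n <- 200
x <- rnorm(n)                          # hypothetical exogenous control
z <- 0.5 * x + rnorm(n)                # instrument
Tobs <- 0.8 * z + 0.3 * x + rnorm(n)   # regressor
y <- 1.5 * Tobs + 0.7 * x + rnorm(n)   # outcome
dat <- data.frame(y, Tobs, z, x)

# Residuals from regressing one variable on the controls (plus intercept).
resid_on_controls <- function(var, controls, data) {
  f <- reformulate(controls, response = var)
  resid(lm(f, data = data))
}

# Matrix of partialled-out variables, one column per variable.
partialled <- sapply(c("Tobs", "y", "z"), resid_on_controls,
                     controls = "x", data = dat)

Sigma <- cov(partialled)  # covariances after projecting out x
Rho <- cor(partialled)    # correlations after projecting out x
```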

Compute the share of draws that could contain a valid instrument.

Description

Compute the share of draws that could contain a valid instrument.

Usage

get_p_valid(draws)

Arguments

draws

List of simulated draws

Value

Numeric of the share of valid draws, as determined by having the restricted bounds for r_uz contain zero.
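
The criterion above amounts to checking, draw by draw, whether zero lies between the lower and upper restricted bound for r_uz. A minimal sketch follows, where p_valid_sketch and its two vector arguments are hypothetical and do not reflect the structure of the draws list the package actually uses.

```r
# Share of draws whose [lower, upper] interval for r_uz contains zero.
# ASSUMPTION: bounds are supplied as two plain numeric vectors.
p_valid_sketch <- function(r_uz_lower, r_uz_upper) {
  mean(r_uz_lower <= 0 & r_uz_upper >= 0)
}

lower <- c(-0.3, 0.1, -0.2, 0.05)
upper <- c(0.2, 0.4, -0.1, 0.30)
share <- p_valid_sketch(lower, upper)  # only the first interval contains zero
```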

Computes the lower bound of psi for binary data

Description

Computes the lower bound of psi for binary data

Usage

get_psi_lower(s2_T, p, kappa)

Arguments

s2_T

Vector of s2_T draws from observables

p

Treatment probability from binary data

kappa

Vector of kappa, NOTE: kappa_tilde in the paper

Value

Vector of lower bounds for psi

Computes the upper bound of psi for binary data

Description

Computes the upper bound of psi for binary data

Usage

get_psi_upper(s2_T, p, kappa)

Arguments

s2_T

Vector of s2_T draws from observables

p

Treatment probability from binary data

kappa

Vector of kappa, NOTE: kappa_tilde in the paper

Value

Vector of upper bounds for psi

Given observables from the data, generates the unrestricted bounds for rho_TstarU. Data does not impose any restrictions on r_TstarU. Vectorized.

Description

Given observables from the data, generates the unrestricted bounds for rho_TstarU. Data does not impose any restrictions on r_TstarU. Vectorized.

Usage

get_r_TstarU_bounds_unrest(obs)

Arguments

obs

Observables generated by get_observables

Value

List of upper and lower bounds for r_TstarU

Solves for r_uz given observables, r_TstarU, and kappa

Description

This function solves for r_uz given r_TstarU and kappa. It handles 3 potential cases when r_uz must be evaluated:

1. Across multiple simulations, but given the same r_TstarU and k
2. For multiple simulations, each with a value of r_TstarU and k
3. For one simulation across a grid of r_TstarU and k

Usage

get_r_uz(r_TstarU, k, obs)

Arguments

r_TstarU

Vector of r_TstarU values

k

Vector of kappa values

obs

Observables generated by get_observables

Value

Vector of r_uz values.

Evaluates r_uz bounds given user restrictions on r_TstarU and kappa

Description

This function takes observables from the data and user beliefs over the extent of measurement error (kappa) and the direction of endogeneity (r_TstarU) to generate the implied bounds on instrument validity (r_uz).

Usage

get_r_uz_bounds(r_TstarU_lower, r_TstarU_upper, k_lower, k_upper, obs)

Arguments

r_TstarU_lower

Vector of lower bounds of endogeneity

r_TstarU_upper

Vector of upper bounds of endogeneity

k_lower

Vector of lower bounds on measurement error

k_upper

Vector of upper bounds on measurement error

obs

Observables generated by get_observables

Value

2-column data frame of lower and upper bounds of r_uz

Given observables from the data, generates the unrestricted bounds for rho_uz. Vectorized

Description

Given observables from the data, generates the unrestricted bounds for rho_uz. Vectorized

Usage

get_r_uz_bounds_unrest(obs)

Arguments

obs

Observables generated by get_observables

Value

List of upper and lower bounds for rho_uz

Solves for the variance of the error term u

Description

This function solves for the variance of u given r_TstarU and kappa. It handles 3 potential cases when the variance of u must be evaluated:

1. Across multiple simulations, but given the same r_TstarU and k
2. For multiple simulations, each with a value of r_TstarU and k
3. For one simulation across a grid of r_TstarU and k

Usage

get_s_u(r_TstarU, k, obs)

Arguments

r_TstarU

Vector of r_TstarU values

k

Vector of kappa values

obs

Observables generated by get_observables

Value

Vector of variances of u

Generates parameter estimates given user restrictions and data

Description

Generates parameter estimates given user restrictions and data

Usage

ivdoctr(
  y_name,
  T_name,
  z_name,
  data,
  example_name,
  controls = NULL,
  robust = FALSE,
  r_TstarU_restriction = c(-1, 1),
  k_restriction = c(1e-04, 1),
  n_draws = 5000,
  n_RF_draws = 1000,
  n_IS_draws = 1000,
  resample = FALSE
)

Arguments

y_name

Character string with the column name of the dependent variable

T_name

Character string with the column name of the endogenous regressor(s)

z_name

Character string with the column name of the instrument(s)

data

Data frame

example_name

Character string naming estimation

controls

Vector of character strings specifying the exogenous variables

robust

Indicator for heteroskedasticity-robust standard errors

r_TstarU_restriction

2-element vector of min and max of r_TstarU.

k_restriction

2-element vector of min and max of kappa.

n_draws

Number of draws when generating frequentist-friendly draws of the covariance matrix

n_RF_draws

Number of reduced-form draws

n_IS_draws

Number of draws on the identified set

resample

Indicator of whether or not to resample using magnification factor

Value

List with elements:

ols: lm object of the OLS estimation
iv: ivreg object of the IV estimation
n: Number of observations
b_OLS: OLS point estimate
se_OLS: OLS standard errors
b_IV: IV point estimate
se_IV: IV standard errors
k_lower: lower bound of kappa
p_empty: fraction of parameter draws that yield an empty identified set
p_valid: fraction of parameter draws compatible with a valid instrument
r_uz_full_interval: 90% posterior credible interval for fully identified set of rho
beta_full_interval: 90% posterior credible interval for fully identified set of beta
r_uz_median: posterior median for partially identified rho
r_uz_partial_interval: 90% posterior credible interval for partially identified set of rho under a conditionally uniform reference prior
beta_median: posterior median for partially identified beta
beta_partial_interval: 90% posterior credible interval for partially identified set of beta under a conditionally uniform reference prior
a0: If treatment is binary, mis-classification probability of no-treatment case. NULL otherwise
a1: If treatment is binary, mis-classification probability of treatment case. NULL otherwise
psi_lower: lower bound for psi
binary: logical indicating if treatment is binary
k_restriction: User-specified bounds on kappa
r_TstarU_restriction: User-specified bounds on r_TstarU

Examples

library(ivdoctr)
endog <- c(0, 0.9)
meas <- c(0.6, 1)

colonial_example1 <- ivdoctr(y_name = "logpgp95", T_name = "avexpr",
                            z_name = "logem4", data = colonial,
                            controls = NULL, robust = FALSE,
                            r_TstarU_restriction = endog,
                            k_restriction = meas,
                            example_name = "Colonial Origins")

Generates table of parameter estimates given user restrictions and data

Description

Generates table of parameter estimates given user restrictions and data

Usage

makeTable(..., output)

Arguments

...

Arguments of TeX code for individual examples to be combined into a single table

output

File name to write

Value

LaTeX code that generates output table with regression results

Examples

library(ivdoctr)
endog <- c(0, 0.9)
meas <- c(0.6, 1)

colonial_example1 <- ivdoctr(y_name = "logpgp95", T_name = "avexpr",
                            z_name = "logem4", data = colonial,
                            controls = NULL, robust = FALSE,
                            r_TstarU_restriction = endog,
                            k_restriction = meas,
                            example_name = "Colonial Origins")
makeTable(colonial_example1, output = file.path(tempdir(), "colonial.tex"))

Takes the OLS and IV estimates and converts them to a row of the LaTeX table

Description

Takes the OLS and IV estimates and converts them to a row of the LaTeX table

Usage

make_full_row(stats, example_name)

Arguments

stats

List with OLS and IV estimates and the bounds on kappa and r_uz

example_name

Character string detailing the example

Value

LaTeX code passed to makeTable()

Makes LaTeX code for a table row, shifted right by a given number of columns if necessary

Description

Makes LaTeX code for a table row, shifted right by a given number of columns if necessary

Usage

make_tex_row(char_vec, shift = 0)

Arguments

char_vec

Vector of characters to be collapsed into a LaTeX table

shift

Number of columns to shift over

Value

LaTeX string of the whole row of the table
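
A row of a LaTeX table is a set of cells joined by " & " and terminated by a double backslash, and a shift can be produced by prepending empty cells. The sketch below illustrates that idea; tex_row_sketch is a hypothetical name, and the exact spacing and line ending produced by the package may differ.

```r
# Join cells with " & ", prepend `shift` empty cells, and end the row
# with a LaTeX line break (two backslashes).
# ASSUMPTION: the output format is illustrative, not the package's exact one.
tex_row_sketch <- function(char_vec, shift = 0) {
  cells <- c(rep("", shift), char_vec)
  paste0(paste(cells, collapse = " & "), " \\\\")
}

row_plain <- tex_row_sketch(c("OLS", "0.52", "(0.06)"))
row_shifted <- tex_row_sketch(c("a", "b"), shift = 1)
cat(row_plain, "\n")
```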

Generates a custom color palette given a vector of numbers

Description

Generates a custom color palette given a vector of numbers

Usage

map2color(x, pal, limits = NULL)

Arguments

x

Vector of numbers

pal

Palette function generated by colorRampPalette

limits

Limits on the numeric sequence

Value

Hex values for colors
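
Mapping numbers to colors typically means binning the values and indexing into a palette. The sketch below does this with colorRampPalette() and findInterval(); map2color_sketch and the fixed choice of 100 color bins are assumptions made for illustration.

```r
# Bin x into equally spaced intervals over `limits` and look up one
# palette color per bin.
# ASSUMPTION: 100 bins; the package may choose the bins differently.
map2color_sketch <- function(x, pal, limits = NULL) {
  if (is.null(limits)) limits <- range(x)
  n_colors <- 100
  breaks <- seq(limits[1], limits[2], length.out = n_colors + 1)
  # all.inside = TRUE keeps the indices within 1..n_colors.
  pal(n_colors)[findInterval(x, breaks, all.inside = TRUE)]
}

blue_red <- colorRampPalette(c("blue", "red"))
cols <- map2color_sketch(c(0, 0.5, 1), blue_red)
```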

Rounds x to two decimal places

Description

Rounds x to two decimal places

Usage

myformat(x)

Arguments

x

Number to be rounded

Value

Number rounded to 2 decimal places

Plot ivdoctr Restrictions

Description

Plot ivdoctr Restrictions

Usage

plot_3d_beta(
  y_name,
  T_name,
  z_name,
  data,
  controls = NULL,
  r_TstarU_restriction = c(-1, 1),
  k_restriction = c(0, 1),
  n_grid = 30,
  n_colors = 500,
  fence = NULL,
  gray_k = NULL,
  gray_rTstarU = NULL,
  theta = 0,
  phi = 15
)

Arguments

y_name

Character string with the column name of the dependent variable

T_name

Character string with the column name of the endogenous regressor(s)

z_name

Character string with the column name of the instrument(s)

data

Data frame

controls

Vector of character strings specifying the exogenous variables

r_TstarU_restriction

2-element vector of bounds for r_TstarU

k_restriction

2-element vector of bounds for kappa

n_grid

Number of points to put in grid

n_colors

Number of colors to use

fence

Vector of left, bottom, right, and top corners of rectangle

gray_k

2-element vector of kappa restrictions to recolor graph as gray

gray_rTstarU

2-element vector of rTstarU restrictions to recolor graph as gray

theta

Graphing parameters for orienting plot

phi

Graphing parameters for orienting plot

Value

Interactive 3d plot which can be oriented and saved using rgl.snapshot()

Examples

library(ivdoctr)
endog <- matrix(c(0, 0.9), nrow = 1)
meas <- matrix(c(0.6, 1), nrow = 1)

plot_3d_beta(y_name = "logpgp95", T_name = "avexpr",
            z_name = "logem4", data = colonial,
            r_TstarU_restriction = endog,
            k_restriction = meas)

Construct vectors of points that outline a rectangle.

Description

Construct vectors of points that outline a rectangle.

Usage

rect_points(xleft, ybottom, xright, ytop, step_x, step_y)

Arguments

xleft

The left side of the rectangle

ybottom

The bottom of the rectangle

xright

The right side of the rectangle

ytop

The top of the rectangle

step_x

The step size of the x coordinates

step_y

The step size of the y coordinates

Value

List of x-coordinates and y-coordinates tracing the points around the rectangle
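
Tracing a rectangle amounts to walking along the bottom, right, top, and left sides in turn. A base R sketch of that walk follows; rect_points_sketch is a hypothetical name, and the ordering of the points and the repeated corners are choices made here, not necessarily those of the package.

```r
# Walk the outline: bottom (left to right), right side (bottom to top),
# top (right to left), left side (top to bottom).
rect_points_sketch <- function(xleft, ybottom, xright, ytop, step_x, step_y) {
  xs <- seq(xleft, xright, by = step_x)
  ys <- seq(ybottom, ytop, by = step_y)
  list(
    x = c(xs, rep(xright, length(ys)), rev(xs), rep(xleft, length(ys))),
    y = c(rep(ybottom, length(xs)), ys, rep(ytop, length(xs)), rev(ys))
  )
}

pts <- rect_points_sketch(0, 0, 1, 1, step_x = 0.5, step_y = 0.5)
```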

Simulate draws from the inverse Wishart distribution

Description

Simulate draws from the inverse Wishart distribution

Usage

rinvwish(n, v, S)

Arguments

n

An integer, the number of draws.

v

An integer, the degrees of freedom of the distribution.

S

A numeric matrix, the scale matrix of the distribution.

Details

Employs the Bartlett Decomposition (Smith & Hocking 1972). Output exactly matches that of riwish from the MCMCpack package if the same random seed is used.

Value

A numeric array of matrices, each of which is one simulation draw.
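
As a rough illustration of the Bartlett decomposition, the sketch below draws a Wishart matrix with scale solve(S) from a lower-triangular factor with chi-square and normal entries, then inverts it to obtain an inverse Wishart draw. rinvwish_sketch is a hypothetical name; it assumes v is at least the dimension of S and makes no attempt to reproduce the random number stream of riwish.

```r
# Inverse Wishart draws via the Bartlett decomposition.
# ASSUMPTIONS: v >= nrow(S); S is symmetric positive definite.
rinvwish_sketch <- function(n, v, S) {
  p <- nrow(S)
  L <- t(chol(solve(S)))  # lower-triangular factor of S^{-1}
  out <- array(NA_real_, dim = c(p, p, n))
  for (draw in seq_len(n)) {
    A <- matrix(0, p, p)
    # Diagonal: square roots of chi-square draws with df v, v - 1, ...
    diag(A) <- sqrt(rchisq(p, df = v - seq_len(p) + 1))
    # Strictly lower triangle: independent standard normals.
    A[lower.tri(A)] <- rnorm(p * (p - 1) / 2)
    LA <- L %*% A
    # LA %*% t(LA) is Wishart(v, S^{-1}); its inverse is the draw we want.
    out[, , draw] <- solve(LA %*% t(LA))
  }
  out
}

set.seed(1)
iw_draws <- rinvwish_sketch(n = 3, v = 5, S = diag(2))
```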

Convert 3-d array to list of matrices

Description

Convert 3-d array to list of matrices

Usage

toList(myArray)

Arguments

myArray

A three-dimensional numeric array.

Value

A list of numeric matrices.

Becker and Woessmann (2009) Dataset

Description

Data on Prussian counties in 1871 from Becker and Woessmann's (2009) paper "Was Weber Wrong? A Human Capital Theory of Protestant Economic History."

Usage

weber

Format

A data frame with 452 rows and 44 variables:

kreiskey1871: kreiskey1871
county1871: County name in 1871
rbkey: District key
lat_rad: Latitude (in rad)
lon_rad: Longitude (in rad)
kmwittenberg: Distance to Wittenberg (in km)
zupreussen: Year in which county was annexed by Prussia
hhsize: Average household size
gpop: Population growth from 1867-1871 in percentage points
f_prot: Percent Protestants
f_jew: Percent Jews
f_rw: Percent literate
f_miss: Percent missing education information
f_young: Percent below the age of 10
f_fem: Percent female
f_ortsgeb: Percent born in municipality
f_pruss: Percent of Prussian origin
f_blind: Percent blind
f_deaf: Percent deaf-mute
f_dumb: Percent insane
f_urban: Percent of county population in urban areas
lnpop: Natural logarithm of total population size
lnkmb: Natural logarithm of distance to Berlin (km)
poland: Dummy variable, =1 if county is Polish-speaking
latlon: Latitude * Longitude * 100
f_over3km: Percent of pupils farther than 3km from school
f_mine: Percent of labor force employed in mining
inctaxpc: Income tax revenue per capita in 1877
perc_secB: Percentage of labor force employed in manufacturing in 1882
perc_secC: Percentage of labor force employed in services in 1882
perc_secBnC: Percentage of labor force employed in manufacturing and services in 1882
lnyteacher: 100 * Natural logarithm of male elementary school teachers in 1886
rhs: Dummy variable, =1 if Imperial or Hanseatic city in 1517
yteacher: Income of male elementary school teachers in 1886
pop: Total population size
kmb: Distance to Berlin (km)
uni1517: Dummy variable, =1 if University in 1517
reichsstadt: Dummy variable, =1 if Imperial city in 1517
hansestadt: Dummy variable, =1 if Hanseatic city in 1517
f_cath: Percentage of Catholics
sh_al_in_tot: Share of municipalities beginning with letter A to L
ncloisters1517_pkm2: Monasteries per square kilometer in 1517
school1517: Dummy variable, =1 if school in 1517
dnpop1500: City population in 1500

Source

https://www.ifo.de/en/iPEHD

References

https://www.ifo.de/en/iPEHD; doi:10.1162/qjec.2009.124.2.531