my-vignette

Packages that we will use for data prep, IRT-M estimation, and data visualization:

## Data prep:
library(tidyverse) # version: tidyverse_2.0.0 
library(dplyr) #version: dplyr_1.1.4
library(stats) # base R package; ships with your R installation
library(fastDummies) # version: fastDummies_1.7.3
library(reshape2) #version: reshape2_1.4.4

## IRT-M estimation:
#devtools::install_github("dasiegel/IRT-M")
library(IRTM) #version 1.00

## Results visualization: 
library(ggplot2)  # version: ggplot2_3.4.4
library(ggridges) #version: ggridges_0.5.6 
library(RColorBrewer) #version: RColorBrewer_1.1-3
library(ggrepel) # version: ggrepel_0.9.5 

In this document we walk through the use of the IRT-M package.

While the IRT-M framework is case agnostic, this vignette focuses on a single use case in which a hypothetical research team seeks empirical support for the hypothesis that anti-immigration attitudes in Europe are associated with perceptions of cultural, economic, and security threats. We illustrate data preparation, IRT-M estimation, visualization, and analysis with a small synthetic dataset (N = 3000) based on Eurobarometer 94.3. The real data can be accessed via: .

This example illustrates one of the strengths of the IRT-M model: namely, the (very common) situation in which researchers have a theoretical question and related data that do not directly address the substantive question of interest. In this case, we work through a research question derived from the literature on European attitudes towards immigration. The threat response hypothesis is supported by the literature but was not a specific focus of the 2020-2021 Eurobarometer wave. Consequently, the survey did not directly ask questions about the three threat dimensions of interest. However, the survey contained several questions adjacent to threat perception. We can use these questions to build an M-matrix and estimate latent threat dimensions.

Regardless of the use case, there are three steps to preparing data for the IRT-M Package:

Loading the package

gfortran is required to compile the package. To check whether gfortran is installed, on Windows run `gfortran --version` at the command line; on Mac, run `which gfortran` in the terminal. Note that gfortran is distributed as part of GCC (the GNU Compiler Collection), so a working GCC installation is also needed.

Formatting the Observed Data


IRT-M requires that users convert categorical variables into a numeric format with one response loaded into each input object (i.e., one column per question response). The most straightforward way to do this is to use a library, such as fastDummies, to expand the entire instrument into one-hot (binary) encoding. This is also a good opportunity to export an empty data frame listing the question codes, which can then be used to code the M-matrix. Doing so reduces the likelihood of formatting slippages that are tedious to fix. We have also found it worthwhile to insert a column of human-readable notes next to the question codes. This step adds some overhead but, much like commenting code in general, is invaluable for debugging and analysis.
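As a minimal sketch of the template export described above (the question codes, dimension names, and file name here are hypothetical stand-ins, not actual Eurobarometer items):

```r
# Sketch: build an empty coding template from the one-hot encoded data.
# "Y" stands in for the encoded survey; its column names are hypothetical.
Y <- data.frame(qa1_agree = c(1, 0), qa1_disagree = c(0, 1))

m_template <- data.frame(
  QMap     = colnames(Y),  # one row per binary question code
  notes    = "",           # human-readable description, filled in by hand
  cultural = NA, economic = NA, security = NA  # loadings to be coded
)
# write.csv(m_template, "m_template.csv", row.names = FALSE)
```

Filling the loadings into this exported file (rather than typing codes by hand against the raw survey) keeps the M-matrix rows aligned with the encoded columns of Y.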

Creating the M-Matrix

The core step in using the IRT-M package is to map the underlying theoretical constructs of interest into an ‘M-matrix’ of dimension loadings. To develop the loadings, we go through every question on the “test” object (here: every question in the survey) and decide whether the question relates to one of our hypothesized theoretical dimensions. For those questions that we believe load on a theoretical dimension of interest, we code whether we expect it to load positively, meaning that an affirmative response implies more of the dimension, or negatively. We code positive loading as \(1\) in the M-matrix and negative loading as \(-1\). We can also denote that the question has no relationship with the theoretical dimension of interest, in which case we code a \(0\) for that dimension. If we are unsure, we leave the cell uncoded; this inserts an \(NA\) value, and the model will learn the loading during the estimation step.
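For concreteness, here is a minimal sketch of an M-matrix for the three theorized threat dimensions; the question codes are hypothetical stand-ins, not actual Eurobarometer items:

```r
# Sketch: M-matrix with hypothetical binary question codes as rows and
# the three theorized threat dimensions as columns. NA = loading unknown.
M <- matrix(NA_real_, nrow = 4, ncol = 3,
            dimnames = list(
              c("q1_agree", "q1_disagree", "q2_agree", "q3_agree"),
              c("cultural", "economic", "security")))

M["q1_agree",    "cultural"] <-  1  # affirmative response implies more of the dimension
M["q1_disagree", "cultural"] <- -1  # negative loading
M["q2_agree",    "cultural"] <-  0  # no relationship with this dimension
# q3_agree is left as NA: its loadings will be learned during estimation
```

Each row of the M-matrix corresponds to one binary (one-hot encoded) question in the observed data Y, so the coding template exported earlier maps directly onto these rows.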

To begin, we reformat the data so that each possible answer becomes a separate binary question (one-hot encoding). In preparing the data, we use the dummy_cols() utility from the fastDummies package. Finally, we rename the new binary dataframe Y to underscore that this is the observed data that we will be modeling. Please ensure that the dataPath variable is adjusted for your local file structure.
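A minimal sketch of this encoding step, using a toy data frame in place of the synthetic survey (the column names are hypothetical; in the vignette the data would be read from dataPath instead):

```r
library(fastDummies)

# Toy stand-in for the synthetic survey data (hypothetical question codes).
ebs <- data.frame(qa1 = c("agree", "disagree", "agree"),
                  qa2 = c("yes", "no", "yes"))

# One-hot encode every categorical column and drop the originals, then
# name the result Y to mark it as the observed data to be modeled.
Y <- dummy_cols(ebs, remove_selected_columns = TRUE)
```

dummy_cols() names each new column `question_response` (e.g., `qa1_agree`), which is the format the M-matrix rows should mirror.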