[R] Processing a hierarchical string name

Kevin Zembower kev|n @end|ng |rom zembower@org
Wed Jun 28 22:29:23 CEST 2023


Hello, all

I'm trying to process the names of the variables in the US Census
database, which I'm retrieving with the tidycensus package. My end goal is
to produce nicely formatted tables with natural labels.

The labels as downloaded from the US Census look like this:

## Load the packages used below:
library(tidycensus)
library(dplyr)

## Get the P1 table for block group 3 in census tract 2711.01:
bg3_race <- get_decennial(
     geography = "block group",
     state = "MD",
     county = "Baltimore city",
     table = "P1",
     cache_table = TRUE,
     year = 2020,
     sumfile = "pl") %>%
     filter(substr(GEOID, 6, 12) == "2711013")

## Load the names and labels of the variables:
pl_vars <- load_variables(year = 2020, dataset = "pl", cache = TRUE)

## Join the labels to the variables, and drop the zero counts:
bg3_race_sum <- bg3_race %>%
     left_join(pl_vars, by = c("variable" = "name")) %>%
     filter(value > 0) %>%
     select(GEOID, value, label)

head(bg3_race_sum$label)
[1] " !!Total:"
[2] " !!Total:!!Population of one race:"
[3] " !!Total:!!Population of one race:!!White alone"
[4] " !!Total:!!Population of one race:!!Black or African American alone"
[5] " !!Total:!!Population of one race:!!American Indian and Alaska Native alone"
[6] " !!Total:!!Population of one race:!!Asian alone"


I think my algorithm for the labels is:
1. Keep everything after the last "!!", up to and including the last character.
2. In everything before that, replace each "!!"-delimited group (such as
   "!!Total:") with a single space, dropping any leading whitespace.

Applied to the head() output above, this gives:
"Total:"
" Population of one race:"
"  White alone"
"  Black or African American alone"
"  American Indian and Alaska Native alone"
"  Asian alone"
[The leading spaces may not be visible unless this is rendered in a monospaced font.]
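The two steps above can be sketched in base R with regular expressions. This is only a sketch: the helper name indent_label() is made up for illustration, and it assumes "!!" is the only separator used in the labels. Counting the "!!" separators gives the nesting depth directly, so the parent groups never have to be matched one by one.

```r
# A minimal sketch in base R; indent_label() is a hypothetical helper name,
# not a tidycensus function.
indent_label <- function(label) {
  # Step 1: keep everything after the last "!!". The greedy ".*!!" matches
  # all text up to and including the final "!!", and sub() deletes it.
  leaf <- sub(".*!!", "", label)

  # Step 2: count the "!!" separators in each label. One of them introduces
  # the leaf itself; each of the others stands for one parent group.
  n_sep <- lengths(regmatches(label, gregexpr("!!", label, fixed = TRUE)))

  # Replace the parent groups with one space each (none if there are none).
  paste0(strrep(" ", pmax(n_sep - 1, 0)), leaf)
}

labels <- c(" !!Total:",
            " !!Total:!!Population of one race:",
            " !!Total:!!Population of one race:!!White alone")
indent_label(labels)
# returns c("Total:", " Population of one race:", "  White alone")
```

Because sub(), gregexpr() and strrep() are all vectorised, no lapply() is needed; with dplyr loaded, something like bg3_race_sum %>% mutate(label = indent_label(label)) should apply it to the whole column.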

I think I need lapply() here, but I'm not sure of that, nor of what to do
next. I can split the labels with str_split(label, pattern = "!!") to get
a list with one vector of strings per label, but I don't know how to treat
the last string separately from all the others.
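To work on the last piece and the rest separately, one option is to split each label and apply a small function to every resulting vector. The sketch below uses base strsplit() (stringr's str_split() returns the same kind of list of character vectors) together with vapply(). The helper name format_parts() is hypothetical, and the sketch assumes the only blank piece is the leading space before the first "!!".

```r
# A sketch of the split-based approach; format_parts() is a hypothetical
# helper, and base strsplit() stands in for stringr::str_split().
format_parts <- function(parts) {
  # Drop empty or whitespace-only pieces, such as the " " before the first "!!".
  parts <- parts[nzchar(trimws(parts))]
  n <- length(parts)
  if (n == 0) return("")
  # The last piece is the text to keep; each earlier piece becomes one space.
  paste0(strrep(" ", n - 1), parts[n])
}

labels <- c(" !!Total:",
            " !!Total:!!Population of one race:",
            " !!Total:!!Population of one race:!!White alone",
            " !!Total:!!Population of one race:!!Asian alone")

# A list with one character vector of pieces per label.
pieces <- strsplit(labels, "!!", fixed = TRUE)

# Apply format_parts() to each vector, expecting exactly one string back.
formatted <- vapply(pieces, format_parts, character(1))
formatted
```

lapply() would work here too, but it returns a list that then has to be flattened with unlist(); vapply() also checks that every label produces exactly one string.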

Thank you for any suggestions to nudge me along towards a workable solution.

-Kevin


