[R] Processing a hierarchical string name

Kevin Zembower kev|n @end|ng |rom zembower@org
Thu Jun 29 20:35:25 CEST 2023


Ivan and Bert, thank you so much for your help.

Ivan, your solution worked perfectly. I didn't really understand how to 
do string processing on a vector of strings, and your solution 
demonstrated it for me. I modified it to work with the tidyverse's 
stringr package in this way:

library(dplyr)
library(stringr)

bg3_race_sum <- bg3_race %>%
     left_join(pl_vars, by=c("variable" = "name")) %>%
     group_by(variable) %>%
     summarize(count = sum(value)) %>%
     left_join(pl_vars, by=c("variable" = "name")) %>%
     filter(count > 0) %>%
     .$label %>%
     str_replace("^ !!", "") %>% # Drop the leading ' !!'
     str_replace_all("[^!]*!!", " ") # Replace each run of non-'!' text plus its trailing '!!' with one space
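
To show what the two replacements do, here is a minimal sketch run on made-up labels. The three sample strings are assumptions that follow the " !!Level:!!Level" format discussed in this thread; the real ones come from pl_vars$label.

```r
library(stringr)

# Hypothetical labels (not real data) in the " !!Level:!!Level" format
labels <- c(
  " !!Total:",
  " !!Total:!!Population of one race:",
  " !!Total:!!Population of one race:!!White alone"
)

indented <- labels |>
  str_replace("^ !!", "") |>        # drop the leading " !!"
  str_replace_all("[^!]*!!", " ")   # each "text!!" prefix becomes one space

print(indented)
```

Each level of nesting ends up as one leading space, so the deepest label here should come out indented by two spaces.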

Bert, your solution was close to correct. It dropped the right text, 
but didn't insert a space for each "!!"-delimited piece that it removed. 
I'm using those spaces to preserve the hierarchical nature of the 
numbers: numbers lower in the chart are included in the numbers above 
them. For instance, the "Total:" number is the sum of 
"Population of one race" and "Population of two or more races".

Thank you both for helping me with this specific problem and for 
increasing my knowledge and abilities with R.

-Kevin

On 6/28/23 16:56, Ivan Krylov wrote:
> On Wed, 28 Jun 2023 20:29:23 +0000
> Kevin Zembower via R-help <r-help using r-project.org> wrote:
> 
>> I think my algorithm for the labels is:
>> 1. keep everything from the last "!!" up to and including the last
>> character
>> 2. for everything remaining, replace each "!!.*:" group with a single
>> space.
> 
> If you remove the initial ' !!', the problem becomes a more tractable
> "replace each group of non-'!' followed by '!!' with one space":
> 
> bg3_race_sum$label |>
>   (\(.) sub('^ !!', '', .))() |>
>   (\(.) gsub('[^!]*!!', ' ', .))()
> 
> But that solution might not have been possible if the task were
> slightly different.
> 
>> I can split the label using str_split(label, pattern = "!!") to get a
>> vector of strings, but don't know how to work on the last string and
>> all the rest of the strings separately.
> 
> str_split() would have given you a list of character vectors. You can
> use lapply to evaluate a function on each vector inside that list.
> Inside the function, use length(x) (if `x` is the argument of the
> function) to find out how many spaces to produce and which element of
> the vector is the last one. (For code golf points, use rev(x)[1] to get
> the last element.)
>
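
The split-and-process approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the labels are made up, `indent_label` is a hypothetical helper name, base R's strsplit() stands in for str_split() (both should return a list of character vectors here), and vapply() is used instead of lapply() so the result is a plain character vector.

```r
# Hypothetical labels (not real data) in the " !!Level:!!Level" format
sample_labels <- c(
  " !!Total:",
  " !!Total:!!Population of one race:",
  " !!Total:!!Population of one race:!!White alone"
)

# Hypothetical helper: keep the last piece, indent one space per parent level
indent_label <- function(parts) {
  n <- length(parts)
  # parts[1] is the " " before the first "!!", parts[n] is the text to keep,
  # so n - 2 pieces in between are parent levels
  paste0(strrep(" ", max(n - 2, 0)), parts[n])
}

pieces <- strsplit(sample_labels, "!!", fixed = TRUE)  # list of character vectors
split_result <- vapply(pieces, indent_label, character(1))

print(split_result)
```

On these sample labels this should give the same indentation as the regular-expression version, and it leaves room to treat the last piece differently from the rest if the task changes.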


