[R] Fw: Regex problem

Carl Sutton suttoncarl at ymail.com
Thu Jan 5 19:09:20 CET 2017


Re-sending this help request; it went to the wrong address the first time:
r-help-request at r-project.org

Belated Happy New Year to the gurus:

I have a data frame with 570+ columns, and in those column headers yours truly has made a few blunders.  Namely, I somehow managed to end some of them with both an apostrophe ' and an ending quote.  I think the attached code finds the occurrences (not 100% sure), and feedback is appreciated.  This is my first attempt at regex, and I have been googling and reading for the last few days (including an R-Exercise).

Confused as to why the column names show a "." instead of a " ' ".

Ignorant of why gregexpr and regexpr show attr(,"useBytes") as TRUE when the default is FALSE.  Is it possible I somehow messed them up last week?   Simply typing the function name in the console shows the defaults as FALSE.

I have not been able to build a construct to simply delete the apostrophe.  I have made several attempts to do this, and left one for your perusal.  The others were just too "off the wall" and embarrassing.

Lastly, is there a way for me to check that all of my column names end with a letter followed by a quote?  I am thinking something along the lines of "[[:alpha:]\\"" but I expect that will throw an error.  I stumbled upon the ' " problem when dplyr complained about it last week, and it is unsettling to think I may have more goofs.
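
One possible form of that check, as a minimal sketch: the bracket expression needs its own closing bracket ([[:alpha:]]), a double quote inside a double-quoted R string takes a single backslash, and $ anchors the match to the end of the name. The vector nms below is made up for illustration and is not taken from the real data.

```r
# Hypothetical header strings (not the real data): the first ends in
# letter + quote, the second in apostrophe + quote, the third has no quote.
nms <- c("abc\"", "abc'\"", "abc")

# A letter immediately followed by a double quote at the end of the string.
ends_ok <- grepl("[[:alpha:]]\"$", nms)

# Names that fail the check and deserve a closer look.
suspects <- nms[!ends_ok]
print(suspects)
```

Applied to real headers, colnames(df) would take the place of nms, assuming the quotes are actually part of the stored names.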

Any suggestions for a good reference book are much appreciated.  I can see extended use of regex coming toward me and I am so ignorant it is frightening (all volunteer work, no $'s involved, but I dislike being incompetent).


#  regex problem
df1 <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15)
colnames(df1)
df1
ma_pattern <- "[[:punct:]][[:punct:]]" # Need single ][ in the middle??
grep(ma_pattern,colnames(df1))
ma_pattern <- "[[:punct:][:punct:]]"  #  single ][ worked
grep(ma_pattern,colnames(df1),value = TRUE)  #  found it
grepl(ma_pattern,colnames(df1)) 
gregexpr(ma_pattern,colnames(df1))   # at position 8
regexpr(ma_pattern,colnames(df1))

#sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
#    fixed = FALSE, useBytes = FALSE)

#sub(ma_pattern,replacement = "'\\"",df1)
colnames(df1)
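
For what it is worth, the "." most likely comes from data.frame() itself: with the default check.names = TRUE it passes the names through make.names(), which turns characters that are not valid in a syntactic name into ".". The sketch below assumes the names can be kept as typed with check.names = FALSE and then strips a trailing apostrophe with sub(). Here df2 is a hypothetical copy of the toy data frame above, and the real headers may need a different pattern.

```r
# make.names() is what data.frame() applies when check.names = TRUE.
fixed_by_r <- make.names("WhatAmI'")

# Keep the names exactly as typed (df2 is a made-up name for this sketch).
df2 <- data.frame("WhatAmI'" = 1:5, "WhoAreYou" = 11:15, check.names = FALSE)
before <- colnames(df2)

# Delete an apostrophe only when it is the last character of a name.
colnames(df2) <- sub("'$", "", colnames(df2))
after <- colnames(df2)

print(fixed_by_r)
print(before)
print(after)
```

Note that sub() is applied to the character vector colnames(df2) and the result assigned back, rather than to the data frame itself.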

Carl Sutton


