[R] using a regular expression

Glenn Schultz glennmschultz at me.com
Sat Sep 10 21:23:37 CEST 2016


I have a file that basically carries three datasets of differing lengths. To make this a single downloadable file, the creator of the file has used both NUL (hex 00) and space (hex 20) characters to pad the records to a common length.
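
For reference, the NUL-to-space substitution can also be done inside R, without calling sed, by reading the file as raw bytes. This is a minimal sketch that assumes the whole file fits in memory. The helper name `nul_to_space` and the small demo file are hypothetical and are not part of the function below.

```r
# Hypothetical helper: replace every NUL byte (hex 00) with a space (hex 20),
# keeping every other byte, and therefore every record length, unchanged.
nul_to_space <- function(inpath, outpath) {
  bytes <- readBin(inpath, what = "raw", n = file.size(inpath))
  bytes[bytes == as.raw(0x00)] <- as.raw(0x20)
  writeBin(bytes, outpath)
  invisible(outpath)
}

# Demo on a small made-up file padded with three NUL bytes.
demo_in <- tempfile(fileext = ".txt")
demo_out <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("ABC"), as.raw(c(0, 0, 0)), charToRaw("DEF")), demo_in)
nul_to_space(demo_in, demo_out)
readLines(demo_out, warn = FALSE)  # "ABC   DEF"
```

readLines() also has a `skipNul` argument, but dropping the NULs shortens the records, which is exactly the problem described in the comment inside the function; replacing them keeps every byte position intact.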

Below is the function that I am writing. I am using sed to replace the hex characters. First, to get past the NULs, I use sed to replace hex 00 with hex 20. This has worked. Once the NULs are removed, I can successfully parse the file with readLines() and str_sub(). The final step, before delimiting the file and making it nice and tidy, is to remove the hex 20 characters. I am using the same strategy to eliminate the spaces, and the sed command works in a shell but does not work in the R function. What am I doing wrong? I have put the dput() output for some of the nastier lines with hex 20 characters below my code.
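
For comparison, here is a minimal sketch of the trailing-space step done in base R, on two records adapted from the dput() lines below with the padding shortened. One point worth checking against the sed version: in GNU sed, `/\x20/d` is an address followed by the `d` command, which deletes every line that contains a space, whereas a substitution such as `s/ *$//` removes only the trailing run of spaces.

```r
# Two records adapted from the dput() output, with trailing padding shortened.
records <- c("555556 WS320021201006125{000378{000348{     ",
             "555556 WS320021201006250{000954{000880{  ")

# sub() removes only the trailing run of spaces; the record itself is kept.
trimmed <- sub(" +$", "", records)

# trimws(which = "right") gives the same result here, although by default it
# also strips trailing tabs, carriage returns and newlines.
trimmed2 <- trimws(records, which = "right")

# The R analogue of sed's '/\x20/d': drop every element containing a space.
# Every record contains a space, so nothing survives.
kept <- records[!grepl(" ", records, fixed = TRUE)]
```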

Any advice is appreciated.

Glenn

library(stringr)  # provides str_sub()

arm <- function(filepath){
  callpath <- paste(filepath, "arm.txt", sep = "")
  ARMReturn <- paste(filepath, "arm.csv", sep = "")
  ARMPoolReturnPath <- paste(filepath, "armatpool.csv", sep = "")
  ARMNextChgReturnPath <- paste(filepath, "nexratechangedate.csv", sep = "")
  ARMFirstPmtReturnPath <- paste(filepath, "firstpaymentdate.csv", sep = "")

  # This file contains NUL hex characters. Before parsing the file, replace
  # the hex NUL x00 with space x20 and save the result as a csv file, using
  # a system command. Build the paths from callpath and ARMReturn so that
  # sed reads filepath/arm.txt and writes the filepath/arm.csv read below.
  sedcommand <- paste("sed -e 's/\\x00/\\x20/g' <", shQuote(callpath),
                      ">", shQuote(ARMReturn), sep = " ")
  system(sedcommand)

  # Read the ARM quartile data. If the NULs were skipped instead
  # (readLines(skipNul = TRUE)), the length of each record set would change
  # and the data map provided by FNMA would no longer be valid with respect
  # to the length of each embedded data set.
  data <- readLines(ARMReturn, encoding = "ascii")

  quartile <- NULL
  numchar <- nchar(x = data, type = "chars")
  start <- seq(1, numchar, 399)
  end <- seq(399, numchar, 399)
  quartile <- str_sub(data, start, end)
  write(quartile, ARMReturn)

  # The file has been parsed according to length 400 for each data element.
  # The next step is to remove all the trailing white space, hex character
  # x20.
  sedcommand2 <- paste("sed -e '/\\x20/d' <", shQuote(ARMReturn),
                       ">", "arm2.csv", sep = " ")
  system(sedcommand2)
} # end of function
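
The slicing step can be written so that every piece has exactly the same width, by deriving each end position from its start position. This is a sketch under two assumptions: the cleaned file is read as one long line, and every record is exactly `width` characters (400 is taken from the comment in the function, not from the FNMA data map). The helper name `chunk_fixed` is hypothetical, and it uses base R's substring() rather than str_sub().

```r
# Hypothetical helper: cut one long string into consecutive pieces of exactly
# `width` characters (only the last piece may be shorter).
chunk_fixed <- function(s, width) {
  stopifnot(length(s) == 1, nchar(s) > 0, width >= 1)
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}

# Demo with two made-up 400-character records run together into one line.
records <- c(paste0("A", strrep(" ", 399)), paste0("B", strrep(" ", 399)))
stream <- paste(records, collapse = "")
pieces <- chunk_fixed(stream, 400)
```

Tying the ends to the starts keeps the two position vectors the same length, so there is no recycling of a shorter `end` vector inside the substring call.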


c("                                                 555556 WS320021201006125{000378{000348{                                                                                                                                                                                                                                                                                                                       ", 
"                                                  555556 WS320021201006250{000954{000880{                                                                                                                                                                                                                                                                                                                      ", 
"                                                   555556 WS320021201005625{001062{000983{                                                                                                                                                                                                                                                                                                                     ", 
"                                                    555556 WS320030101005250{000027{000025{                                                                                                                                                                                                                                                                                                                    ", 
"                                                     555556 WS320030101006500{000033{000030{                                                                                                                                                                                                                                                                                                                   ", 
"                                                      555556 WS320030101005125{000061{000056{                                                                                                                                                                                                                                                                                                                  ", 
"                                                       555556 WS320030101005375{000095{000088{                                                                                                                                                                                                                                                                                                                 ", 
"                                                        555556 WS320030101005350{000217{000200{                                                                                                                                                                                                                                                                                                                ", 
"                                                         555556 WS320030101006125{000400{000369{                                                                                                                                                                                                                                                                                                               ", 
"                                                          555556 WS320030101005310{000439{000406{                                                                                                                                                                                                                                                                                                              ", 
"                                                           555556 WS320030101006000{000573{000529{                                                                                                                                                                                                                                                                                                             "
)

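In the dput() lines above, the run of leading spaces grows by one character from each line to the next while the trailing run shrinks. That pattern is consistent with slices that are one character narrower than the true records: `seq(1, numchar, 399)` paired with `seq(399, numchar, 399)` yields 399-character pieces, while the comment in the function refers to a length of 400. The sketch below reproduces the effect with made-up 12-character records rather than the real file; the helper names are hypothetical.

```r
# Hypothetical helpers for the demo.
chunk_fixed <- function(s, width) {
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}
leading_spaces <- function(x) nchar(x) - nchar(sub("^ +", "", x))

# Four made-up records, each exactly 12 characters: 3 spaces, a 3-letter
# field, then 6 spaces of padding.
records <- sprintf("   %-9s", c("AAA", "BBB", "CCC", "DDD"))
stream <- paste(records, collapse = "")

leading_spaces(chunk_fixed(stream, 12))  # correct width: 3 3 3 3
leading_spaces(chunk_fixed(stream, 11))  # one short: 3 4 5 6 4 (last piece is leftover padding)
```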