[R] vectorize data string analysis

Glenn Schultz glennmschultz at me.com
Tue Mar 3 04:01:02 CET 2015


Hello All,

I have to admit that I am not that good when it comes to vectorizing a function, and I need some insight.  Is the code below a case where vectorization could improve speed?

Below the function is some sample data.  As you can see, it is not delimited; however, each record is 220 characters long.  So I wrote the following code to delimit the data set with "/r".  The function works, and it gives me a data set that can then be inserted into a MySQL table.  However, the actual data set has 518,000 records, so the number of characters is 518000 * 220.  It takes R hours to parse this using the function I have written.  Can this be vectorized, or is this a loop deal?

Best Regards,
Glenn  

#' FNMA Factor
#'
#' This function parses the FNMA factor file for load into
#' a database table. The FNMA factor file is non-delimited.
#' @param filepath A character string specifying the data directory
#' @param linelength A numeric value equal to the length of a line (record)
#' @export
FNMAFactor <- function(filepath, linelength = 220){
  # Build input and output paths; filepath must end with a path separator
  callpath <- paste(filepath, "mbsfact.txt", sep = "")
  returnpath <- paste(filepath, "factor.txt", sep = "")
  # Assumes the whole file comes back as a single long string
  data <- readLines(con = callpath)
  numchar <- nchar(data, type = "chars")
  # Start and end positions of each fixed-width record
  start <- seq(1, numchar, linelength)
  end <- seq(linelength, numchar, linelength)
  # Append one record per line to the output file (requires the stringr package)
  for(i in seq_along(start)){
    write(stringr::str_sub(data, start[i], end[i]), file = returnpath, append = TRUE)
  }
}
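
For comparison, here is a minimal sketch of one possible vectorized variant, assuming the file is read as a single long string as in the function above. Base R's substring() accepts whole vectors of start and stop positions, so all records can be cut in one call and written with a single writeLines() instead of one write() per record. The helper name split_fixed_width and its reclen argument are made up for illustration, and I have not timed this on the full 518,000-record file.

```r
# Hypothetical helper (illustration only): split one long string into
# fixed-width records without an explicit loop.
split_fixed_width <- function(x, reclen = 220) {
  stopifnot(is.character(x), length(x) == 1)
  n <- nchar(x, type = "chars")
  if (n == 0) return(character(0))
  # Vector of start positions, one per record
  starts <- seq(1, n, by = reclen)
  # Matching end positions; the last record may be shorter than reclen
  ends <- pmin(starts + reclen - 1, n)
  # substring() is vectorized over the start and stop positions
  substring(x, starts, ends)
}

# Toy input: three 4-character records standing in for the 220-character ones
records <- split_fixed_width("AAAABBBBCCCC", reclen = 4)

# One write call for all records, here to a temporary file
out <- tempfile(fileext = ".txt")
writeLines(records, out)
```

The same idea should carry over to the real file by passing the string returned by readLines() and keeping reclen = 220.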



31365EJ46 CI125483  00002003473100OCT03000003103340610.1548980406.500030197040112180MULTIPLE POOL                                                                          00000070147FNMS 06.500 CI12548307017009600000000031371KMA6 CL254253  00001304570700OCT03000010156865640.7785600006.000030102030132357MULTIPLE POOL                                                                          00000067230FNMS 06.000 CL25425306715033300000000031371RE44 CL259455  00000983651400OCT03000003447615880.3504916406.500050102050132357MULTIPLE POOL                                                                          00000070200FNMS 06.500 CL25945507045034000000000031376KBB1 CL357434  00002505145900OCT03000025021294240.9987958905.000090103090133359MULTIPLE POOL                                                                          00000055000FNMS 05.000 CL35743405500035800000000031385XE52 WS555556  00003651248300OCT03000033344198060.9132273504.575050103050133356MEGA POOL                               ** NOT AN ACTIVE SERVICER **                   00000052440FNAR 04.595 WS55555600000000000000000031385XLL9 WS555731  00013439369600OCT03000129242191330.9616685505.360080103040133352MEGA POOL                               ** NOT AN ACTIVE SERVICER **                   00000075160FNAR 05.368 WS55573100000000000000000031390XG87 CI659123  00000208856500OCT03000001136251660.5440346206.000080102080117179WASHINGTON MUTUAL BANK, FA              19850 PLUMMER STREET          CHATSWORTH     CA91311069210FNMS 06.000 CI65912306909016500000000031403BTR4 CL744060  00000770371700OCT03000007694084860.9987496805.000090103080133356MULTIPLE POOL                                                                          00000053920FNMS 05.000 CL74406000000000000000000031403GND0 LB748388  00000952312900OCT03000009512089400.9988407604.525090103080133358DLJ MORTGAGE CAPITAL INC.               
ELEVEN MADISON AVENUE         NEW YORK       NY10010058430FNAR XX.XXX LB74838800000000000000000031403GNG3 LB748391  00000715661500OCT03000007007212290.9791238304.379090103080133358DLJ MORTGAGE CAPITAL INC.               ELEVEN MADISON AVENUE         NEW YORK       NY10010056530FNAR XX.XXX LB748391000000000000000000

