[R] Is there a way to vectorize this?

Nutter, Benjamin NutterB at ccf.org
Fri Oct 31 19:47:17 CET 2008


I apologize up front for this being a little long. I hope it's
understandable.  Please let me know if I need to clarify anything.

Several months ago I wrote a series of functions to help me take my R
analyses and build custom reports in html files.  Each function either
builds or modifies a string of html code that can then be written to a
file to produce the desired output.

To make modifications in the html code, I've placed 'markers' around
certain characteristics that I might want to change.  For instance, the
alignment characteristics have an 'algnmark' on either side of them.
When I wish to change the alignment, I can find where these markers are,
determine their location, and replace the contents between them. 

I've been using the functions for a few months now, and am pleased with
the utility.  Unfortunately, as I was writing these, I wasn't very
strong with my vectorization skills and relied on for loops (lots of for
loops) to get through the work.  So while I'm pleased with the utility,
I've been trying to optimize the functions by vectorizing the for loops.

At this point, I've hit a small snag.  I have a situation where I can't
seem to figure out how to vectorize the loop.  Part of me wonders if it
is even possible. 

The scenario is this:  I run a string of code through the loop, on each
pass, the section of code in need of modification is identified and the
changes are made.  When this is done, however, the length of the string
changes.  The change in length needs to be recognized in the next pass
through the loop.

Okay, some code to illustrate what I mean.  This first function formats
the html file.  I only include it because it will be necessary to create
illustrate what the function is doing.  I am eliminating all comments
and spacing from the code for brevity.

#******************************************* Start of html.file.start
'html.file.start' <- function(title, size=11, font="Times New Roman"){
  size <- format(floor(size),nsmall=1)
  code <- paste("
<html xmlns:o='urn:schemas-microsoft-com:office:office\'
  xmlns:w=\'urn:schemas-microsoft-com:office:word\'
  xmlns=\'http://www.w3.org/TR/REC-html40\'>
  <head>
    <meta http-equiv=Content-Type content=\'text/html;
charset=windows-1252\'>
    <meta name=ProgId content=Word.Document>
    <meta name=Generator content=\'Microsoft Word 11\'>
    <meta name=Originator content=\'Microsoft Word 11\'>
   <style>
      <!--
        /* Style Definitions */
     p.MsoNormal, li.MsoNormal, div.MsoNormal
     p.MsoEndnoteText, li.MsoEndnoteText, div.MsoEndnoteText
  {margin-top:2.0pt;
  margin-right:0in;
  margin-bottom:0in;
  margin-left:.15in;
  margin-bottom:.0001pt;
  text-indent:-.15in;
  mso-pagination:none;
  font-size:9.0pt;
  mso-bidi-font-size:10.0pt;
  font-family:'Times New Roman';
  mso-fareast-font-family:'Times New Roman';}
       p.Textbody, li.Textbody, div.Textbody    -->
   </style>
    ",
    "<title>",title,"</title>
  </head>
  <body lang=EN-US style=\'tab-interval:.5in;",
       "textmark; font-size:",size,"pt; textmark;",
       "fontmark; font-family:",font,"; fontmark;\'>", sep="")
  return(code)
} #******************************************** End of html.file.start


So here is the function I'm trying to vectorize.

#******************************************* Start of html.align
html.align <- function(code,new.align="left"){
  #* Create a string to replace the current alignment setting.
  align <- paste(" align=",new.align," ",sep="")

  #* Function to pass to sapply.  This is handy when 'code'
  #*  is a vector.
  f1 <- function(code,align=align){
    mark <- unlist(gregexpr("algnmark",code)) #* Get positions of
markers
    if(mark[1]>0){
      odd <- seq(1,length(mark),by=2) #* odd elements are starting
marker
      evn <- seq(2,length(mark),by=2) #* even elements are ending marker

      mark[odd] <- mark[odd]+9  #* These two lines determine the
starting
      mark[evn] <- mark[evn]-1  #* and ending elements of the substring
to
                                #* be replaced

      for(i in 1:length(odd)){

        l.old <- nchar(code)  #* store the length of the code segment.

        old.align <- substr(code,mark[odd[i]],mark[evn[i]]) 
                              #* Identify the old alignment setting
        
        code <- gsub(old.align,align,code) #* Replace alignment setting

        mark <- mark - (l.old-nchar(code)) #* See the NOTE Below.
      }
      return(code)
    }
  }
  code <- sapply(code,f1,align=align)
  return(code)
} #************************************************* end of html.align


NOTE:  This is the problem.  When the alignment setting is changed, the
length of the code string changes, and the elements in 'mark' have to be
adjusted accordingly.  Can anyone think of a way to vectorize this
process while adjusting for these changes, or is the loop the best
solution?

Here's a little something to run it on so you can see how it works.

#********************************* Start Example code.
head <- html.file.start("Test File")
text1 <- html.text("I need to write a paragraph so that I can test the 
         html.align function in my package.  I need to change the
function 
         so that it no longer uses the for loop that it now contains.
         <br/> <br/> Now a short sentence.")
text2 <- html.text("A second element")

text <- c(text1,text2)

text <- html.align(text,"center") # options are "left", "right",
                                  # "center", or "justify"

write(head,"test.html")
write(text,"test.html")



Thanks for any help.
Benjamin


P Please consider the environment before printing this e-mail

Cleveland Clinic is ranked one of the top hospitals
in America by U.S. News & World Report (2008).  
Visit us online at http://www.clevelandclinic.org for
a complete listing of our services, staff and
locations.


Confidentiality Note:  This message is intended for use\...{{dropped:13}}



More information about the R-help mailing list