[R] Perl vs. R

Wed Jun 12 16:32:35 CEST 2002

Don,
Thanks. I'm sure I am not the only one who is learning a lot from this 
benchmark problem. It's very simple but is an abstract model for virtually 
all apps: open a file, do something to it, write out a new version of it, etc.
John Day

At 07:14 AM 6/12/02 -0700, you wrote:
>I believe the following 5 lines do the job (if it looks like 6 lines, the 
>email software made the for() loop line into two lines).
>
>inp <- scan('perlcomp.dat',what=list(a='',b=''),sep='\t')
>foo <- split(inp$a,inp$b)
>sink('pcmp.out')
>for (i in seq(foo)) 
>cat(names(foo)[i],'\t',paste(sort(foo[[i]]),collapse='\t'),'\n',sep='')
>sink()
>
>I tried
>   lapply(split(inp$a,inp$b),function(x) cat(names(x),sort(x),'\n'))
>instead of the for() loop, but the names(x) part doesn't pick up the 
>values of B as needed. But it doesn't matter, because the for() loop is 
>just as fast.
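
The lapply() attempt fails because lapply() hands each list element to the function without its key, so names(x) inside the function is NULL for an unnamed character vector. A minimal sketch of one way around this, passing the keys and the values side by side with mapply(). The sample data and the names `a`, `b` and `lines` are made up for illustration and do not come from the benchmark file:

```r
# Hypothetical sample standing in for the two tab-separated columns of perlcomp.dat
a <- c("pear", "apple", "fig", "kiwi")
b <- c("k2", "k1", "k2", "k1")

foo <- split(a, b)   # list keyed by the unique values of b, keys in sorted order

# Inside lapply() each element arrives without its key, so names(x) is NULL
lapply(foo, function(x) names(x))

# Pass keys and values together instead: one output line per key
lines <- mapply(function(key, vals) paste(c(key, sort(vals)), collapse = "\t"),
                names(foo), foo, USE.NAMES = FALSE)
cat(lines, sep = "\n")
```

Whether this is any faster than the for() loop on the real file is not something this sketch shows; it only addresses the missing keys.
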
>
>Without the overhead of starting up R,
>
>>  system.time(source('pcomp.r'))
>Read 5642 records
>[1] 1.51 0.00 1.72 0.00 0.00
>
>On 466 MHz G4 Macintosh
>
>>  version
>          _
>platform powerpc-apple-darwin5.5
>arch     powerpc
>os       darwin5.5
>system   powerpc, darwin5.5
>status
>major    1
>minor    5.0
>year     2002
>month    04
>day      29
>language R
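
For readers unfamiliar with the timing output above: the five numbers returned by system.time() are, in order, user CPU time, system CPU time, elapsed wall-clock time, and the user and system CPU times of any child processes, all in seconds. A small sketch with a made-up workload (the expression being timed is a placeholder, not the benchmark script):

```r
# Time an arbitrary placeholder expression
tm <- system.time(for (i in 1:1e5) sqrt(i))

# In current versions of R the five entries carry names
names(tm)
tm[["elapsed"]]   # wall-clock seconds
```
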
>
>
>-Don
>
>At 7:23 AM -0400 6/12/02, John Day wrote:
>>Prof. Bates,
>>
>>Thanks for the pointers. I ran your two-liner (the args to write.table() 
>>needed to be swapped) and noted the runtime to be about 0.9 secs in CMD 
>>BATCH mode, several times slower than the Perl. You were right.
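
For reference, a sketch of the two-liner with the write.table() arguments in the order the function expects (data first, then file). The file names are temporary placeholders created only so the example runs. Spelling out `col.names` in full, and the output options `sep`, `quote`, `row.names` and `col.names = FALSE`, are assumptions about the desired output format rather than part of the original suggestion:

```r
# Placeholder input: a small tab-separated file with columns a and b
infile  <- tempfile()
outfile <- tempfile()
writeLines(c("pear\tk2", "apple\tk1", "fig\tk2"), infile)

df <- read.table(infile, header = FALSE, sep = "\t",
                 col.names = c("a", "b"), stringsAsFactors = FALSE)

# Data frame first, file second; rows ordered by key b, columns as b then a
write.table(df[order(df$b), c("b", "a")], outfile,
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

out <- readLines(outfile)
out
```
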
>>
>>Actually, the code is not correct. The  specification required the 
>>benchmark code to collect the fields in A and use the 1301 unique codes 
>>in B as a key to retrieve the A's appended and sorted in a list. That 
>>might require an explicit loop, which will slow it down even more.
>>
>>But even then, for research and learning purposes, I think I could live 
>>with this sluggish performance most of the time, just to avoid having to 
>>interface with Perl. It's very convenient to do everything in R. Maybe 
>>occasionally use Perl where performance demands it etc.
>>
>>I have the new John Fox book on order. But will try to find a copy of 
>>Venables-Ripley too. I don't have S-Plus, I thought the Fox book might be 
>>better for R-only users.
>>
>>I also want to study Pinheiro-Bates, but must wait until I have grasped 
>>the basics.
>>
>>Thanks,
>>John Day
>>At 11:14 AM 6/11/02 -0500, you wrote:
>>>John Day <jday at csihq.com> writes:
>>>
>>>>  I am being told that R can process text files and strings as well as
>>>>  Perl (and is certainly more elegant).
>>>
>>>"as well as" is in the eye of the beholder.  Perl is very highly tuned
>>>to manipulating text files.  One story of how the name perl came about
>>>is as an acronym for "Practical Extraction and Report Language".
>>>
>>>R is an environment for statistical computing and graphics.  Although
>>>there are pattern matching and text substitution functions in R, it is
>>>not well suited to writing "one-off" text transformation programs.
>>>You will find that starting R probably takes longer than the execution
>>>of the perl program.
>>>
>>>Rather than trying to take a simple benchmark and see how R performs
>>>on it, it would be better to learn about the language and see if it
>>>fulfills a real need for you.  I would suggest starting with Venables
>>>and Ripley's "Modern Applied Statistics with S-PLUS (3rd ed)" or the
>>>eagerly-awaited fourth edition of that book slated for publication
>>>this summer.
>>>
>>>Having said all this, I believe your perl program can be coded in R as
>>>something like
>>>
>>>   df <- read.table('infile', header = FALSE, sep = '\t', col = c('a', 'b'))
>>>   write.table('outfile', df[order(df$b), c('b', 'a')])
>>>
>>>although I think it would be better for you to describe what the task
>>>is rather than providing perl code to accomplish the task.  I long ago
>>>gave up reading other people's perl code and trying to make sense of
>>>it.  (In the Python community there is a saying that "Hell is reading
>>>other people's Perl code".)
>>>
>>>>  Being an R neophyte I need a little boost to get started. I have a
>>>>  little benchmark program in Perl that reads a delimited file, creates
>>>>  an inverted table and spits the file out again in key sorted order.
>>>>
>>>>
>>>>  It's just a few lines of Perl (see below). Can someone write the
>>>>  equivalent in R? The benchmark and associated files are available
>>>>  from: http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/
>>>>
>>>>
>>>>  You'll note on this page that Perl runs the benchmark in 3.5
>>>>  secs. That was in 1997. My 5.6.1 version of Perl runs it in 0.18 secs
>>>>  now, on my 600 MHz Linux platform. Wondering how fast R will be in
>>>>  comparison.
>>>>
>>>>
>>>>  Thanks,
>>>>  John Day
>>>>
>>>>  FYI, here's the Perl source:
>>>>
>>>>  #!/local/bin/perl
>>>>  # invert benchmark in Perl
>>>>  # see <url:http://www.lib.uchicago.edu/keith/crisis/benchmarks/invert/>
>>>>  # Keith Waclena <k-waclena at uchicago.edu>
>>>>
>>>>  while (<STDIN>) {
>>>>       chop;
>>>>       ($a, $b) = split(/\t/);
>>>>       $B{$b} .= "\t$a";       # gotta lose leading tab later...
>>>>  }
>>>>
>>>>  foreach $b (sort keys %B) {
>>>>       # lose the leading tab with substr...
>>>>       print "$b\t" . join("\t", sort(split(/\t/, substr($B{$b}, 1)))) . "\n";
>>>>  }
>>>>
>>>>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>>>>  r-help mailing list -- Read 
>>>> http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>>>>  Send "info", "help", or "[un]subscribe"
>>>>  (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
>>>>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>>
>
>
>--
>--------------------------------------
>Don MacQueen
>Environmental Protection Department
>Lawrence Livermore National Laboratory
>Livermore, CA, USA
>--------------------------------------
