[R] R Production Performance

Joe Conway mail at joeconway.com
Wed Sep 24 06:47:42 CEST 2003


Paul Meagher wrote:
> Below is the test I ran a while back on invoking R as a system call. It
> might be faster if you had a C extension to R, but before I went that route
> I would want to know 1) roughly how fast Python and Perl are at returning
> results through their C bindings, embedded interpreters, or DCOM interfaces,
> 2) whether R can be run as a daemon process so you don't incur startup
> costs, and 3) whether R can act as a math server, in the sense that it will
> fork children or threads as multiple users establish sessions with it. I
> agree it would be nice to have a better interface to R than a system call.
> 

I'm doing something similar with Postgres, R, and PHP, using PL/R (an R
procedural language handler for Postgres that I wrote). In Postgres 7.4
(currently at beta3), or with a back-patched copy of 7.3, you can preload
the R interpreter when the Postgres postmaster first starts. R is then
essentially running as part of the Postgres daemon: whenever a connection
is made to the database, the forked backend process already has an
initialized copy of R inside it. The startup savings I see are similar to
yours (2.2 seconds versus under 0.01 seconds for the first call):

------------------------------------------------------------------
Function -- intentionally very simple:
--------------------------------------
create or replace function echo(text) returns text as 'print(arg1)' 
language 'plr';

Without preloading (first function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
  Total runtime: 2195.35 msec

Without preloading (second function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
  Total runtime: 0.55 msec

With preloading (first function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
  Total runtime: 9.74 msec

With preloading (second function call):
-----------------------------------------
regression=# explain analyze select echo('hello');
  Total runtime: 0.59 msec
------------------------------------------------------------------
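To make the preloading idea concrete outside Postgres, here is a minimal Python sketch of the same preload-then-fork pattern. It is not PL/R or Postgres code: `expensive_init()` is a hypothetical stand-in for starting R (the 0.2 second cost is made up), and `serve()` forks one child per request, the way the postmaster forks one backend per connection. Without a preloaded state, each child pays the startup cost itself; with one, the children inherit the already-initialized state through `fork()`. It assumes a POSIX system, since `os.fork()` is not available on Windows.

```python
import os
import time

INIT_SECONDS = 0.2  # hypothetical startup cost of the embedded interpreter


def expensive_init():
    """Hypothetical stand-in for starting an embedded interpreter such as R."""
    time.sleep(INIT_SECONDS)
    return {"ready": True}


def handle_request(state, payload):
    """Per-connection work; requires an initialized interpreter state."""
    if not state["ready"]:
        raise RuntimeError("interpreter not initialized")
    return payload.upper()


def serve(payloads, preloaded_state=None):
    """Fork one child per payload, like a postmaster forking one backend per
    connection, and return a list of (result, child_elapsed_ms) tuples.

    With preloaded_state, each child inherits it through fork() and skips
    initialization; without it, each child pays the startup cost itself.
    """
    results = []
    for payload in payloads:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child process
            try:
                os.close(read_fd)
                start = time.perf_counter()
                if preloaded_state is None:
                    state = expensive_init()  # cold start in every child
                else:
                    state = preloaded_state  # inherited from the parent
                result = handle_request(state, payload)
                elapsed_ms = (time.perf_counter() - start) * 1000
                os.write(write_fd, f"{result} {elapsed_ms:.3f}".encode())
            finally:
                os._exit(0)  # never fall back into the parent's loop
        os.close(write_fd)  # the parent only reads
        with os.fdopen(read_fd) as reader:
            result, elapsed = reader.read().rsplit(" ", 1)
        os.waitpid(pid, 0)
        results.append((result, float(elapsed)))
    return results


if __name__ == "__main__":
    print("no preload:  ", serve(["hello", "world"]))
    state = expensive_init()  # preload once, in the parent, before forking
    print("with preload:", serve(["hello", "world"], preloaded_state=state))
```

Run as a script, the first line should show per-child times of at least 200 ms and the second line times near zero. That mirrors the shape of the 2195 ms versus 9.74 ms first-call numbers, though the absolute values in the sketch mean nothing.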


In both cases the second (and subsequent) function calls are even faster 
because the PL/R function itself has been precompiled and cached.

I call the PL/R function from PHP to read my data directly from the 
database, process it, and generate whatever charts I need. Here's a very 
simple example:


The PL/R function:
------------------------------------------------------------------
create type histtup as
(
   break float8,
   count int
);

create or replace function hist(text, text)
returns setof histtup as '
  # arg1: value of ia_id to chart
  # arg2: path of the JPEG to write, or NULL to skip plotting
  sql <- paste("select id_val from sample_numeric_data ",
               "where ia_id=''", arg1, "''", sep="")
  rs <- pg.spi.exec(sql)

  if (!is.na(arg2)) {
    # jpeg() renders through X11 in this setup, so open display :5 first
    x11(display=":5")
    jpeg(file=arg2, width = 480, height = 480,
         pointsize = 12, quality = 75)
    par(ask = FALSE, bg = "#F8F8F8")
    sql <- paste("select ia_attname as val from atts ",
                 "where ia_id=''", arg1, "''", sep="")
    attname <- pg.spi.exec(sql)
    h <- hist(rs[,1], col = "blue",
              main = paste("Histogram of", attname$val),
              xlab = attname$val)
    dev.off()
    # make the image world readable and writable (mode 666)
    system(paste("chmod 666 ", arg2, sep=""),
           intern = FALSE, ignore.stderr = TRUE)
  } else {
    h <- hist(rs[,1], plot = FALSE)
  }

  # hist() returns one more break than counts: drop the last break so
  # each row pairs a bin lower edge with its count
  result <- data.frame(breaks = h$breaks[-length(h$breaks)],
                       count = h$counts)

  return(result)
' language 'plr';
------------------------------------------------------------------
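The rows the function returns pair each bin's lower break with that bin's count. Because `hist()` always produces one more break than counts, the last break is not returned. A small Python sketch of that pairing follows. It is not R's algorithm: R also chooses the breaks (Sturges' rule by default), whereas here the caller supplies `breaks`, and `histogram_rows()` is a hypothetical name. Bins are right-closed, `(b[i], b[i+1]]`, with the lowest edge included, matching R's defaults `right = TRUE` and `include.lowest = TRUE`.

```python
from bisect import bisect_left


def histogram_rows(values, breaks):
    """Return (lower_break, count) rows, one per bin.

    Bin i covers (breaks[i], breaks[i+1]]; the first bin also includes
    breaks[0]. `breaks` must be sorted ascending and span every value.
    """
    counts = [0] * (len(breaks) - 1)
    for v in values:
        if v < breaks[0] or v > breaks[-1]:
            raise ValueError(f"value {v} lies outside the breaks")
        # bisect_left finds the first break >= v, so v lands in the bin
        # whose upper edge is that break
        i = bisect_left(breaks, v) - 1
        counts[max(i, 0)] += 1  # v == breaks[0] goes to the first bin
    # len(breaks) - 1 counts: the last break never starts a bin
    return list(zip(breaks[:-1], counts))
```

With breaks `[0, 1, 2, 3]`, the values `[0, 0.5, 1, 1.5, 2, 3]` give the rows `(0, 3), (1, 2), (2, 1)`, the same shape as the `histtup` rows above.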

The PHP page:
------------------------------------------------------------------
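<HTML><BODY>
<?PHP
// Self-posting form; escape PHP_SELF before echoing it into the page
echo "
<FORM ACTION='" . htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES) . "' METHOD='post' NAME='proto_form'>
<TABLE WIDTH='482' CELLSPACING='0' CELLPADDING='1' BORDER='0'>
   <TR>
     <TD>Data</TD>
     <TD><INPUT TYPE='text' NAME='userdata' value='' size='80'></TD>
   </TR>
   <TR>
     <TD colspan='2'>
       <INPUT TYPE='submit' NAME='submit' value='Submit'>
     </TD>
   </TR>
</TABLE>
</FORM>
";

if (isset($_POST['submit']) && $_POST['submit'] == "Submit")
{
   $tmpfilename = 'charts/hist1.jpg';
   $conn = pg_connect("dbname=oscon user=postgres");
   // Pass the user's input as a query parameter rather than
   // concatenating it into the SQL string
   $rs = pg_query_params($conn, 'select * from hist($1, $2)',
                         array($_POST['userdata'], '/tmp/' . $tmpfilename));
   if ($rs === false)
      echo "Query failed: " . htmlspecialchars(pg_last_error($conn));
   else
      echo "<img src='$tmpfilename' border=0>";
}
?>
</BODY></HTML>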
------------------------------------------------------------------


Hopefully this gives you some ideas about what is possible. If you're 
interested in PL/R, you can grab a copy (along with a patched 7.3.4 
source RPM for Postgres) here: http://www.joeconway.com/

HTH,

Joe



