[Rd] R in the browser ...

Jony Hudson jony.hudson at imperial.ac.uk
Sat May 25 23:58:28 CEST 2013


Hi all,

I hope you'll forgive me - I don't plan to start using this list as my blog - but given the discussion following my last post I thought people on here might be interested to see some progress. This is a minimal build of R, cross-compiled from C/Fortran to JavaScript with emscripten. To be clear, nothing is running server-side: this is all running in the browser's JS engine. The user experience is rather lacking at the minute, much is missing (see below), and there are no compiler optimisations applied (also below), but still, it kind of works. Have a play here:

http://r-in-the-browser.herokuapp.com/

(WARNING: 20MB HTML file (3.5MB gzipped) + 7MB data file). It goes without saying that you'll want to use a modern web browser to look at it! It works in the latest Chrome, Firefox, and Safari (although I can't see the session output until after I q() in my Safari :-(). It also seems to run on my iPad, after a very long start-up wait, but it shares the problem of desktop Safari that you have to work blind. It also runs on my Android phone, although it takes a _very_ long time to start up - something like 15 minutes!
Picture of a session running on an iPad: http://imgur.com/jzYL5wf
Picture of a session running on a Nexus 4 phone: http://imgur.com/bjDA95j

So far, it's only the core of R - the only package it is loading is "base". The other default packages mostly build, I think, but I haven't yet figured out how to patch the dynamic loading to pull the "native" code in at the right time. Also, nothing using LAPACK works, as R is trying to dynamically load that too.

Getting it to build was fiddly, but that probably has a lot to do with my lack of experience building anything unix-y, and this being the first time I've used autotools (I read in the manual that their aim was not to make it user-friendly for maintainers, but rather for users. Well, chapeau to them as they've certainly achieved their aims). In rough outline (and from memory):
- I used f2c to convert the Fortran sources in appl, main and extra/blas to C, and modified the Makefile.in files accordingly - also adding an instruction to link in a version of libf2c that I'd pre-compiled to LLVM bitcode.
- I had to manually hack on the configure.h file to remove references to some functions that don't seem to exist in emscripten: ccosh, cexp etc.
- I also added code to set SSIZE_MAX and R_XLEN_T_MAX to sensible values (thanks Peter).
- I hacked connections.c as there were some duplicate case statements. I think this was because ./configure was getting confused over the size of some basic types. I suspect all of the above hacks could be avoided if I understood autotools better.
- I tweaked configure.h again by hand to disable ARPA_INET. I think there's a problem in emscripten's inet header files, but am not sure yet.
- I had to hack xdr_mem.c to force it to use an appropriate ntohl and htonl. Again, probably an autotools problem that I'm not understanding.
- This was enough to run make and have it build LLVM bitcode for the main source tree (except, bizarrely, mkdtemp.c, which I had to compile by hand). The make errors out before the end, but it gets far enough. I could then link all of the generated bitcode together and convert it to JS. The "virtual filesystem" for the JS code was populated with the contents of /usr/local/lib/R, which I trimmed down a bit to get rid of stuff that wasn't going to work.
- The emscripten libc implementation is incomplete so I had to stub out some functions that are probably quite important - glob() and globfree(). That really needs to be fixed! I also stubbed out __locale_mb_cur_max to return some value or other, as I wasn't sure where it was supposed to come from, and I was getting tired.
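
For illustration, the xdr_mem.c item in the list above can be sketched as follows. This is not the actual patch, which is not shown in this post; it is one endianness-independent way to get correct conversions without relying on what configure detected. The names my_htonl and my_ntohl are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: host order -> network (big-endian) order.
   Builds the big-endian byte sequence explicitly, so the result is
   correct whatever the host's endianness is. */
uint32_t my_htonl(uint32_t host)
{
    unsigned char b[4];
    uint32_t net;

    b[0] = (unsigned char)(host >> 24);
    b[1] = (unsigned char)(host >> 16);
    b[2] = (unsigned char)(host >> 8);
    b[3] = (unsigned char)host;
    memcpy(&net, b, sizeof net);
    return net;
}

/* Hypothetical helper: network (big-endian) order -> host order. */
uint32_t my_ntohl(uint32_t net)
{
    unsigned char b[4];

    memcpy(b, &net, sizeof b);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

Because the helpers never ask which byte order the host uses, a wrong guess by configure cannot affect them.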
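
As a rough sketch of what stubbing out glob() and globfree() might look like (again, not the actual code used for this build): the stubs below always report "no matches". They are named stub_glob and stub_globfree, both hypothetical names, so that the sketch compiles against an ordinary libc; in a build where the library lacks these functions they would carry the real names. The sketch assumes a <glob.h> header is available for glob_t and GLOB_NOMATCH.

```c
#include <glob.h>    /* glob_t, GLOB_NOMATCH, GLOB_APPEND */
#include <stddef.h>  /* NULL */

/* Hypothetical stand-in for glob(): never matches anything.
   The pattern and error callback are ignored. */
int stub_glob(const char *pattern, int flags,
              int (*errfunc)(const char *epath, int eerrno),
              glob_t *pglob)
{
    (void)pattern;
    (void)errfunc;

    /* Leave earlier results alone if the caller asked to append. */
    if (pglob != NULL && !(flags & GLOB_APPEND)) {
        pglob->gl_pathc = 0;
        pglob->gl_pathv = NULL;
    }
    return GLOB_NOMATCH;
}

/* Hypothetical stand-in for globfree(): the stub above never
   allocates, so there is nothing to release. */
void stub_globfree(glob_t *pglob)
{
    (void)pglob;
}
```

Any R code that relies on wildcard expansion will simply see no matches with stubs like these, which is why a real implementation is still needed.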

At the moment all compiler optimisations are turned off, which makes a big difference to code size and performance. The problem appears to be that some of the R code uses unsafe function pointer casting, which causes trouble with the way that emscripten optimises code (I think it's that it depends on each function having a well-defined type that the JS interpreter can be sure of). I haven't looked into where these casts are, and how difficult it would be to make them safe, but hopefully they're not too pervasive.
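
To make the kind of cast in question concrete, here is a generic C sketch. It is not taken from the R sources, and all names in it are hypothetical. Storing a function pointer under a generic type is fine in C as long as it is cast back to its exact original type before the call; calling it through a different signature is undefined behaviour, and is the sort of thing an optimiser that relies on function types could trip over.

```c
/* Exact type of the functions we actually want to call. */
typedef double (*unary_fn)(double);

/* Generic "some function" pointer type used only for storage. */
typedef void (*generic_fn)(void);

double twice(double x)
{
    return 2.0 * x;
}

/* Unsafe pattern, shown only as a comment because it is undefined
   behaviour in C:

       generic_fn g = (generic_fn)twice;
       int r = ((int (*)(int))g)(3);    wrong signature at the call
*/

/* Safe pattern: convert back to the original type, then call. */
double call_unary(generic_fn f, double x)
{
    unary_fn typed = (unary_fn)f;
    return typed(x);
}
```

A call such as call_unary((generic_fn)twice, 3.0) goes through the function's real signature, so the call site and the definition agree on the type.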

Anyway, it's just a start, but I'm pretty pleased with it :-)


Jony

--
Centre for Cold Matter, The Blackett Laboratory,
Imperial College London, London SW7 2BW
T: +44 (0)207 5947741
http://www.imperial.ac.uk/people/jony.hudson
http://www.imperial.ac.uk/ccm/research/edm
http://www.monkeycruncher.org
http://j-star.org/
--


