[Rd] JDataFrame API

Simon Urbanek simon.urbanek at r-project.org
Fri Jan 15 16:58:14 CET 2016


Tom,

this may be good for embedding small data sets, but for practical purposes it doesn't seem like the most efficient solution.

Since you didn't provide any code, I built a test case using the built-in Java JSON API to generate a medium-sized dataset (1e6 rows) and read it in, just to get a ballpark (see
https://gist.github.com/s-u/4efb284e3c15c6a2db16):

# generate:
time java -cp .:javax.json-api-1.0.jar:javax.json-1.0.4.jar A > 1e6

real	0m2.764s
user	0m20.356s
sys	0m0.962s

# read:
> system.time(temp <- RJSONIO::fromJSON("1e6"))
   user  system elapsed 
  3.484   0.279   3.834 
> str(temp)
List of 2
 $ V1: num [1:1000000] 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 ...
 $ V2: chr [1:1000000] "X0" "X1" "X2" "X3" ...

For comparison using Java directly (includes both generation and reading into R):

> system.time(temp <- lapply(J("A")$direct(), .jevalArray))
   user  system elapsed 
  0.962   0.186   0.494 
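
To make the "direct" route concrete, here is a minimal sketch of what a column-oriented method of this kind might look like. It is not the code from the gist: the class name DirectColumns and the row-count parameter n are assumptions made for illustration, and the values only mimic the V1/V2 columns shown in the str() output above. The idea is that each column is handed over as one primitive or String array, so no text has to be generated or parsed.

```java
// Hypothetical sketch, not the code from the gist: each column is returned
// as a single array so that it can be converted to an R vector in one step.
final class DirectColumns {

    /** Builds n rows as two columns: V1 = 0, 0.2, 0.4, ... and V2 = "X0", "X1", ... */
    static Object[] direct(int n) {
        double[] v1 = new double[n];
        String[] v2 = new String[n];
        for (int i = 0; i < n; i++) {
            v1[i] = i / 5.0;   // numeric column
            v2[i] = "X" + i;   // string column
        }
        // One element per column; the caller converts each array separately.
        return new Object[] { v1, v2 };
    }
}
```

From R, a method like this would be called in the same way as the J("A")$direct() call above, with .jevalArray converting each column; the class would need to be declared public and be on the rJava class path.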

So the JSON route is roughly 13x slower than using Java directly (about 6.6s elapsed for generating plus reading vs. about 0.5s). Obviously, this will vary by data set type etc., since there is R overhead involved as well: for example, if you have only numeric variables, the JSON route is 30x slower on reading alone [50x total]. String variables slow down both routes equally. Interestingly, the JSON encoding is using all 16 cores, so the 2.7s of real time adds up to over 20s of CPU time; on smaller machines you may see more overhead.

If you need process separation, it may be a different story - in principle it is faster to use more native serialization than JSON since parsing is the slowest part for big datasets.
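
As an illustration of what "more native serialization" could mean for a numeric column, the following is a minimal sketch that writes doubles as raw 8-byte values instead of JSON text. The class and method names (BinaryColumn, encode, decode) are hypothetical, and the fixed big-endian layout is an assumption chosen for the example, not a format used by any of the tools discussed here.

```java
// Hypothetical sketch: a numeric column stored as raw 8-byte doubles,
// so the reader copies bytes instead of parsing decimal text.
final class BinaryColumn {

    /** Encodes a column as consecutive big-endian IEEE 754 doubles. */
    static byte[] encode(double[] column) {
        java.nio.ByteBuffer buf = java.nio.ByteBuffer
                .allocate(column.length * Double.BYTES)
                .order(java.nio.ByteOrder.BIG_ENDIAN);
        for (double x : column) {
            buf.putDouble(x);
        }
        return buf.array();
    }

    /** Decodes bytes produced by encode() back into a double array. */
    static double[] decode(byte[] bytes) {
        java.nio.ByteBuffer buf = java.nio.ByteBuffer
                .wrap(bytes)
                .order(java.nio.ByteOrder.BIG_ENDIAN);
        double[] column = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < column.length; i++) {
            column[i] = buf.getDouble();
        }
        return column;
    }
}
```

If such bytes were written to a file or socket, the R side could read them with readBin(con, what = "double", n = n, endian = "big"), which avoids text parsing entirely.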

Cheers,
Simon


> On Jan 14, 2016, at 4:52 PM, Thomas Fuller <thomas.fuller at coherentlogic.com> wrote:
> 
> Hi Folks,
> 
> If you need to send data from Java to R you may consider using the
> JDataFrame API -- which is used to convert data into JSON which then
> can be converted into a data frame in R.
> 
> Here's the project page:
> 
> https://coherentlogic.com/middleware-development/jdataframe/
> 
> and here's a partial example which demonstrates what the API looks like:
> 
> String result = new JDataFrameBuilder()
>    .addColumn("Code", new Object[] {"WV", "VA"})
>    .addColumn("Description", new Object[] {"West Virginia", "Virginia"})
>    .toJson();
> 
> and in an R script we would need to do this:
> 
> temp <- RJSONIO::fromJSON(json)
> tempDF <- as.data.frame(temp)
> 
> which yields a data frame that looks like this:
> 
>> tempDF
>    Description Code
> 1 West Virginia   WV
> 2      Virginia   VA
> 
> It is my intention to deploy this project to Maven Central this week,
> time permitting.
> 
> Questions and comments are welcomed.
> 
> Tom
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 


