[Rd] How to efficiently share data (a dataframe) between R and Java

Ing. Jaroslav Kuchař jaroslav.kuchar at fit.cvut.cz
Thu Dec 17 07:53:56 CET 2015


Thank you for the example. Based on my recent experiments, this solution
is the most efficient.

Cheers,
Jaroslav

On 2015-12-15 23:15, Simon Urbanek wrote:
> You can pass the entire df. For example:
> 
>> data(iris)
>> iris$sp = as.character(iris$Species)
>> o=.jarray(lapply(iris, .jarray))
>> .jcall("C",,"df",o)
> df, 6 variables
> [0]: double[150]
> [1]: double[150]
> [2]: double[150]
> [3]: double[150]
> [4]: int[150]
> [5]: String[150]
> 
> 
> Java code:
> 
> public class C {
>     static void df(Object[] df) {
>         int n = df.length;
>         System.out.println("df, " + n + " variables");
>         for (int i = 0; i < n; i++) {
>             if (df[i] instanceof double[]) {
>                 double[] d = (double[]) df[i];
>                 System.out.println("[" + i + "]: double[" + d.length + "]");
>             } else if (df[i] instanceof int[]) {
>                 int[] d = (int[]) df[i];
>                 System.out.println("[" + i + "]: int[" + d.length + "]");
>             } else if (df[i] instanceof String[]) {
>                 String[] s = (String[]) df[i];
>                 System.out.println("[" + i + "]: String[" + s.length + "]");
>             } else {
>                 System.out.println("[" + i + "]: some other type...");
>             }
>         }
>     }
> }
> 
> Normally, you wouldn't pass the entire df but would instead have methods
> for the types you care about, such as the modeling function. That's a
> more Java-like approach, but either is valid and there is no difference
> in efficiency.
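
A minimal sketch of that per-type alternative, for illustration only: instead of receiving the whole data frame as `Object[]`, the Java side exposes a method whose parameters are exactly the column types it needs. The class `TypedColumns`, the method `meanByGroup`, and the sample values are hypothetical; they do not come from this thread or from rCBA.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class; not part of rJava or rCBA.
class TypedColumns {

    // Takes the two columns it needs as typed arrays (one numeric column,
    // one label column) and returns the mean of the numeric column per label.
    public static Map<String, Double> meanByGroup(double[] values, String[] groups) {
        if (values.length != groups.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        // Per group: element 0 holds the running sum, element 1 the row count.
        Map<String, double[]> acc = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            double[] a = acc.computeIfAbsent(groups[i], k -> new double[2]);
            a[0] += values[i];
            a[1] += 1;
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 4.0};   // made-up numeric column
        String[] groups = {"a", "a", "b"};   // made-up label column
        System.out.println(meanByGroup(values, groups));
    }
}
```

From R, such a method would presumably be reached through `.jcall` with a suitable return signature, passing one column per argument rather than a `.jarray` of columns.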
> 
> Cheers,
> Simon
> 
> 
> 
>> On Dec 15, 2015, at 12:50 PM, Ing. Jaroslav Kuchař <jaroslav.kuchar at fit.cvut.cz> wrote:
>>
>> Dear all,
>>
>> thank you for your hints. I would prefer not to use Rserve, which Dirk
>> mentioned.
>>
>> @Simon
>> I have full control over the Java implementation - I can adapt the code
>> that I use for the communication R <-> Java.
>>
>>> You can natively access structures on each side. The fastest way is to
>>> use the R representation (column-oriented) in Java - that is much faster
>>> than any kind of serialization or anything you mention above since you
>>> pass the variables as a whole.
>>
>> Could you please point me to more examples or documentation that
>> could help me?
>> The main goal is to copy a full dataframe from R to Java.
>>
>> Best regards,
>> Jaroslav
>>
>> On 2015-12-07 03:19, Simon Urbanek wrote:
>>> On Dec 6, 2015, at 12:36 PM, Ing. Jaroslav Kuchař
>>> <jaroslav.kuchar at fit.cvut.cz> wrote:
>>>
>>>> Dear all,
>>>>
>>>> in our ongoing project we use Java implementations of several
>>>> algorithms. We also provide a “wrapper” implemented as an R package
>>>> using rJava (https://github.com/jaroslav-kuchar/rCBA). Based on our
>>>> recent experiments, a significant portion of the time is spent copying
>>>> a dataframe from R to Java. The Java implementation needs access to the
>>>> source dataframe.
>>>>
>>>> I have tested several approaches: calling a Java method row by row;
>>>> serializing the whole dataframe to a temp file and parsing it in Java;
>>>> and row-binding into a single vector and calling a single Java method.
>>>> Each approach has its limitations, e.g. time-consuming row-by-row
>>>> copying, serialization and parsing overhead, or the memory limits of a
>>>> single vector.
>>>>
>>>> Is there an efficient way to copy a dataframe from R to Java, and
>>>> another to copy one from Java to R?
>>>>
>>>> Thanks for any help you can provide...
>>>>
>>>
>>> You can natively access structures on each side. The fastest way is to
>>> use the R representation (column-oriented) in Java - that is much faster
>>> than any kind of serialization or anything you mention above since you
>>> pass the variables as a whole.
>>>
>>> Typically, the bottleneck is Java applications that may require very
>>> inefficient data structures. If you have control over the algorithms,
>>> you can simply use proper data structures and avoid that problem. If
>>> you don't have control, you'll have to add Java code that converts the
>>> data frame pushed to the Java side into whatever structure the Java
>>> code needs. The main point here is that you do NOT want to do any
>>> conversion on the R side.
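
A rough sketch of what such Java-side conversion could look like, assuming the data frame arrives as an `Object[]` of column arrays (`double[]`, `int[]`, `String[]`) and the existing algorithm wants one record per row. The class `ColumnsToRows` and the method `toRows` are hypothetical names chosen for illustration, and the row layout (`Object[]` per row) is an assumption about what the consuming code needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of rJava or rCBA.
class ColumnsToRows {

    // Returns the length of a supported column array.
    private static int length(Object col) {
        if (col instanceof double[]) return ((double[]) col).length;
        if (col instanceof int[]) return ((int[]) col).length;
        if (col instanceof String[]) return ((String[]) col).length;
        throw new IllegalArgumentException("unsupported column type: " + col);
    }

    // Converts column-oriented data into one Object[] per row.
    // Primitive values are boxed (double -> Double, int -> Integer).
    public static List<Object[]> toRows(Object[] columns) {
        int nCols = columns.length;
        int nRows = nCols == 0 ? 0 : length(columns[0]);
        List<Object[]> rows = new ArrayList<>(nRows);
        for (int r = 0; r < nRows; r++) {
            rows.add(new Object[nCols]);
        }
        for (int c = 0; c < nCols; c++) {
            Object col = columns[c];
            if (length(col) != nRows) {
                throw new IllegalArgumentException("column " + c + " has a different length");
            }
            // Fill column by column so each typed array is cast only once.
            if (col instanceof double[]) {
                double[] d = (double[]) col;
                for (int r = 0; r < nRows; r++) rows.get(r)[c] = d[r];
            } else if (col instanceof int[]) {
                int[] d = (int[]) col;
                for (int r = 0; r < nRows; r++) rows.get(r)[c] = d[r];
            } else {
                String[] s = (String[]) col;
                for (int r = 0; r < nRows; r++) rows.get(r)[c] = s[r];
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up two-row data frame with three columns.
        Object[] df = {
            new double[]{5.1, 4.9},
            new int[]{1, 2},
            new String[]{"setosa", "virginica"}
        };
        System.out.println(toRows(df).size() + " rows");
    }
}
```

The transfer from R stays column-oriented, and the reshaping cost is paid once on the Java side, which is the point made above.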
>>>
>>> Cheers,
>>> Simon
>>


