[R] rZeppelin: Easy Spark for R Data Scientists

Amos B. Elberg amos.elberg at gmail.com
Mon Jan 4 20:26:21 CET 2016


rZeppelin is an R Interpreter for the Apache (Incubating) Zeppelin project.  

The intention of rZeppelin is to let ordinary R users who are not programmers integrate the power of Spark, and the wide range of ML packages available for Python and Scala, into their day-to-day toolbox: without having to learn a new language, without any learning curve beyond a review of the SparkR API, and without the budget or administrative overhead of setting up a Spark or Hadoop infrastructure.

Zeppelin is a notebook (like IPython) built on top of Spark.  It provides interactive data visualization and other features, along with interpreters for a wide variety of “big data” stores.

rZeppelin makes it possible to combine R, Scala, and Python code seamlessly in a single data/ML pipeline, from a single, familiar interface.  (And without breaking lazy evaluation!)

This means that you can use Spark’s package base of fast, cluster-optimized implementations of popular ML algorithms, as well as Python packages, as an extension of your existing work with R.

For example, imagine loading text data in R, running LDA on the text using the distributed implementation of LDA in Spark’s MLlib, tagging the text using advanced Python NLP packages such as gensim, and then visualizing and further processing the results in R, all from the same interface and in the same session.

rZeppelin lets you do this because the R interpreter and Zeppelin’s Scala and Python interpreters all share the same Spark backend.

Outside of Spark itself, most common data types can be moved among R, Scala, and Python through the “ZeppelinContext,” a shared environment.
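
To make the idea of a shared environment concrete, here is a toy sketch in plain Python.  It is not the real ZeppelinContext: the SharedContext class, the put/get method names, and the two "paragraph" functions are all hypothetical stand-ins, used only to illustrate how one interpreter might publish a value under a name and another might read it back.

```python
# Toy model of a shared environment between interpreters.
# ASSUMPTION: SharedContext, put() and get() are illustrative names only;
# this is not the actual ZeppelinContext implementation or API.

class SharedContext:
    """A name -> value store that every interpreter can see."""

    def __init__(self):
        self._store = {}

    def put(self, name, value):
        # Publish a value under a name so other interpreters can fetch it.
        self._store[name] = value

    def get(self, name):
        # Retrieve a value previously published by any interpreter.
        return self._store[name]


def r_like_paragraph(ctx):
    # Stand-in for an R paragraph that loads some text data.
    ctx.put("docs", ["spark makes clusters easy", "r is for statistics"])


def python_paragraph(ctx):
    # Stand-in for a Python paragraph that processes the same data.
    docs = ctx.get("docs")
    ctx.put("token_counts", [len(d.split()) for d in docs])


z = SharedContext()
r_like_paragraph(z)      # "R" publishes the documents
python_paragraph(z)      # "Python" reads them and publishes counts
token_counts = z.get("token_counts")
```

In this sketch only plain values such as lists of strings cross the boundary, which matches the "most common data types" wording above; Spark data itself would go through the shared Spark backend rather than through such a store.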

rZeppelin is integrated with Zeppelin’s interactive visualization features.  It also uses knitr for compatibility with most R data visualization and interactive visualization packages, such as ggplot2 and rCharts.  

rZeppelin is available here:  https://github.com/elbamos/Zeppelin-With-R  