[R] Extending sparklyr

Javier Luraschi javier at rstudio.com
Mon Oct 10 17:29:41 CEST 2016


For versions 1.6.1 and 2.0.0 of Spark, GaussianMixture is under the mllib
namespace, not ml. Try this instead:

envir$model <- "org.apache.spark.mllib.clustering.GaussianMixture"
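
As a rough, untested sketch of how that class name could be used: the object
is created with invoke_new() and configured through its setters with invoke().
The setter names (setK, setMaxIterations, setConvergenceTol) are assumed from
the Spark mllib GaussianMixture API, and the values mirror the parameters in
the original post.

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Report the Spark version behind this connection.
spark_version(sc)

# Instantiate the mllib implementation by its fully qualified class name.
gmm <- invoke_new(sc, "org.apache.spark.mllib.clustering.GaussianMixture")

# Each setter is assumed to return the estimator itself, so the result
# is reassigned. Integer arguments need the L suffix to map to Java ints.
gmm <- invoke(gmm, "setK", 3L)
gmm <- invoke(gmm, "setMaxIterations", 100L)
gmm <- invoke(gmm, "setConvergenceTol", 1e-4)
```

Fitting the model is left out here, since the mllib run() method expects an
RDD of vectors rather than the DataFrame prepared in the original code.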

Best, Javier

On Sun, Oct 9, 2016 at 1:47 PM, Axel Urbiz <axel.urbiz at gmail.com> wrote:

> Hi All,
>
> Just started to experiment with "sparklyr" and already loving it.
>
> I'm trying to build an extension by constructing an R wrapper to Spark's
> Gaussian Mixtures. My attempt is below, and so is the error message. Not
> sure if this is possible to do, and if so, what is wrong with my code.
>
> Any hints would be much appreciated.
>
> Best,
> Axel.
>
> -----
>
> library(sparklyr)
> library(dplyr)
> sc <- spark_connect(master = "local")
>
> x <- copy_to(sc, iris)
> x <- x %>% select(Petal_Width, Petal_Length)
>
> # set params
> k <- 3
> iter.max <- 100
> features <- dplyr::tbl_vars(x)
> compute.cost <- TRUE
> tolerance <- 1e-4
> ml.options <- ml_options()
>
> df <- spark_dataframe(x)
> sc <- spark_connection(df)
> df <- ml_prepare_features(
>   x = df,
>   features = features,
>   envir = environment()
>   # ml.options = ml.options
> )
> envir <- new.env(parent = emptyenv())
> envir$id <- ml.options$id.column
> df <- df %>%
>   sdf_with_unique_id(envir$id) %>%
>   spark_dataframe()
> tdf <- ml_prepare_dataframe(df, features, ml.options = ml.options, envir = envir)
> envir$model <- "org.apache.spark.ml.clustering.GaussianMixture"
> gmm <- invoke_new(sc, envir$model)
> >Error: failed to invoke spark command
> >16/10/09 16:35:35 ERROR <init> on org.apache.spark.ml.clustering.GaussianMixture failed
>
>
