[R] Extending sparklyr

Axel Urbiz axel.urbiz at gmail.com
Sun Oct 9 22:47:12 CEST 2016


Hi All,

I just started experimenting with "sparklyr" and I'm already loving it.

I'm trying to build an extension by writing an R wrapper around Spark's
Gaussian mixture model (org.apache.spark.ml.clustering.GaussianMixture).
My attempt and the resulting error message are below. I'm not sure whether
this is possible, and if it is, what is wrong with my code.

Any hints would be much appreciated.

Best,
Axel.

-----

library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")

# copy iris into Spark and keep only the two petal columns
x <- copy_to(sc, iris)
x <- x %>% select(Petal_Width, Petal_Length)

# set params
k <- 3
iter.max <- 100
features <- dplyr::tbl_vars(x)
compute.cost <- TRUE
tolerance <- 1e-4
ml.options <- ml_options()

# get a reference to the underlying Spark DataFrame and its connection
df <- spark_dataframe(x)
sc <- spark_connection(df)
df <- ml_prepare_features(
  x = df,
  features = features,
  envir = environment()
  # ml.options = ml.options
)
envir <- new.env(parent = emptyenv())
envir$id <- ml.options$id.column
# add a unique row id column, then take the Spark DataFrame again
df <- df %>%
  sdf_with_unique_id(envir$id) %>%
  spark_dataframe()
tdf <- ml_prepare_dataframe(df, features, ml.options = ml.options, envir = envir)
envir$model <- "org.apache.spark.ml.clustering.GaussianMixture"
# instantiating the Scala estimator by class name fails:
gmm <- invoke_new(sc, envir$model)
>Error: failed to invoke spark command
>16/10/09 16:35:35 ERROR <init> on org.apache.spark.ml.clustering.GaussianMixture failed
