Sparklyr | Multiclass Logistic Regression issue

I was trying to fit a multiclass logistic regression model in sparklyr on the "mtcars" dataset. The target variable we were trying to predict is 'gear', which has 3 classes. The method ml_logistic_regression threw the following error:

Error: java.lang.IllegalArgumentException: invalid method areaUnderROC for object 281

Below is the code:

Unloading Libraries

unloadNamespace("RevoScaleR")
unloadNamespace("sparklyr")
unloadNamespace("CompatibilityAPI")
unloadNamespace("dplyr")
unloadNamespace("httr")
unloadNamespace("shiny")
unloadNamespace("promises")
unloadNamespace("httpuv")
unloadNamespace("R6")
unloadNamespace("h2o")
unloadNamespace("jsonlite")
unloadNamespace("DBI")

library(sparklyr)
library(dplyr)
library(DBI)

Defining the configuration for the Spark context

config <- spark_config()
config$spark.driver.cores <- 4
config$spark.executor.cores <- 4
config$spark.executor.memory <- "20G"
config$spark.yarn.queue <- "root.default"
config$sparklyr.gateway.port <- 1800
spark_home <- "/opt/cloudera/"
spark_version <- "2.1.0"

Setting up Java Home

Sys.setenv(JAVA_HOME='/usr/java/jdk1.8.0_171')

Setting up Spark context

sc <- spark_connect(master="yarn-client", version=spark_version, config=config, spark_home=spark_home)

Converting the mtcars dataset to a Spark DataFrame

mtcars_tbl <- sdf_copy_to(sc, mtcars, name = "mtcars_tbl", overwrite = TRUE)

Fitting a binary logistic regression ('am' is a binary variable)

lr_model <- mtcars_tbl %>% ml_logistic_regression(am ~ gear + carb)

Result: the code executes successfully.

Fitting a multiclass logistic regression ('gear' has 3 unique values)

lr_model <- mtcars_tbl %>% ml_logistic_regression(gear ~ am + carb, family='multinomial')

Result: throws the areaUnderROC error shown above.

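An ROC curve (and therefore the area under it, areaUnderROC) only applies to a two-class model, but your model has a multi-class outcome. I haven't used sparklyr for multinomial regression, but the error message suggests that sparklyr is trying to compute areaUnderROC for the fitted multinomial model, which would explain the failure.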

There are methods for using ROC curves with multi-class models (e.g., several one-versus-all ROC curves, one per class; see here for an example), but I'm not sure whether sparklyr has that option. There are also some R packages that can calculate and plot ROC surfaces for three-way classification.
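To illustrate the one-versus-all idea, here is a minimal base R sketch (not a sparklyr feature). It assumes you already have the predicted class probabilities as an ordinary R matrix, one column per class with the class labels as column names, plus a vector of true labels; the function names binary_auc and one_vs_rest_auc and the toy probabilities are hypothetical, made up for illustration. Each class in turn is treated as "positive" and all other classes as "negative", and the AUC is computed with the Mann-Whitney rank formula.

```r
# Hypothetical helper: binary AUC via the Mann-Whitney rank statistic.
# Tied scores receive average ranks, so ties count as half a "win".
binary_auc <- function(is_positive, score) {
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  if (n_pos == 0 || n_neg == 0) return(NA_real_)  # AUC undefined
  r <- rank(score)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: one AUC per class, class k vs. all other classes.
# Assumes colnames(probs) are the class labels.
one_vs_rest_auc <- function(labels, probs) {
  classes <- colnames(probs)
  vapply(classes, function(k) {
    binary_auc(as.character(labels) == k, probs[, k])
  }, numeric(1))
}

# Toy example with made-up probabilities for the three 'gear' classes
labels <- c("3", "3", "4", "4", "5", "5")
probs <- rbind(
  c(0.8, 0.1, 0.1),
  c(0.6, 0.3, 0.1),
  c(0.2, 0.7, 0.1),
  c(0.3, 0.4, 0.3),
  c(0.1, 0.2, 0.7),
  c(0.1, 0.5, 0.4)
)
colnames(probs) <- c("3", "4", "5")

aucs <- one_vs_rest_auc(labels, probs)
print(aucs)        # one AUC per class: 1, 0.875, 1
print(mean(aucs))  # unweighted (macro) average of the per-class AUCs
```

Averaging the per-class AUCs is only one possible summary; a frequency-weighted average is another common choice when the classes are imbalanced.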
