ml_gradient_boosted_trees and ml_decision_tree don't work in an ml_pipeline

subie · October 23, 2019, 7:04pm

Just a heads up for others trying to build a decision tree ml_pipeline with sparklyr:
ml_gradient_boosted_trees and ml_decision_tree don't work as pipeline stages; they throw this error:

Error: Unable to retrieve a Spark DataFrame from object of class ml_pipeline ml_estimator ml_pipeline_stage.

In the code for these two functions you'll see that spark_dataframe is called on the input to get at the underlying Spark table, but an ml_pipeline has the following classes:
"ml_pipeline" "ml_estimator" "ml_pipeline_stage"

It seems that spark_dataframe expects an object with these classes instead:
"tbl_spark" "tbl_sql" "tbl_lazy" "tbl"
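
For anyone who wants to check the class mismatch themselves, here is a minimal sketch. It assumes a local Spark installation reachable with master = "local", and the formula and column names are only illustrative. Whether the call still fails may depend on your sparklyr version, so the error is caught rather than assumed:

```r
library(sparklyr)

# Assumption: a local Spark install is available for this connection.
sc <- spark_connect(master = "local")

# An empty pipeline is an estimator, not a Spark table.
pipeline <- ml_pipeline(sc)
print(class(pipeline))

# Piping the pipeline into the high-level wrapper; capture the error
# instead of stopping so the message can be inspected.
err <- tryCatch(
  {
    ml_decision_tree(pipeline, Species ~ .)
    NULL
  },
  error = function(e) conditionMessage(e)
)
print(err)

spark_disconnect(sc)
```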

That said, the solution I found was to use the tree functions one level lower, which do accept an ml_pipeline. Specifically:
ml_decision_tree_classifier()
ml_gbt_classifier()
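
A rough sketch of what the workaround can look like in a full pipeline. The iris data, the formula, and the local connection are my own assumptions for illustration, not something from a real project. Note that copy_to() replaces the dots in the iris column names with underscores:

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark install is available for this connection.
sc <- spark_connect(master = "local")
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# ft_r_formula() creates the default "features" and "label" columns
# that the lower-level classifier stage reads.
pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(Species ~ Petal_Length + Petal_Width) %>%
  ml_decision_tree_classifier()

# Fit the whole pipeline on a Spark table, then score the same table.
fitted <- ml_fit(pipeline, iris_tbl)
predictions <- ml_transform(fitted, iris_tbl)

# Pull a few rows back into R to look at the predictions.
result <- predictions %>%
  select(Species, prediction) %>%
  head(5) %>%
  collect()
print(result)

spark_disconnect(sc)
```

Swapping ml_decision_tree_classifier() for ml_gbt_classifier() should work the same way, but keep in mind that Spark's gradient boosted tree classifier only handles binary labels, so the three-class iris label would need to be reduced to two classes first.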

Hopefully, this saves you a few hours!
