Just a heads-up for others trying to build a decision-tree ml_pipeline with sparklyr:
ml_gradient_boosted_trees() and ml_decision_tree() don't work as pipeline stages; they throw the error:
Error: Unable to retrieve a Spark DataFrame from object of class ml_pipeline ml_estimator ml_pipeline_stage.
In the source code for these two functions, you'll see that they call spark_dataframe(), which tries to access the underlying Spark table. However, an ml_pipeline object actually has the following classes:
"ml_pipeline" "ml_estimator" "ml_pipeline_stage"
It seems that spark_dataframe() expects an object of class:
"tbl_spark" "tbl_sql" "tbl_lazy" "tbl"
The solution I found was to use the lower-level tree functions instead, specifically:
ml_decision_tree_classifier()
ml_gbt_classifier()
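Here is a minimal sketch of that workaround. It assumes a local Spark installation and uses the built-in mtcars data with am (0/1) as the label; the dataset, formula, and table name are illustrative assumptions, not the original setup. ft_r_formula() creates the default "features" and "label" columns that both classifiers read. A binary label is used because Spark's GBT classifier only supports binary classification.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install())
sc <- spark_connect(master = "local")

# Illustrative data: predict transmission type (am, 0/1) from a few columns
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

# Decision tree: the estimator-level function is appended as a pipeline stage
dt_pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(am ~ mpg + wt + hp) %>%
  ml_decision_tree_classifier()

# Gradient-boosted trees: same pattern with ml_gbt_classifier()
gbt_pipeline <- ml_pipeline(sc) %>%
  ft_r_formula(am ~ mpg + wt + hp) %>%
  ml_gbt_classifier()

# Fitting a pipeline returns an ml_pipeline_model
dt_model <- ml_fit(dt_pipeline, mtcars_tbl)
gbt_model <- ml_fit(gbt_pipeline, mtcars_tbl)

# Score the training data and compare labels with predictions
dt_predictions <- ml_transform(dt_model, mtcars_tbl)
print(select(dt_predictions, am, prediction))

spark_disconnect(sc)
```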
Hopefully, this saves you a few hours!