Hi,
Rereading my question, I'm sorry: it wasn't formulated properly.
What I want to do is an inner join with the result of a previously computed pipeline, to avoid computing it twice.
For example, here is my pipeline (the real one is a bit more complicated):
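library(sparklyr)
library(dplyr)

# data_1 and data_2 are Spark tables; `bob` and `whatever` are columns of data_1
dplyr_pipeline <- data_1 %>%
  inner_join(
    data_1 %>% group_by(bob) %>% summarise(mean_whatever = mean(whatever)),
    by = "bob"
  ) %>%
  inner_join(data_2) # new data set

ml_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(dplyr_pipeline) %>%
  ml_random_forest_regressor()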
Does this make sense? (I hope so!)
Thank you
Richard