I love the embed package. It led me to a question that I'm having a hard time answering on my own: are Pearson correlations between likelihood-encoded variables meaningful? That is, can they tell me something about the relationships between the transformed factors and the target variable? Or is this not a good way to proceed?
data(iris)
library(tidymodels)
library(embed)

recipe(Sepal.Length ~ ., data = iris) %>%
  step_lencode_glm(Species, outcome = vars(Sepal.Length)) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  cor()
For instance, is the correlation between the encoded Species term and the other variables in the example above meaningful?
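To make the question more concrete, here is a base-R sketch of what I think the Species correlation is measuring. It assumes that, for a numeric outcome, the GLM likelihood encoding replaces each factor level with something equivalent to that level's mean of the outcome. That is my reading of the embed documentation and is worth verifying. The name `species_enc` is made up for illustration and is not produced by the recipe.

```r
# Sketch under an assumption: the encoded value of each Species level is
# that level's mean Sepal.Length (no pooling across levels).
data(iris)

# Hand-rolled encoding: replace each Species level with its outcome mean
species_enc <- ave(iris$Sepal.Length, iris$Species, FUN = mean)

# Pearson correlation between the encoded factor and the outcome
r_enc <- cor(species_enc, iris$Sepal.Length)

# R-squared of a one-way linear model of the outcome on the raw factor
r2_lm <- summary(lm(Sepal.Length ~ Species, data = iris))$r.squared

# Under the assumption above, the squared correlation should equal the
# model's R-squared, because the encoded values are the model's fitted values
all.equal(r_enc^2, r2_lm)
```

If that assumption holds, the correlation between the encoded Species and Sepal.Length is just the square root of the one-way R-squared, and its correlations with the other predictors describe how those predictors track the per-species outcome means.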