step_lda per-topic-per-word probability extraction

Hi, I am going through the Topic Modeling chapter of the Tidy Text Modeling book, but trying to run LDA with @Emilhvitfeldt's textrecipes package.

I could be misunderstanding, but it seems that prepping and juicing a recipe that includes step_lda only produces the per-document-per-topic probabilities (gamma) by default. How can I also extract the per-topic-per-word probabilities (beta) so I can analyze the topics themselves?

Here's an example of what I was doing:


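scotus_lda_rec <- recipe(~ ., data = scotus_sample) %>%
    # Assumed: the step after the pipe was cut off in this post; the `text` column and
    # 10 topics are inferred from the lda_text_w1:lda_text_w10 columns used below.
    step_lda(text, num_topics = 10)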

scotus_lda_prep <- prep(scotus_lda_rec)
scotus_lda <- juice(scotus_lda_prep)

Then to get the top topic per document I'd do something like this:

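scotus_lda2 <- scotus_lda %>%
    # One row per document-topic pair
    pivot_longer(lda_text_w1:lda_text_w10) %>%
    group_by(id) %>%
    # Keep the highest-probability topic for each document
    slice_max(value, n = 1) %>%
    ungroup() %>%
    select(id, top_topic = name) %>%
    # Re-attach the topic probabilities and the original text
    left_join(scotus_lda) %>%
    left_join(scotus_sample %>% select(id, text))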

But it'd also be great to get the top terms per topic -- any help is appreciated!

Hello @cgpeltier,

This is not possible to do in {textrecipes} right now. I'll take a look at this over the weekend to see if I can add this as a feature :smile:
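
In the meantime, here is a minimal sketch of the "top terms per topic" step, assuming you can get the beta values from somewhere outside the recipe (for example, a model fitted separately and tidied into long format with one row per topic-term pair). The `beta_long` data frame and its values are made up for illustration and are not from the SCOTUS data; only the column layout (`topic`, `term`, `beta`) matters.

```r
library(dplyr)

# Hypothetical per-topic-per-word probabilities in long format:
# one row per (topic, term) pair. Values are invented for illustration.
beta_long <- data.frame(
  topic = c(1, 1, 1, 2, 2, 2),
  term  = c("court", "appeal", "tax", "tax", "railroad", "court"),
  beta  = c(0.50, 0.30, 0.20, 0.45, 0.40, 0.15)
)

# Keep the 2 highest-probability terms within each topic
top_terms <- beta_long %>%
  group_by(topic) %>%
  slice_max(beta, n = 2) %>%
  ungroup() %>%
  arrange(topic, desc(beta))

print(top_terms)
```

With real data you would likely raise `n` to 10 or so and plot `beta` by `term`, faceted by `topic`.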

Great, thank you! And textrecipes is great so far, thanks for all of your work on it (and smltar)!

