step_lda per-topic-per-word probability extraction

Hi, I am working through the Topic Modeling chapter of the Tidy Text Modeling book, but trying LDA with @Emilhvitfeldt's textrecipes package instead.

I could be misunderstanding, but it seems that prepping and juicing a recipe that includes step_lda only produces the per-document-per-topic probabilities by default. How can I also extract the per-topic-per-word (beta) probabilities, so I can analyze the topics themselves?

Here's an example of what I was doing:

devtools::install_github("EmilHvitfeldt/scotus")
library(scotus)
library(recipes)      # recipe(), prep(), juice()
library(textrecipes)  # step_lda()

scotus_lda_rec <- recipe(~ ., data = scotus_sample) %>%
    step_lda(text)

set.seed(123)
scotus_lda_prep <- prep(scotus_lda_rec)
scotus_lda <- juice(scotus_lda_prep)

Then to get the top topic per document I'd do something like this:

library(dplyr)
library(tidyr)  # pivot_longer()

scotus_lda2 <- scotus_lda %>%
    pivot_longer(lda_text_w1:lda_text_w10) %>%
    group_by(id) %>%
    top_n(1, value) %>%
    ungroup() %>%
    select(id, top_topic = name) %>%
    left_join(scotus_lda, by = "id") %>%
    left_join(scotus_sample %>% select(id, text), by = "id")

But it'd also be great to get the top terms per topic -- any help is appreciated!

Hello @cgpeltier,

This is not possible to do in {textrecipes} right now. I'll take a look at this over the weekend to see if I can add this as a feature :smile:
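
In the meantime, one possible workaround is to fit the topic model outside the recipe and pull beta from it, along the lines of the book chapter. The sketch below is only an illustration under some assumptions: it uses {topicmodels} and {tidytext} rather than {textrecipes}, so the topics will not be the same ones step_lda estimates, and the toy `docs` data frame and the names `dtm`, `lda_fit`, `topic_term_beta` and `top_terms` are made up for the example (swap in `scotus_sample` for real use).

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical toy corpus standing in for scotus_sample (id + text columns)
docs <- tibble(
    id = 1:6,
    text = c(
        "the court held that the statute was unconstitutional",
        "the statute violated the first amendment of the constitution",
        "the jury found the defendant guilty at trial",
        "the defendant appealed the verdict of the jury",
        "the court reviewed the constitution and the statute",
        "the trial judge instructed the jury on the verdict"
    )
)

# Tokenize, count words per document, and cast to a document-term matrix
dtm <- docs %>%
    unnest_tokens(word, text) %>%
    count(id, word) %>%
    cast_dtm(id, word, n)

# Fit a 2-topic LDA model (k = 2 is arbitrary for this toy data)
lda_fit <- LDA(dtm, k = 2, control = list(seed = 123))

# Per-topic-per-word probabilities: one row per topic/term pair
topic_term_beta <- tidy(lda_fit, matrix = "beta")

# Top 3 terms per topic by beta
top_terms <- topic_term_beta %>%
    group_by(topic) %>%
    slice_max(beta, n = 3, with_ties = FALSE) %>%
    ungroup()

top_terms
```

Calling `tidy(lda_fit, matrix = "gamma")` on the same fit gives the per-document-per-topic probabilities, so both views come from one model.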

Great, thank you! And textrecipes is great so far, thanks for all of your work on it (and smltar)!
