Hi, I am working through the Topic Modeling chapter of the Tidy Text Mining book (*Text Mining with R*), but fitting LDA with @Emilhvitfeldt's textrecipes package instead.
I may be misunderstanding, but prepping and juicing a recipe that includes `step_lda()` seems to produce, by default, only the per-document per-topic probabilities (gamma). How can I also extract the per-topic per-word probabilities (beta) so I can analyze the topics themselves?
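For reference, this is the split I mean, in the book's notation: for document $d$, topic $k$ and word $w$, $\gamma_{d,k}$ is what `juice()` gives me, and $\beta_{k,w}$ (the probability of word $w$ within topic $k$) is what I'm missing. Together they give each document's word distribution under the model:

```latex
p(w \mid d) = \sum_{k=1}^{K} \gamma_{d,k}\,\beta_{k,w},
\qquad \sum_{k=1}^{K} \gamma_{d,k} = 1,
\qquad \sum_{w} \beta_{k,w} = 1
```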
Here's an example of what I was doing:
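```r
devtools::install_github("EmilHvitfeldt/scotus")
library(scotus)
library(recipes)
library(textrecipes)  # step_lda() also needs the text2vec package installed
library(dplyr)
library(tidyr)

scotus_lda_rec <- recipe(~ ., data = scotus_sample) %>%
  step_lda(text)

set.seed(123)
scotus_lda_prep <- prep(scotus_lda_rec)
# juice() is superseded; bake(scotus_lda_prep, new_data = NULL) is equivalent
scotus_lda <- juice(scotus_lda_prep)
```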
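If the trained step keeps the fitted text2vec model, beta could be pulled straight out of the prepped recipe. Below is a minimal, self-contained sketch on a small hypothetical toy corpus (not the scotus data). It assumes that the trained `step_lda()` stores the fitted text2vec WarpLDA model(s) in a field named `lda_models`, which I have not confirmed for every textrecipes version, and that the text2vec model exposes `topic_word_distribution` and `get_top_words()`:

```r
library(recipes)
library(textrecipes)  # step_lda() also needs the text2vec package installed

# Hypothetical toy corpus; swap in scotus_sample to use the real data.
toy <- data.frame(
  text = c("the court affirmed the appeal",
           "the court denied the appeal",
           "income tax was due on the income",
           "the tax court ruled on income tax"),
  stringsAsFactors = FALSE
)

set.seed(123)
lda_prep <- recipe(~ text, data = toy) %>%
  step_lda(text, num_topics = 2) %>%
  prep()

# ASSUMPTION: the trained step keeps the fitted text2vec model(s) in its
# `lda_models` field, possibly as a list with one model per text column.
# Check names(lda_prep$steps[[1]]) if this differs in your version.
lda_model <- lda_prep$steps[[1]]$lda_models
if (!inherits(lda_model, "WarpLDA")) lda_model <- lda_model[[1]]

# Topic-word probabilities (beta), as I understand text2vec's layout:
# one row per topic, one column per vocabulary term.
beta_mat <- lda_model$topic_word_distribution

# Top terms per topic; lambda = 1 ranks terms purely by beta.
lda_model$get_top_words(n = 3, lambda = 1)
```

On the real data the same lookup would be `scotus_lda_prep$steps[[1]]$lda_models`, again assuming that field name.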
Then to get the top topic per document I'd do something like this:
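```r
# For each document, keep the topic with the highest gamma.
# slice_max(with_ties = FALSE) returns exactly one row per id, unlike top_n().
scotus_lda2 <- scotus_lda %>%
  pivot_longer(starts_with("lda_text"), names_to = "topic", values_to = "gamma") %>%
  group_by(id) %>%
  slice_max(gamma, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(id, top_topic = topic) %>%
  left_join(scotus_lda, by = "id") %>%
  left_join(scotus_sample %>% select(id, text), by = "id")
```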
It would also be great to get the top terms per topic (the beta side). Any help is appreciated!
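Once a topic-word matrix is in hand, getting the top terms per topic is the same reshaping the book does on `tidy(lda, matrix = "beta")`. Here is a sketch with a made-up 2-topic, 4-term matrix standing in for the real beta; the terms and numbers are hypothetical:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical topic-word matrix (2 topics x 4 terms); each row sums to 1.
beta_mat <- matrix(
  c(0.50, 0.30, 0.15, 0.05,
    0.05, 0.10, 0.25, 0.60),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("court", "appeal", "tax", "income"))
)

# Long format like tidytext's beta table: one row per topic-term pair.
beta_tidy <- as_tibble(beta_mat) %>%
  mutate(topic = row_number()) %>%
  pivot_longer(-topic, names_to = "term", values_to = "beta")

# Top 2 terms per topic, highest beta first within each topic.
top_terms <- beta_tidy %>%
  group_by(topic) %>%
  slice_max(beta, n = 2, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(topic, desc(beta))

top_terms
```

From here the book's faceted bar chart of `top_terms` should work unchanged.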