findThoughts function

Hi,

I have done topic modelling and I am trying to get a few examples of text for each topic.

Topic5 <- findThoughts(model, out$text, topics = 5, n = 5)

When I then use the summary function to see what is in Topic 5:

summary(Topic5)

This doesn't give me the text for Topic 5. Instead, I get the following:

      Length Class  Mode
index 1      -none- list

Any ideas?

There is no summary method defined for findThoughts objects; just type Topic5 to see what is in Topic5.
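For context: because summary() has no method for findThoughts objects, R falls back to summary.default(), which for a list simply tabulates the Length, Class and Mode of each element. The base-R sketch below uses a hypothetical stand-in object (not real stm output) to reproduce the shape of the output above. The telling part is that only `index` is listed; as far as I know, findThoughts() stores the matching texts in a separate `docs` element, which is missing here.

```r
# Hypothetical stand-in for a findThoughts() result that holds only
# document indices and no `docs` element (i.e. no texts were returned).
res <- list(index = list(c(3L, 1L, 4L)))

# With no class-specific method, summary.default() is used: it reports the
# Length, Class and Mode of each list element, just like the output above.
summary(res)

# Inspecting the structure directly is more informative:
str(res)
names(res)
```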

Hi, all I get in the output is:

Topic 5:

It seems that findThoughts did not return any texts; if it had, they would have been printed under that heading.

How can I find out why that is? I definitely have a topic 5 in my model and I can see the words associated with this topic when I use the topwords function.
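One thing worth checking in the first call: if `out` is the list returned by stm::prepDocuments(), that list has elements such as documents, vocab and meta but, as far as I can tell, no element called `text`. In that case `out$text` evaluates to NULL, findThoughts() receives no texts, and it can only return document indices. The sketch below mimics that list with a hypothetical stand-in, so the element names and values are assumptions rather than real stm output.

```r
# Hypothetical stand-in for the list returned by stm::prepDocuments():
# it carries documents, vocab and meta, but no element called `text`.
out <- list(
  documents = list(),
  vocab     = character(0),
  meta      = data.frame(text = c("first tweet", "second tweet"),
                         stringsAsFactors = FALSE)
)

# `$` returns NULL for a name that does not exist, so
# findThoughts(model, out$text, ...) would be called with texts = NULL.
is.null(out$text)   # TRUE

# A text column passed in as metadata is available here instead:
out$meta$text
```

If the original text was passed as metadata, `out$meta$text` is the candidate to try, provided its rows line up with the documents the model was actually fitted on.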

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:

processed <- textProcessor(df$text, metadata = df)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta

tokens <- df$text %>% tokens(what = "word", remove_punct = TRUE, 
    remove_numbers = TRUE, remove_url = TRUE) %>% tokens_tolower() %>% 
    tokens_remove(stopwords("english"))

dfm <- dfm_trim(dfm(tokens), min_docfreq = 0.001, max_docfreq = 0.99, 
    docfreq_type = "prop", verbose = TRUE)

ldacorpus <- Corpus(VectorSource(tokens))

dfm_stm <- convert(dfm, to = "stm")

model <- stm(documents = dfm_stm$documents, vocab = dfm_stm$vocab, 
    data = meta, K = 8, verbose = TRUE)

Created on 2022-07-19 by the reprex package (v2.0.1)

I'm not sure how you've managed it, but despite the footer indicating that what you shared was produced with reprex v2.0.1, it is not reproducible.

This is because the first line relies on df, which is private to you.
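A common way to make such an example self-contained is to paste a small sample of the data as code, using base R's dput(). The two-row data frame below is hypothetical and only stands in for the private `df`.

```r
# Hypothetical two-row sample standing in for the private `df`.
df <- data.frame(text = c("first example text", "second example text"),
                 stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; pasting that output at
# the top of a reprex lets anyone run the rest of the code.
dput(head(df, 2))
```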

You redacted your last post, but I was able to see enough of it to recover something I could make a reprex out of, and it does give a result for Topic 5:

first_column <- c("2022-05-31T22:03:15.000Z", "2022-05-31T21:18:46.000Z", 
                  "2022-05-31T20:57:38.000Z", "2022-05-31T18:39:54.000Z", "2022-05-31T18:21:03.000Z")
second_column <- c("1.53176E+18", "1.53175E+18", "1.53174E+18", 
                   "1.53171E+18", "1.5317E+18")
third_column <- c("While neighbourhoods in Oxford are made of dead end streetsJust look at a map of BBLIts a massive LTNBins get collected no issue", 
                  "People making short journeys by car are exactly why LTNs are needed in all residential areas", 
                  "This evening I attended the Fox Lane Residents meeting with ward colleagues Many residents voiced their anger over the LTNs and its ramifications in the local communityMore pollutionmore traffic and more misery 12", 
                  "On the pavementon double yellow lines over a cycle LaneFull house for thisHGV", 
                  "Lime tree flowers in bud todayalongside footcycle path at Via Ravenna mid1980sbuilt highwayLooking forward to our ChiTrees project to understand better how we benefit from these highway trees")

df <- data.frame(first_column, second_column, text=third_column)
library(stm)
library(quanteda)
library(tm)
library(tidyverse)
processed <- textProcessor(df$text, metadata = df)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta

tokens <- df$text %>% tokens(what = "word", remove_punct = TRUE, 
                             remove_numbers = TRUE, remove_url = TRUE) %>% tokens_tolower() %>% 
  tokens_remove(stopwords("english"))

dfm <- dfm_trim(dfm(tokens), min_docfreq = 0.001, max_docfreq = 0.99, 
                docfreq_type = "prop", verbose = TRUE)

ldacorpus <- Corpus(VectorSource(tokens))

dfm_stm <- convert(dfm, to = "stm")

model <- stm(documents = dfm_stm$documents, vocab = dfm_stm$vocab, 
             data = meta, K = 8, verbose = TRUE)

Topic5 <- findThoughts(model, df$text, topics = 5, n = 5)
Topic5

Yes, sorry, I thought I had made an error that I was correcting. I'm not sure why it pulls something for Topic 5 when I input the data like this, but not when I use my full data file :unamused:
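One difference worth noting: the reprex passes df$text, while the original call passed out$text. When df$text is used with the full data file, it also has to line up one-to-one, in the same order, with the documents the model was fitted on. Documents can be dropped along the way: prepDocuments() reports them in docs.removed, and converting a quanteda dfm to stm format may drop empty documents. In the code above, the model is fitted on dfm_stm while `out` comes from a separate textProcessor()/prepDocuments() pipeline, so the two may not drop the same documents. A base-R sketch with hypothetical values:

```r
# Hypothetical example: five original texts, of which documents 2 and 4
# were dropped before the model was fitted (e.g. as listed in
# prepDocuments()$docs.removed).
texts        <- c("t1", "t2", "t3", "t4", "t5")
docs_removed <- c(2L, 4L)

# Keep only the texts of documents that were actually modelled, in their
# original order. The if() guards against texts[-integer(0)], which would
# return an empty vector instead of all texts.
texts_used <- if (length(docs_removed) > 0) texts[-docs_removed] else texts

texts_used  # "t1" "t3" "t5"

# With a real model, check this before calling findThoughts():
# length(texts_used) == nrow(model$theta)
```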

Is there an alternative to the findThoughts function that will pull through the data I want?
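As a sketch of one alternative, assuming `model` is a fitted stm object: findThoughts() essentially ranks documents by their proportion of a topic in the document-topic matrix model$theta, so the same ranking can be done by hand, which also makes a length mismatch between texts and documents easy to spot. The helper top_docs() is hypothetical, and a random matrix stands in for model$theta.

```r
# Hypothetical stand-in for model$theta: a documents x topics matrix of
# topic proportions (each row sums to 1), plus one text per document.
set.seed(1)
theta <- matrix(runif(6 * 3), nrow = 6, ncol = 3)
theta <- theta / rowSums(theta)
texts <- paste("document", 1:6)

# Hypothetical helper: the n documents with the highest proportion of
# `topic`, returned together with their index, proportion and text.
top_docs <- function(theta, texts, topic, n = 3) {
  # Stop with an error if the texts do not match the modelled documents.
  stopifnot(length(texts) == nrow(theta))
  n   <- min(n, nrow(theta))
  idx <- order(theta[, topic], decreasing = TRUE)[seq_len(n)]
  data.frame(doc = idx, prob = theta[idx, topic], text = texts[idx],
             stringsAsFactors = FALSE)
}

top_docs(theta, texts, topic = 2, n = 3)
# With a real fit, e.g.: top_docs(model$theta, df$text, topic = 5, n = 5)
```

Because of the stopifnot() check, mismatched texts produce an error instead of silently returning the wrong documents.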
