Structural Topic Modelling with Twitter data

ILCC · July 13, 2022, 4:15pm

Hi,
I am trying to fit structural topic models (STM). I have some data from Twitter and I want to run two STM models: one using the date as the metadata, and the other using 'sentiment' as the metadata (where a sentiment analysis classified each Tweet as positive or negative). For the STM using created_at as the metadata, I have run the following script:

library(stm)

# Tokenise and clean the tweet text, keeping the metadata alongside
processed <- textProcessor(clean_data2$text, metadata = clean_data2)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta

# Fit a 10-topic model with date as a prevalence covariate
Stm <- stm(docs, vocab, K = 10, prevalence = ~date, data = meta)

# Regress topic proportions on date
prep <- estimateEffect(1:10 ~ date, stmobj = Stm, metadata = meta,
    uncertainty = "Global", prior = 1e-05)

plot(x = prep, covariate = "date", method = "continuous", topics = 1:10)

However, I don't think I have got the formula right, because the graph (see below) gives me the 'expected topic prevalence', which is not what I want: I want the actual topic prevalence across time. Also, to make this work I need to convert the date to a numeric format, which makes the x-axis essentially meaningless (again, see graph below; I have also included how I did this in the script below). I am not sure whether I am going wrong at the model stage or the graph stage. Any help would be appreciated.
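To make concrete what "actual topic prevalence across time" might look like, here is a minimal sketch that averages per-document topic proportions by date. It assumes the fitted stm object stores those proportions in `theta` (one row per document, one column per topic) and that the rows line up with `meta`. So that it runs on its own, it uses simulated stand-ins: `theta_sim` and `dates_sim` are hypothetical names taking the place of `Stm$theta` and the date column. The proportions are still model estimates, so this is an observed average of estimates rather than a raw count.

```r
# Hypothetical stand-ins: in the real workflow these would be Stm$theta
# (documents x topics) and the date column of meta, in the same row order.
set.seed(1)
n_docs <- 60
n_topics <- 3
raw <- matrix(rgamma(n_docs * n_topics, shape = 1), nrow = n_docs)
theta_sim <- raw / rowSums(raw)  # each row sums to 1, like topic proportions
dates_sim <- as.Date("2022-07-01") + sample(0:4, n_docs, replace = TRUE)

# Mean topic proportion per day
theta_df <- data.frame(date = dates_sim, theta_sim)
names(theta_df)[-1] <- paste0("topic", seq_len(n_topics))
observed <- aggregate(. ~ date, data = theta_df, FUN = mean)

# One line per topic, with readable dates on the x-axis
x_num <- as.numeric(observed$date)
matplot(x_num, as.matrix(observed[, -1]), type = "l", lty = 1,
        xaxt = "n", xlab = "Date", ylab = "Mean topic proportion")
axis(1, at = x_num, labels = format(observed$date, "%d %b"))
```

With the real model, the same steps would start from `data.frame(date = meta$date, Stm$theta)`, provided that alignment assumption holds.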

library(dplyr)

# Run before textProcessor(): sample 500 tweets and store the date as a number
clean_data2 <- clean_data2 %>% sample_n(500) %>% mutate(date = as.Date(created_at)) %>% 
    mutate(date = as.numeric(date))
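A numeric date need not stay meaningless on the plot: `as.numeric()` on a Date gives days since 1970-01-01, so the tick positions can be converted back for labelling. The sketch below shows the round trip with base R only. The dates and `y` values are made up for illustration, and the base `plot()` call is a stand-in for the stm plot.

```r
# Hypothetical example dates standing in for as.Date(created_at)
dates <- as.Date(c("2022-07-01", "2022-07-05", "2022-07-13"))
date_num <- as.numeric(dates)  # days since 1970-01-01

# Convert the numbers back to dates
date_back <- as.Date(date_num, origin = "1970-01-01")

# Stand-in plot: suppress the numeric axis, then draw date labels instead
y <- c(0.10, 0.25, 0.15)  # made-up values, not model output
plot(date_num, y, type = "l", xaxt = "n",
     xlab = "Date", ylab = "Prevalence (made-up values)")
ticks <- pretty(date_num)
axis(1, at = ticks,
     labels = format(as.Date(ticks, origin = "1970-01-01"), "%d %b"))
```

The stm vignette uses the same pattern on its continuous covariate plot, passing `xaxt = "n"` to `plot()` and then calling `axis()`, so it may carry over to the plot above.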

Created on 2022-07-13 by the reprex package (v2.0.1)
