Hi,
I am trying to do some structural topic modelling with the stm package in R. I have some Twitter data and I want to run two STM models: one using the date as the metadata and the other using 'sentiment' as the metadata (a sentiment analysis labelled each Tweet as positive or negative). For the model that uses the creation date (created_at) as the metadata, I ran the following script:
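library(stm)
# Tokenise and clean the tweets, keeping the metadata aligned with the documents
processed <- textProcessor(clean_data2$text, metadata = clean_data2)
# Remove infrequent terms and any documents left empty
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta
# 10-topic model with topic prevalence depending on the (numeric) date
Stm <- stm(docs, vocab, K = 10, prevalence = ~date, data = meta)
# Regress the topic proportions on date
prep <- estimateEffect(1:10 ~ date, stmobj = Stm, metadata = meta,
                       uncertainty = "Global", prior = 1e-05)
plot(x = prep, covariate = "date", method = "continuous", topics = 1:10)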
However, I don't think I have got the formula right: the graph (see below) gives me the 'expected topic prevalence', which is not what I want. I want the actual topic prevalence across time. Also, to make this work I had to convert the date to a numeric format, which makes the x-axis essentially meaningless (again, see the graph below; the conversion I used is in the script below). I am not sure whether I am going wrong at the model stage or at the graph stage. Any help would be appreciated.
library(dplyr)
# Sample 500 tweets, then store the date as the number of days since 1970-01-01
clean_data2 <- clean_data2 %>% sample_n(500) %>% mutate(date = as.Date(created_at)) %>%
  mutate(date = as.numeric(date))
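To make the two goals concrete, here is a minimal base-R sketch of what I mean by "actual" prevalence and a readable date axis. It is only an illustration under assumptions: `dates_chr` and `theta` are made-up toy values standing in for the dates in `out$meta` and for the document-by-topic matrix `Stm$theta`, and averaging theta per day is just one possible definition of observed prevalence, not something the model itself reports.

```r
# Toy stand-ins (hypothetical values). In the real workflow these would be
# the dates kept in out$meta and the matrix Stm$theta (documents x topics).
dates_chr <- c("2022-07-01", "2022-07-01", "2022-07-02", "2022-07-03")
date_num  <- as.numeric(as.Date(dates_chr))  # days since 1970-01-01

# The numeric date still carries the calendar date and converts straight back.
date_back <- as.Date(date_num, origin = "1970-01-01")

theta <- matrix(c(0.7, 0.3,
                  0.5, 0.5,
                  0.2, 0.8,
                  0.9, 0.1),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("Topic1", "Topic2")))

# Observed prevalence: mean topic proportion of the documents on each day.
observed <- aggregate(as.data.frame(theta),
                      by = list(date = date_back), FUN = mean)
print(observed)

# Plot against the numeric date, but label the axis with real dates.
x <- as.numeric(observed$date)
matplot(x, observed[, c("Topic1", "Topic2")], type = "l", lty = 1,
        xaxt = "n", xlab = "Date", ylab = "Mean topic proportion")
axis(1, at = x, labels = format(observed$date, "%d %b"))
```

The same relabelling idea (suppress the default axis with xaxt = "n", then draw it with axis()) is what I would hope to apply to the estimateEffect plot, if that is the right approach.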
Created on 2022-07-13 by the reprex package (v2.0.1)