estimateEffect() fails on my topic model: "number of covariate observations does not match number of docs"

Hello everyone, I have to write a term paper for university and want to analyze the topics of some political tweets. I also need to look at how the topics are distributed over time. To do that, my professor gave me this code:

modeleffect <- estimateEffect(formula = 1:20 ~ user_username + s(day), stmobj = first_model, metadata = Twitter_Daten_cleaned)

When I run this code I get this error:
Error in qr.lm(thetasims[, k], qx) :
  number of covariate observations does not match number of docs

Here are my previous steps for more background:

Twitter_Daten_cleaned$day <- as.numeric(Twitter_Daten_cleaned$created_at) / 86400

TM_id <- rowid_to_column(Twitter_Daten_cleaned, "id")

# Data Preprocessing

TM_corpus <- corpus(TM_id, docid_field = "id", text_field = "sourcetweet_text")

TM_tokens <- tokens(TM_corpus,
                        remove_punct = TRUE, 
                        remove_numbers = TRUE, 
                        remove_symbols = TRUE, 
                        remove_url = FALSE) %>% 
  tokens_tolower() %>% 
  tokens_remove(stopwords('german')) %>% 
  tokens_remove(c("amp", "dass", "heute")) %>% 
  tokens_wordstem()  # reduce words to their stems

TM_dfm <- dfm(TM_tokens)

# trim the dfm so the model runs smoothly

t_dfm <- dfm_trim(TM_dfm,
                  max_docfreq = 0.50, 
                  min_docfreq = 0.01, 
                  docfreq_type = 'prop')

# convert the quanteda dfm to stm format
stm_dfm <- convert(t_dfm, to = "stm")

str(stm_dfm, max.level = 1)

#compute first model 
first_model <- stm(documents = stm_dfm$documents, 
                   vocab = stm_dfm$vocab,
                   K = 20)

#plot first model
plot(first_model)

# list of topic labels
terms <- labelTopics(first_model)

# table of topic probabilities
library(reshape2)
library(tidytext)  # provides tidy() for stm models
doc_probs <- tidy(first_model, matrix = "gamma", document_names = stm_dfm$meta$title)

top_terms <- tibble(topic = terms$topicnums,
                    prob = apply(terms$prob, 1, paste, collapse = ", "), 
                    frex = apply(terms$frex, 1, paste, collapse = ", "))

gamma_by_topic <- doc_probs %>%
  group_by(topic) %>% 
  summarise(gamma = mean(gamma)) %>% 
  arrange(desc(gamma)) %>% 
  left_join(top_terms, by = "topic") %>% 
  mutate(topic = paste0("Topic ", topic), 
         topic = reorder(topic, gamma))

#plot 
gamma_by_topic %>%
  ggplot(aes(topic, gamma, label = frex, fill = topic)) + 
  geom_col(show.legend = FALSE) + 
  geom_text(hjust = 0, nudge_y = 0.0005, size = 3) + 
  coord_flip() + scale_y_continuous(expand = c(0, 0), limits = c(0, 0.11), labels = scales::percent) + 
  theme(panel.grid.minor = element_blank(), 
        panel.grid.major = element_blank()) + 
  labs(x = NULL, y = expression(gamma))

Does anyone have experience with estimateEffect()? I would be very grateful for any help!
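
One possible cause, offered as a guess rather than a confirmed diagnosis: dfm_trim() can leave some tweets without any remaining features, and convert(..., to = "stm") drops empty documents by default. first_model would then be fitted on fewer documents than Twitter_Daten_cleaned has rows, which is what the error message complains about. The converted object keeps the document variables of the retained documents in $meta. The sketch below uses made-up toy data (the `toy` data frame and all of its values are hypothetical) only to show how the counts can drift apart; it does not fit a model.

```r
library(quanteda)

# Hypothetical stand-in for Twitter_Daten_cleaned; the third "tweet"
# contains only punctuation and ends up with no tokens.
toy <- data.frame(
  id = 1:4,
  text = c("Klimaschutz ist wichtig",
           "Steuern senken jetzt",
           "!!!",
           "Rente und Pflege"),
  user_username = c("a", "b", "a", "b"),
  day = c(19000, 19001, 19002, 19003),
  stringsAsFactors = FALSE
)

toy_corpus <- corpus(toy, docid_field = "id", text_field = "text")
toy_dfm <- dfm(tokens(toy_corpus, remove_punct = TRUE))

# convert() omits empty documents by default (omit_empty = TRUE)
# and warns about the ones it dropped.
toy_stm <- convert(toy_dfm, to = "stm")

# Compare the three counts: the original data frame has more rows
# than the documents that survive conversion, while $meta matches
# the surviving documents.
nrow(toy)
length(toy_stm$documents)
nrow(toy_stm$meta)
```

If the same comparison on your objects (nrow(Twitter_Daten_cleaned) versus length(stm_dfm$documents)) shows a difference, passing the aligned metadata may resolve the error, for example estimateEffect(1:20 ~ user_username + s(day), stmobj = first_model, metadata = stm_dfm$meta). This is untested against your data.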
