Hello everyone, I have to write a term paper for university in which I analyze the topics of some political tweets. I also have to look at how the topics are distributed over time. To do that, my professor gave me this code:
modeleffect <- estimateEffect(formula = 1:20 ~ user_username + s(day), stmobj = first_model, metadata = Twitter_Daten_cleaned)
When I run this code I get this error:
Error in qr.lm(thetasims[, k], qx) :
  number of covariate observations does not match number of docs
Here are my previous steps for more background:
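# Packages assumed by the code below
library(tidyverse)  # rowid_to_column(), %>%, tibble(), ggplot()
library(quanteda)   # corpus(), tokens(), dfm(), convert()
library(stm)        # stm(), labelTopics(), estimateEffect()
library(tidytext)   # tidy() for stm models
# Convert the timestamp from seconds to days
Twitter_Daten_cleaned$day <- as.numeric(Twitter_Daten_cleaned$created_at) / 86400
TM_id <- rowid_to_column(Twitter_Daten_cleaned, "id")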
# Data Preprocessing
TM_corpus <- corpus(TM_id, docid_field = "id", text_field = "sourcetweet_text")
TM_tokens <- tokens(TM_corpus,
                    remove_punct = TRUE,
                    remove_numbers = TRUE,
                    remove_symbols = TRUE,
                    remove_url = FALSE) %>%
  tokens_tolower() %>%
  tokens_remove(stopwords('german')) %>%
  tokens_remove(c("amp", "dass", "heute")) %>%
  tokens_wordstem() # reduce words to their stems
TM_dfm <- dfm(TM_tokens)
# trim very rare and very frequent terms so the model runs smoothly
t_dfm <- dfm_trim(TM_dfm,
                  max_docfreq = 0.50,
                  min_docfreq = 0.01,
                  docfreq_type = 'prop')
# convert the quanteda dfm to stm format
stm_dfm <- convert(t_dfm, to = "stm")
str(stm_dfm, max.level = 1)
# compute first model
first_model <- stm(documents = stm_dfm$documents,
                   vocab = stm_dfm$vocab,
                   K = 20)
#plot first model
plot(first_model)
# list of terms for topic labels
terms <- labelTopics(first_model)
# table of per-document topic probabilities
library(reshape2)
doc_probs <- tidy(first_model, matrix = "gamma", document_names = stm_dfm$meta$title)
top_terms <- tibble(topic = terms$topicnums,
                    prob = apply(terms$prob, 1, paste, collapse = ", "),
                    frex = apply(terms$frex, 1, paste, collapse = ", "))
gamma_by_topic <- doc_probs %>%
  group_by(topic) %>%
  summarise(gamma = mean(gamma)) %>%
  arrange(desc(gamma)) %>%
  left_join(top_terms, by = "topic") %>%
  mutate(topic = paste0("Topic ", topic),
         topic = reorder(topic, gamma))
# plot mean topic probability with FREX terms as labels
gamma_by_topic %>%
  ggplot(aes(topic, gamma, label = frex, fill = topic)) +
  geom_col(show.legend = FALSE) +
  geom_text(hjust = 0, nudge_y = 0.0005, size = 3) +
  coord_flip() +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 0.11), labels = scales::percent) +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major = element_blank()) +
  labs(x = NULL, y = expression(gamma))
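The error message suggests that the metadata has a different number of rows than the model has documents. One possible cause, not confirmed for this data, is that dfm_trim() leaves some tweets without any features and convert(..., to = "stm") then drops those empty documents. A minimal sketch of that effect with made-up toy data (the `toy` data frame and its values are hypothetical; only the column names follow the code above):

```r
library(quanteda)

# Hypothetical stand-in for Twitter_Daten_cleaned
toy <- data.frame(
  id = paste0("tweet", 1:4),
  sourcetweet_text = c("steuer rente", "steuer klima", "rente klima", "xyz"),
  user_username = c("a", "b", "a", "b"),
  day = c(1, 2, 3, 4)
)

toy_corpus <- corpus(toy, docid_field = "id", text_field = "sourcetweet_text")
toy_dfm <- dfm(tokens(toy_corpus))

# "xyz" occurs in only one document, so trimming removes it
# and the fourth document is left without any features
toy_dfm <- dfm_trim(toy_dfm, min_docfreq = 2, docfreq_type = "count")

# convert() warns about and drops the now-empty document
toy_stm <- convert(toy_dfm, to = "stm")

nrow(toy)                   # rows in the original data frame
length(toy_stm$documents)   # documents left for the model
nrow(toy_stm$meta)          # metadata rows kept alongside those documents
```

If the counts also differ for the real objects (nrow(Twitter_Daten_cleaned) versus length(stm_dfm$documents)), passing metadata = stm_dfm$meta to estimateEffect() may be worth trying, since that data frame should contain only the documents that were kept.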
Does anyone have experience with estimateEffect()? I would be very grateful for any help!