How can I find out why that is? I definitely have a topic 5 in my model and I can see the words associated with this topic when I use the topwords function.
To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide to see how to create one:
I'm not sure how you managed it, but although the message says that what you shared came from reprex v2.0.1, it is not reproducible.
This is because the first line relies on df, which exists only in your own session.
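One common way to make a private data frame shareable is to post the output of dput() on a small sample. The sketch below is only illustrative: the data frame and its values are made up as a stand-in for the real df, and it uses base R only.

```r
# Made-up stand-in for the private data frame (not the poster's real data)
df <- data.frame(id = 1:3,
                 text = c("first post", "second post", "third post"))

# dput() prints R code that recreates the object; paste that output into the post
sample_code <- capture.output(dput(head(df, 2)))
cat(sample_code, sep = "\n")

# Anyone can rebuild the same sample by evaluating the pasted code
rebuilt <- eval(parse(text = paste(sample_code, collapse = "\n")))
```

Sharing a handful of rows this way keeps the example small while letting others run the code that follows it.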
You redacted your last post, but I was able to see enough of it to recover something I could make a reprex out of. It does give a result for Topic 5.
first_column <- c("2022-05-31T22:03:15.000Z", "2022-05-31T21:18:46.000Z",
"2022-05-31T20:57:38.000Z", "2022-05-31T18:39:54.000Z", "2022-05-31T18:21:03.000Z")
second_column <- c("1.53176E+18", "1.53175E+18", "1.53174E+18",
"1.53171E+18", "1.5317E+18")
third_column <- c("While neighbourhoods in Oxford are made of dead end streetsJust look at a map of BBLIts a massive LTNBins get collected no issue",
"People making short journeys by car are exactly why LTNs are needed in all residential areas",
"This evening I attended the Fox Lane Residents meeting with ward colleagues Many residents voiced their anger over the LTNs and its ramifications in the local communityMore pollutionmore traffic and more misery 12",
"On the pavementon double yellow lines over a cycle LaneFull house for thisHGV",
"Lime tree flowers in bud todayalongside footcycle path at Via Ravenna mid1980sbuilt highwayLooking forward to our ChiTrees project to understand better how we benefit from these highway trees")
df <- data.frame(first_column, second_column, text = third_column)
library(stm)
library(quanteda)
library(tm)
library(tidyverse)
# stm's own preprocessing (stemming, stop-word removal); only `meta` from this step is used later
processed <- textProcessor(df$text, metadata = df)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta
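# Separate quanteda preprocessing; the model is fitted on this document-feature matrix.
# Objects are named toks / dfm_trimmed so they do not shadow the tokens() and dfm() functions.
toks <- df$text %>%
  tokens(what = "word", remove_punct = TRUE,
         remove_numbers = TRUE, remove_url = TRUE) %>%
  tokens_tolower() %>%
  tokens_remove(stopwords("english"))
dfm_trimmed <- dfm_trim(dfm(toks), min_docfreq = 0.001, max_docfreq = 0.99,
                        docfreq_type = "prop", verbose = TRUE)
ldacorpus <- Corpus(VectorSource(toks))  # not used by the stm model below
dfm_stm <- convert(dfm_trimmed, to = "stm")
model <- stm(documents = dfm_stm$documents, vocab = dfm_stm$vocab,
             data = meta, K = 8, verbose = TRUE)
# The texts passed here must line up one-to-one with the documents the model was fitted on
Topic5 <- findThoughts(model, df$text, topics = 5, n = 5)
Topic5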
Yes, sorry, I thought I had made an error that I was correcting. I'm not sure why it pulls something for topic 5 when I input the data like this, but not when I use my full data file.
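One thing that may be worth checking on the full file (a guess, not something confirmed in this thread) is whether the texts still line up with the documents the model was fitted on, and how much weight topic 5 actually gets. The helper below, check_topic, is hypothetical and not part of stm; it uses base R only and works on any document-topic matrix such as model$theta. The toy matrix is made up so the sketch runs on its own.

```r
# Hypothetical diagnostic helper (not an stm function).
# theta: documents x topics matrix of topic proportions, e.g. model$theta
# texts: the character vector you would pass to findThoughts()
check_topic <- function(theta, texts, topic, n = 5) {
  stopifnot(is.matrix(theta), topic >= 1, topic <= ncol(theta))
  top <- head(order(theta[, topic], decreasing = TRUE), n)
  list(
    aligned    = length(texts) == nrow(theta),  # do texts and modelled docs match in number?
    n_docs     = nrow(theta),
    n_texts    = length(texts),
    top_docs   = top,                           # row indices with the highest share of the topic
    top_props  = theta[top, topic],             # their proportions for that topic
    n_dominant = sum(max.col(theta, ties.method = "first") == topic)
  )
}

# Made-up toy data: 4 documents, 3 topics
theta <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.8, 0.1,
                  0.3, 0.3, 0.4,
                  0.2, 0.6, 0.2), ncol = 3, byrow = TRUE)
texts <- c("doc a", "doc b", "doc c", "doc d")
res <- check_topic(theta, texts, topic = 2, n = 2)
str(res)
```

On the real model this would be called as check_topic(model$theta, df$text, topic = 5). If aligned comes back FALSE, comparing length(dfm_stm$documents) with nrow(df) should show whether documents were lost somewhere in preprocessing, in which case the texts passed to findThoughts would need to be subset to the documents that remain.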