findThoughts function

ILCC · July 19, 2022, 1:36pm

Hi,

I have done topic modelling and I am trying to get a few examples of text for each topic.

Topic5 <- findThoughts(model, out$text, topics = 5, n = 5)

When I then use the summary function to see what is in Topic 5

summary(Topic5)

This doesn't give me the text for topic 5. Instead I get the following:

      Length Class  Mode
index 1      -none- list

Any ideas?

nirgrahamuk · July 19, 2022, 1:52pm

There is no summary method defined for the object that findThoughts returns.
Just type Topic5 to see what is in Topic5.

ILCC · July 19, 2022, 1:56pm

Hi -
I just get

Topic 5:

in the output

nirgrahamuk · July 19, 2022, 1:57pm

It seems that findThoughts did not return any text results. If it had, they would have been printed for you.
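
One possible cause, offered as a guess rather than a diagnosis: the code in this thread only ever reads documents, vocab and meta from the prepDocuments result, so out$text may simply be NULL, and findThoughts given no texts would have nothing to print. The base R sketch below uses a hypothetical stand-in list (mock_out is made up, not real stm output) to show that `$` returns NULL silently for a missing element, and where the text column would live instead.

```r
# Hypothetical stand-in for the list returned by prepDocuments().
# The element names documents / vocab / meta are assumed from the code in this thread.
mock_out <- list(
  documents = list(1L, 2L),
  vocab     = c("ltn", "traffic"),
  meta      = data.frame(text = c("first tweet", "second tweet"),
                         stringsAsFactors = FALSE)
)

# `$` on a list does not raise an error for a missing name, it returns NULL
is.null(mock_out$text)

# The original text travels inside the metadata data frame instead
texts <- mock_out$meta$text
length(texts)
```

If the real object behaves the same way, passing out$meta$text as the texts argument would be the first thing to try.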

ILCC · July 19, 2022, 1:59pm

How can I find out why that is? I definitely have a topic 5 in my model and I can see the words associated with this topic when I use the topwords function.

nirgrahamuk · July 19, 2022, 2:03pm

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners


ILCC · July 19, 2022, 2:15pm

processed <- textProcessor(df$text, metadata = df)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta

tokens <- df$text %>% tokens(what = "word", remove_punct = TRUE, 
    remove_numbers = TRUE, remove_url = TRUE) %>% tokens_tolower() %>% 
    tokens_remove(stopwords("english"))

dfm <- dfm_trim(dfm(tokens), min_docfreq = 0.001, max_docfreq = 0.99, 
    docfreq_type = "prop", verbose = TRUE)

ldacorpus <- Corpus(VectorSource(tokens))

dfm_stm <- convert(dfm, to = "stm")

model <- stm(documents = dfm_stm$documents, vocab = dfm_stm$vocab, 
    data = meta, K = 8, verbose = TRUE)

Created on 2022-07-19 by the reprex package (v2.0.1)

nirgrahamuk · July 19, 2022, 2:25pm

I'm not sure how you managed it, but although the footer says this came from reprex v2.0.1, what you have shared is not reproducible.

The first line relies on df, which exists only in your own session.
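
A common way to share a small piece of private data is dput(), which prints R code that recreates an object. The sketch below is only an illustration: df_private and its columns are hypothetical stand-ins, not the poster's data.

```r
# Hypothetical stand-in for the private data frame; the column names are made up
df_private <- data.frame(
  id   = 1:10,
  text = paste("example tweet", 1:10),
  stringsAsFactors = FALSE
)

# Take a handful of rows and print R code that recreates them
small <- head(df_private, 5)
dput(small)

# Round trip: capture the dput() text and rebuild the data frame from it
rebuilt <- eval(parse(text = paste(capture.output(dput(small)), collapse = "\n")))
nrow(rebuilt)
identical(rebuilt$text, small$text)
```

Pasting the dput() output at the top of a post, assigned to df, lets others run the remaining code unchanged.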

nirgrahamuk · July 19, 2022, 2:55pm

You redacted your last post, but I was able to see enough of it to recover something I could turn into a reprex. It does give a result for Topic 5.

first_column <- c("2022-05-31T22:03:15.000Z", "2022-05-31T21:18:46.000Z", 
                  "2022-05-31T20:57:38.000Z", "2022-05-31T18:39:54.000Z", "2022-05-31T18:21:03.000Z")
second_column <- c("1.53176E+18", "1.53175E+18", "1.53174E+18", 
                   "1.53171E+18", "1.5317E+18")
third_column <- c("While neighbourhoods in Oxford are made of dead end streetsJust look at a map of BBLIts a massive LTNBins get collected no issue", 
                  "People making short journeys by car are exactly why LTNs are needed in all residential areas", 
                  "This evening I attended the Fox Lane Residents meeting with ward colleagues Many residents voiced their anger over the LTNs and its ramifications in the local communityMore pollutionmore traffic and more misery 12", 
                  "On the pavementon double yellow lines over a cycle LaneFull house for thisHGV", 
                  "Lime tree flowers in bud todayalongside footcycle path at Via Ravenna mid1980sbuilt highwayLooking forward to our ChiTrees project to understand better how we benefit from these highway trees")

df <- data.frame(first_column, second_column, text=third_column)
library(stm)
library(quanteda)
library(tm)
library(tidyverse)
processed <- textProcessor(df$text, metadata = df)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta

tokens <- df$text %>% tokens(what = "word", remove_punct = TRUE, 
                             remove_numbers = TRUE, remove_url = TRUE) %>% tokens_tolower() %>% 
  tokens_remove(stopwords("english"))

dfm <- dfm_trim(dfm(tokens), min_docfreq = 0.001, max_docfreq = 0.99, 
                docfreq_type = "prop", verbose = TRUE)

ldacorpus <- Corpus(VectorSource(tokens))

dfm_stm <- convert(dfm, to = "stm")

model <- stm(documents = dfm_stm$documents, vocab = dfm_stm$vocab, 
             data = meta, K = 8, verbose = TRUE)

# texts must be the same documents, in the same order, as those used to fit the model
Topic5 <- findThoughts(model, df$text, topics = 5, n = 5)
Topic5

ILCC · July 19, 2022, 2:58pm

Yes, sorry, I thought I had made an error that I was correcting. I'm not sure why it pulls something for topic 5 when I input the data like this, but not when I use my full data file.

ILCC · July 20, 2022, 2:44pm

Is there an alternative to the findThoughts function that will pull the data I want through?
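
A manual route is possible in principle. A fitted stm model keeps its document-topic proportions in a matrix (model$theta, one row per modelled document), so the top documents for a topic are the rows with the largest values in that topic's column. The base R sketch below uses a made-up theta matrix and made-up texts, so treat it as an outline under those assumptions rather than a drop-in replacement for findThoughts.

```r
# Hypothetical document-topic matrix standing in for model$theta
set.seed(1)
n_docs <- 6
K <- 8
theta <- matrix(runif(n_docs * K), nrow = n_docs)
theta <- theta / rowSums(theta)  # each row sums to 1, like topic proportions

# Hypothetical texts: they must be the same documents, in the same order,
# as the rows of theta
texts <- paste("document", seq_len(n_docs))

# Guard against texts and modelled documents being out of step
stopifnot(length(texts) == nrow(theta))

topic <- 5
n <- 3

# Row indices of the n documents with the highest proportion for this topic
top_idx <- order(theta[, topic], decreasing = TRUE)[seq_len(n)]

data.frame(doc = top_idx,
           proportion = theta[top_idx, topic],
           text = texts[top_idx])
```

The length check matters on a full data file: if preprocessing drops documents, for example ones left empty after stopword removal, the text vector has to be filtered the same way before indexing into it.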

system · August 10, 2022, 2:45pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
