Hi everyone, happy New Year!
I am currently reading the literature on determining the number of topics (k) for topic modelling with LDA. The best article I have found so far is this:
I would like to reference it in my thesis, but I'm not sure whether R has functionality for determining the rate of perplexity change (a heuristic approach to estimating the number of topics). Does anyone know how to implement this in R? It seems very similar to using eigenvalues to determine the number of factors in exploratory factor analysis.
Any help appreciated.
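EDIT: Sincere apologies all, the ldatuning package (which builds on topicmodels) has a function, FindTopicsNumber, for comparing candidate numbers of topics using the metrics listed in the code below. However, the code takes a really long time to run. Reference code below.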
If anyone else has any ideas to add to this topic (no pun intended!) please feel free to comment.
# Load the R packages, including a few that are only needed later:
library(topicmodels)
library(ldatuning)  # provides FindTopicsNumber() and FindTopicsNumber_plot()
library(doParallel)
library(ggplot2)
library(scales)
library(tidyverse)
library(RColorBrewer)
library(wordcloud)
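data("AssociatedPress", package = "topicmodels")
full_data <- AssociatedPress

# Fit one model per candidate k and time the whole run (this is the slow part).
system.time({
  tunes <- FindTopicsNumber(
    full_data,
    # Candidate k: 10-100 in steps of 10, 120-180 in steps of 20, 200-350 in steps of 50
    topics = c(1:10 * 10, 120, 140, 160, 180, 0:3 * 50 + 200),
    metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010"),
    method = "Gibbs",
    control = list(seed = 77),
    mc.cores = 4L,
    verbose = TRUE
  )
})

# Plot each metric against the number of topics.
FindTopicsNumber_plot(tunes)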
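
On the original question about the rate of perplexity change: as far as I can tell, the metrics above are not perplexity-based, so here is a rough sketch of how the heuristic could be computed by hand with topicmodels. It assumes the rate is defined as the absolute change in held-out perplexity divided by the change in the number of topics between consecutive candidates; check that against the definition in whichever article you cite. The function names and the train/test arguments are my own, not part of any package.

```r
# Held-out perplexity for each candidate number of topics.
# `train` and `test` are DocumentTermMatrix objects (hypothetical argument names).
# Uses the default VEM fit from topicmodels; requires that package to be installed.
perplexity_by_k <- function(train, test, ks, seed = 77) {
  sapply(ks, function(k) {
    fit <- topicmodels::LDA(train, k = k, control = list(seed = seed))
    topicmodels::perplexity(fit, newdata = test)
  })
}

# Assumed definition: absolute change in perplexity per added topic
# between consecutive candidate values of k.
rate_of_perplexity_change <- function(ks, perp) {
  stopifnot(
    length(ks) == length(perp),
    length(ks) >= 2,
    !is.unsorted(ks, strictly = TRUE)  # ks must be strictly increasing
  )
  data.frame(k = ks[-1], rpc = abs(diff(perp) / diff(ks)))
}
```

With the AssociatedPress data loaded, one could split the rows into training and test sets (say `AssociatedPress[1:200, ]` and `AssociatedPress[201:300, ]`), pass them to `perplexity_by_k()` along with a vector of candidate k values, and feed the result to `rate_of_perplexity_change()`. Plotting `rpc` against `k` and looking for the point where it levels off is then much like reading a scree plot of eigenvalues.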