Dear all,
I'm very new to RStudio, but I'm trying to perform text analysis on a simple Excel file. I created a word cloud following the instructions on these websites: "RPubs - Text mining R-cran Descriptions - Part A" and "Text mining and word cloud fundamentals in R : 5 simple steps you should know - Easy Guides - Wiki - STHDA".
The word cloud works, and now I would like to pick out the top 5 words and see which other words occur together with them most frequently. I would like to visualise that as an association network for each of the 5 words. I started with the function findAssocs, but then every word in the text comes back with a correlation of 1. I read up on this topic and came to the conclusion (if I'm right) that the problem is that my 1015-line Excel file gets turned into 1 document instead of 1015. I already tried the following solution that was offered to someone else, but that doesn't seem to work either:
corp <- Corpus(DataframeSource(Kans))
dtm <- DocumentTermMatrix(corp)
dtm
#> <<DocumentTermMatrix (documents: 1, terms: 4028)>>
#> Non-/sparse entries: 4028/0
#> Sparsity : 0%
#> Error in nchar(Terms(x), type = "chars") :
#>   invalid multibyte string, element 270
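The `invalid multibyte string` error typically points to text that is not valid UTF-8, for example accented characters stored in a Windows/Latin-1 encoding. A minimal sketch of one way to handle it, assuming the Excel text is Latin-1 encoded (an assumption; `to_utf8` is a hypothetical helper name), is to re-encode the text column with base R's `iconv()` before building the corpus:

```r
# Hypothetical helper: convert a character vector to UTF-8.
# ASSUMPTION: the text was stored as Latin-1 (close to Windows-1252);
# change `from` if the file uses another encoding.
to_utf8 <- function(x, from = "latin1") {
  iconv(x, from = from, to = "UTF-8")
}

# "\xe9" is the Latin-1 byte for "é"; on its own it is not valid UTF-8
raw_text <- "caf\xe9"
fixed_text <- to_utf8(raw_text)

validUTF8(raw_text)    # FALSE
validUTF8(fixed_text)  # TRUE

# Applied to the imported data (column name `text` assumed):
# Kans$text <- to_utf8(Kans$text)
```

If the conversion produces `NA` for some rows, the assumed source encoding is probably wrong for those rows.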
I also renamed my column headers to doc_id and text, as suggested in another thread, but that just turns it into 2 documents, so I just can't seem to get it working. I would very much appreciate it if someone could help me out. My file looks like this; I only included 5 lines that contain the word 'werkbon' (Dutch for 'work order'):
doc_id | text |
---|---|
42 | Monteur kon niet klokken op overige afdeling. Hierdoor heeft hij zelf een werkbon aangemaakt en moeten de uren aangepast worden |
56 | 1) Service technici heeft niet gebeld dat er 1 cilinder op lage druk staat. 2) De werkbon is niet juist controleert. 3) Actie is aangemaakt maar niks mee gedaan |
62 | start/stop klokken e.a. correcties doorgevoerd in tijdregistratie en werkbon |
69 | start/stop klokken verwijderd op werkbon en in de tijdregitratie |
78 | Niet in de werkbon gezet welk paneel vervangen moet worden. Tevens intern gemaild dat paneel teruggestuurd moet worden naar Schrack zodat wij het op garantie kunnen gooien. Dit staat nergens vermeld en de service technicus weet van niks. Na telefonisch overleg met Marinus Jan afgesproken dat hij het paneel meeneemt naar . |
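The "2 documents" symptom is consistent with the corpus still being built with `Corpus(VectorSource(Kans))` after renaming the columns: a data frame is a list of columns, so `VectorSource()` turns each column into one document. A minimal sketch, assuming tm 0.7 or later and using three rows from the table as sample data (`kans_demo` is a made-up name), compares that with two ways of getting one document per row:

```r
library(tm)

# Three rows from the table above, with the column names tm expects
kans_demo <- data.frame(
  doc_id = c("42", "62", "69"),
  text = c(
    "Monteur kon niet klokken op overige afdeling. Hierdoor heeft hij zelf een werkbon aangemaakt en moeten de uren aangepast worden",
    "start/stop klokken e.a. correcties doorgevoerd in tijdregistratie en werkbon",
    "start/stop klokken verwijderd op werkbon en in de tijdregitratie"
  ),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so VectorSource() sees one element
# per column: 2 "documents" (doc_id and text), not 3
corp_by_column <- Corpus(VectorSource(kans_demo))

# One document per row: either pass the text column itself ...
corp_from_vector <- Corpus(VectorSource(kans_demo$text))

# ... or let DataframeSource() read the doc_id/text columns row by row
corp_from_df <- Corpus(DataframeSource(kans_demo))

sapply(list(by_column = corp_by_column,
            from_vector = corp_from_vector,
            from_df = corp_from_df), length)
```

With the full file, the same idea would be `Corpus(VectorSource(Kans$text))` or `Corpus(DataframeSource(Kans))`, which should give 1015 documents instead of 1 or 2.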
The original code I put in was:
install.packages("tm")
install.packages("SnowballC")
install.packages("wordcloud")
install.packages("RColorBrewer")
install.packages("corpus")
install.packages("gdata")
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
library("gdata")
library(corpus)
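docs <- Corpus(VectorSource(Kans))  # Kans = data frame imported from the Excel file
docs <- tm_map(docs, content_transformer(tolower))  # lower-case all text
docs <- tm_map(docs, removePunctuation)  # strip punctuation
docs <- tm_map(docs, removeWords, stopwords("dutch"))  # drop Dutch stopwords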
dtm <- TermDocumentMatrix(docs)  # terms in rows, documents in columns
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)  # total frequency of each term
d <- data.frame(word = names(v), freq = v)
head(d, 10)
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
  max.words = 200, random.order = FALSE, rot.per = 0.35,
  colors = brewer.pal(8, "Dark2"))
findAssocs(dtm, terms = "werkbon", corlimit = 0.3)  # terms correlated with "werkbon"
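For the top-5 goal described at the start, `findAssocs()` also accepts a vector of terms, so all five can be queried in one call. The correlations are computed across documents, so this only becomes meaningful once each row is its own document. A minimal sketch, using the five example rows as stand-in data (`texts`, `top5` and `edges` are hypothetical names), that also flattens the result into an edge list for a network plot:

```r
library(tm)

# The five example rows from the question, one document per row
texts <- c(
  "Monteur kon niet klokken op overige afdeling. Hierdoor heeft hij zelf een werkbon aangemaakt en moeten de uren aangepast worden",
  "1) Service technici heeft niet gebeld dat er 1 cilinder op lage druk staat. 2) De werkbon is niet juist controleert. 3) Actie is aangemaakt maar niks mee gedaan",
  "start/stop klokken e.a. correcties doorgevoerd in tijdregistratie en werkbon",
  "start/stop klokken verwijderd op werkbon en in de tijdregitratie",
  "Niet in de werkbon gezet welk paneel vervangen moet worden. Tevens intern gemaild dat paneel teruggestuurd moet worden naar Schrack zodat wij het op garantie kunnen gooien. Dit staat nergens vermeld en de service technicus weet van niks. Na telefonisch overleg met Marinus Jan afgesproken dat hij het paneel meeneemt naar ."
)

docs <- Corpus(VectorSource(texts))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("dutch"))
tdm <- TermDocumentMatrix(docs)

# Total frequency per term, highest first (same idea as `v` above)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
top5 <- names(freq)[1:5]

# One call for all five terms; the result is a list named by term,
# each element a named vector of correlations >= corlimit
assocs <- findAssocs(tdm, terms = top5, corlimit = 0.3)

# Flatten the list into an edge list (from, to, cor) for a network plot
edges <- do.call(rbind, lapply(names(assocs), function(term) {
  a <- assocs[[term]]
  if (length(a) == 0) return(NULL)
  data.frame(from = term, to = names(a), cor = unname(a),
             stringsAsFactors = FALSE)
}))
edges
```

With only five rows many correlations are degenerate, so the numbers here are not meaningful; on the full 1015-row corpus the same steps apply. The edge list could then be handed to a network package, for example `igraph::graph_from_data_frame(edges)` followed by `plot()`; igraph is an extra package that the question does not use, so treat that step as a suggestion.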
Many thanks in advance.
With kind regards,
Diana