Hi, I'm a beginner in RStudio. I get the following error when I Knit an R Markdown document in RStudio Cloud:
Error in as.character(x$content) :
cannot coerce type 'closure' to vector of type 'character'
Calls: ... eval_with_user_handlers -> eval -> eval -> Corpus -> SimpleCorpus
Execution halted
I am running the following code in RStudio Cloud:
Setup and Data Loading
# Install Packages
install.packages("tm") # for text mining
install.packages("SnowballC") # for text stemming
install.packages("wordcloud") # word-cloud generator
install.packages("RColorBrewer") # color palettes
install.packages("syuzhet") # for sentiment analysis
install.packages("ggplot2") # for plotting graphs
# Load Libraries
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
library("syuzhet")
library("ggplot2")
# Dataset Import
text <- readLines(file.choose())
# Load Data as Corpus
TextDoc <- Corpus(VectorSource(text))
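One possible cause, offered as an assumption rather than a diagnosis: knitting runs in a fresh, non-interactive session, where an interactive picker such as `file.choose()` is not expected to work. If the `readLines()` line never ran there, the name `text` would resolve to the graphics function `text()`, and `VectorSource(text)` would receive a function instead of a character vector. The sketch below reproduces the same message in base R and then reads from an explicit path. `input_path` and `job_text` are hypothetical names, and the temporary file only stands in for the real data file.

```r
# Assumption: `text` was never assigned in the knit session, so the name
# refers to the function graphics::text (a closure), not to the data.
msg <- tryCatch(
  as.character(graphics::text),
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical alternative: read from an explicit path, under a name that
# does not shadow an existing function. The temp file stands in for the
# real data file.
input_path <- tempfile(fileext = ".txt")
writeLines(c("SQL and Python required", "Excel/Tableau preferred"), input_path)
job_text <- readLines(input_path)
print(is.character(job_text))
```

With a character vector like `job_text`, the call would become `Corpus(VectorSource(job_text))`.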
Data Cleanup
As this is web-scraped data, it needs some basic cleanup. I'll replace some symbols with spaces, remove numbers and punctuation, convert everything to lowercase, strip extra white space, and remove English stop words. Finally, I'll reduce the words to their root form by stemming.
toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
TextDoc <- tm_map(TextDoc, toSpace, "/") # Replace "/" with Space
TextDoc <- tm_map(TextDoc, toSpace, "@") # Replace "@" with Space
TextDoc <- tm_map(TextDoc, toSpace, "\\|") # Replace "|" with Space (the pattern escapes the pipe)
TextDoc <- tm_map(TextDoc, toSpace, "-") # Replace "-" with Space
TextDoc <- tm_map(TextDoc, removeNumbers) # Remove Numbers
TextDoc <- tm_map(TextDoc, removePunctuation) # Remove Punctuation
TextDoc <- tm_map(TextDoc, content_transformer(tolower)) # Convert the text to lower case
TextDoc <- tm_map(TextDoc, stripWhitespace) # Remove White space
TextDoc <- tm_map(TextDoc, removeWords, stopwords("english")) # Remove common English stop words
TextDoc <- tm_map(TextDoc, stemDocument) # Converting to Root Format
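To see what the symbol, number, punctuation, case, and white-space steps do to a single line, here is a minimal base R sketch. It only approximates the `tm` transformations with `gsub()` and `tolower()`, and it leaves out stop-word removal and stemming. `to_space`, `sample_line`, and `cleaned` are hypothetical names.

```r
# Base R approximation of the cleanup steps (not the tm functions themselves).
to_space <- function(x, pattern) gsub(pattern, " ", x)

sample_line <- "SQL/Python | Power-BI @ 2023"

cleaned <- to_space(sample_line, "/")
cleaned <- to_space(cleaned, "@")
cleaned <- to_space(cleaned, "\\|")          # the pipe must be escaped in a regex
cleaned <- to_space(cleaned, "-")
cleaned <- gsub("[0-9]+", "", cleaned)       # drop digits
cleaned <- gsub("[[:punct:]]", "", cleaned)  # drop remaining punctuation
cleaned <- tolower(cleaned)
cleaned <- trimws(gsub("\\s+", " ", cleaned)) # collapse and trim white space

print(cleaned)
```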
Preliminary Results
Now I'll view the preliminary results to see whether there are any unwanted words that need to be removed. I'll use the TermDocumentMatrix() function from the tm package to build a table of word frequencies. The results are sorted in descending order, and the 400 most frequent words are displayed.
TextDoc_dtm <- TermDocumentMatrix(TextDoc)
dtm_m <- as.matrix(TextDoc_dtm)
dtm_v <- sort(rowSums(dtm_m),decreasing=TRUE)
dtm_d <- data.frame(word = names(dtm_v),freq=dtm_v)
head(dtm_d, 400)
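The counting logic can be illustrated without `tm`. The sketch below builds a tiny term-by-document count matrix by hand, sums each row, and sorts the totals. The three sample documents and the names `docs`, `tokens`, `terms`, `tdm`, `freq`, and `freq_df` are made up for illustration.

```r
# Three toy documents, already cleaned and lowercased.
docs <- c("sql python sql", "python excel", "sql tableau")

tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# One column per document, one row per term, cells hold counts.
tdm <- vapply(
  tokens,
  function(doc) tabulate(match(doc, terms), nbins = length(terms)),
  integer(length(terms))
)
rownames(tdm) <- terms

# Total frequency per term, highest first.
freq <- sort(rowSums(tdm), decreasing = TRUE)
freq_df <- data.frame(word = names(freq), freq = freq, stringsAsFactors = FALSE)

print(head(freq_df, 2))
```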
Removing Irrelevant Words
After an initial inspection of the results, I'll remove 61 words from the data set, then run TermDocumentMatrix() again and recount the word frequencies.
TextDoc <- tm_map(TextDoc, removeWords, c("data","experi","work", "analyt", "manag", "skill", "abil", "strong", "knowledg", "use", "understand", "includ", "process", "develop", "servic", "client", "technic", "build", "stakehold", "requir", "learn", "effect", "system", "good", "degre", "year", "intern", "analysi", "technolog", "within", "level", "account", "project", "engin", "financ", "will", "abl", "perform", "activ", "relev", "industri", "visualis", "solut", "etc", "creat", "relationship", "present", "solv", "comfort", "coach", "relat", "provid", "previous", "written", "organis", "program", "profici", "queri", "deliv", "time","oper"))
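Note that the line above changes only `TextDoc`. If `dtm_d` is not rebuilt afterwards, the removed words would still be present in the frequency table. As a swapped-in alternative to rerunning the matrix step, the sketch below drops unwanted words directly from a named frequency vector in base R. The values and the names `freq`, `irrelevant`, and `freq_kept` are hypothetical.

```r
# Toy frequency vector standing in for the real word counts.
freq <- c(sql = 3, data = 5, python = 2, work = 4)

# Words judged irrelevant after inspection (a short stand-in list).
irrelevant <- c("data", "work")

# Keep everything that is not in the irrelevant list, highest count first.
freq_kept <- sort(freq[!names(freq) %in% irrelevant], decreasing = TRUE)

print(freq_kept)
```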
Generating Word Cloud
Now that the results are to my satisfaction, I'll visualize the output as a word cloud with a minimum frequency of 5 and a maximum of 250 words, plotted in descending order of frequency.
#generate word cloud
set.seed(1234)
wordcloud(words = dtm_d$word, freq = dtm_d$freq, min.freq = 5,
          max.words = 250, random.order = FALSE, rot.per = 0.40,
          colors = brewer.pal(8, "Dark2"))
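As a rough check on which words the two thresholds let through, the base R sketch below applies a minimum frequency of 5 and a cap of 250 words to a toy frequency table. It is an approximation for inspection only, not the `wordcloud` package's own selection code. The data and the names `freq_df`, `min_freq`, `max_words`, `ordered`, and `eligible` are hypothetical.

```r
# Toy frequency table standing in for the real one.
freq_df <- data.frame(
  word = c("sql", "python", "excel", "tableau", "azur"),
  freq = c(12, 9, 5, 4, 1),
  stringsAsFactors = FALSE
)

min_freq <- 5
max_words <- 250

# Sort by frequency, keep rows at or above the minimum, cap the row count.
ordered <- freq_df[order(freq_df$freq, decreasing = TRUE), ]
eligible <- head(ordered[ordered$freq >= min_freq, ], max_words)

print(eligible$word)
```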
Skills Association
From the previous results, I know which skills are in demand for a Data Analyst. Now I need to find which words are most strongly associated with the following skills, using a minimum correlation of 0.25.
findAssocs(TextDoc_dtm, terms = c("sql","python","alteryx","insight","excel", "azur", "tableau", "powerbi", "model","communic", "team"), corlimit = 0.25)
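The idea behind an association score can be sketched in base R as a correlation between term rows across documents, keeping only values at or above the limit. This is an illustration with a hand-made matrix, not the `tm` implementation. The matrix values and the names `tdm`, `assoc`, and `strong` are hypothetical.

```r
# Toy term-by-document matrix: rows are terms, columns are five documents.
tdm <- rbind(
  sql    = c(1, 0, 1, 1, 0),
  python = c(1, 0, 1, 0, 0),
  excel  = c(0, 1, 0, 0, 1)
)

# Correlate every term with every other term, then look at the "sql" row.
assoc <- cor(t(tdm))["sql", ]
assoc <- assoc[names(assoc) != "sql"]

# Keep only associations at or above the 0.25 limit.
strong <- round(assoc[assoc >= 0.25], 2)

print(strong)
```

Here "python" co-occurs with "sql" in two of five documents and clears the limit, while "excel" never appears alongside "sql" and is dropped.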