Hey community!
I am doing an LSTM analysis of tweets and am facing the following issue:
I want to replace each word in a data frame with a numeric value based on that word's frequency.
To do this, I used the following code:
#LSTM
#wordcount
# install.packages(c("dplyr", "tidytext"))  # only needed once
library(dplyr)
library(tidytext)
prof.tm <- unnest_tokens(twitter, word, text)
word.freq <- prof.tm %>% count(word, sort = TRUE)
# nr = frequency rank (1 = most frequent word), without hardcoding 1:18420
word.freq <- word.freq %>% mutate(nr = row_number())
tweet <- twitter$text
tweettxt <- data.frame(
  stringsAsFactors = FALSE,
  # [[1]] keeps only the words of the first tweet
  tweetwords = strsplit(tweet, " ")[[1]]
)
# combine the two tables: column n will contain the frequencies, nr the ranks
tweetnum <- tweettxt %>%
  left_join(word.freq, by = c("tweetwords" = "word")) %>%
  mutate(n = ifelse(is.na(n), 0, n),
         nr = ifelse(is.na(nr), Inf, nr))
tweetchar <- paste("[", tweetnum$nr, "]", sep = "", collapse = " ")
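To make the intended result concrete, here is a minimal base R sketch of the same idea (count words, rank them, replace each word by its rank, use `Inf` for unknown words) applied to every tweet at once via `lapply()`. The toy tweets and the helper names `tokenize`, `build_vocab` and `encode_tweet` are hypothetical, and the tokenizer is only an approximation of what `unnest_tokens()` does. One detail that may matter in the original code: `unnest_tokens()` lowercases and strips punctuation by default, while `strsplit(tweet, " ")` does not, so a word like "Love" would not be found in `word.freq`. The sketch therefore normalizes the text the same way before the lookup.

```r
# Hypothetical toy tweets standing in for twitter$text
tweets <- c("I love R", "R is great", "I love LSTM models")

# Approximate tokenizer (an assumption, not unnest_tokens()):
# lowercase, drop punctuation, split on whitespace, drop empty strings
tokenize <- function(x) {
  x <- gsub("[^[:alnum:][:space:]]", "", tolower(x))
  words <- strsplit(x, "[[:space:]]+")
  lapply(words, function(w) w[w != ""])
}

# Vocabulary: n = count, nr = rank (1 = most frequent);
# ties are broken alphabetically so the ranks are deterministic
build_vocab <- function(tweets) {
  counts <- table(unlist(tokenize(tweets)))
  ord <- order(-as.integer(counts), names(counts))
  data.frame(word = names(counts)[ord], n = as.integer(counts)[ord],
             nr = seq_along(ord), stringsAsFactors = FALSE)
}

# Replace each word of one tweet by its rank; unknown words get Inf
encode_tweet <- function(words, vocab) {
  nr <- as.numeric(vocab$nr[match(words, vocab$word)])
  nr[is.na(nr)] <- Inf
  nr
}

vocab <- build_vocab(tweets)
# One numeric vector per tweet instead of only the first tweet
encoded <- lapply(tokenize(tweets), encode_tweet, vocab = vocab)
```

Here `encoded[[1]]` is `c(1, 2, 3)` ("i", "love", "r" are the three most frequent words). The key change compared to `strsplit(tweet, " ")[[1]]` is that the whole list returned by `strsplit()` is kept and processed element by element.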
Do you know how I can apply this code to every tweet in the dataset instead of only one tweet?
And how can I collect the results in a data frame instead of a single character string?
I hope I have made my point clear, and I look forward to any help!
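One possible way to get a data frame instead of a single string is sketched below in base R, under the assumption that `twitter` has an id column and that a `word.freq` table with `word` and `nr` columns already exists. Both inputs here are small hypothetical stand-ins, not the real data. It builds a long table with one row per (tweet, word), similar to `tweetnum` but for all tweets, and then adds one `"[nr]"` string per tweet, similar to `tweetchar`, as a new column of `twitter`.

```r
# Hypothetical inputs: a tweets data frame and a word-rank lookup table
twitter <- data.frame(id = 1:3,
                      text = c("I love R", "R is great", "hello world"),
                      stringsAsFactors = FALSE)
word.freq <- data.frame(word = c("i", "love", "r", "great", "is"),
                        nr = 1:5, stringsAsFactors = FALSE)

# Split every tweet (lowercased) into words: one list element per tweet
words <- strsplit(tolower(twitter$text), " ", fixed = TRUE)

# Long format: one row per (tweet, word), like tweetnum but for all tweets
long <- data.frame(id = rep(twitter$id, lengths(words)),
                   word = unlist(words), stringsAsFactors = FALSE)
long$nr <- word.freq$nr[match(long$word, word.freq$word)]
long$nr[is.na(long$nr)] <- Inf  # unknown words, as in the original ifelse()

# Wide format: one "[nr] [nr] ..." string per tweet, like tweetchar
twitter$encoded <- vapply(
  split(long$nr, factor(long$id, levels = twitter$id)),
  function(nr) paste0("[", nr, "]", collapse = " "),
  character(1)
)
```

With these inputs, `twitter$encoded` holds `"[1] [2] [3]"`, `"[3] [5] [4]"` and `"[Inf] [Inf]"`. Which of the two shapes is more useful depends on the next step; the long table is usually easier to reshape into whatever sequence format the LSTM input needs.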