Using the tm package, I built a TermDocumentMatrix from a corpus and then turned it into a data frame of word frequencies with this code:
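```r
library(tm)
# Term-document matrix: one row per term, one column per document
dtm <- TermDocumentMatrix(tdocs)
m <- as.matrix(dtm)
# Total count of each term across all documents, most frequent first
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
# Keep the 25 most frequent terms
d25 <- d[1:25, ]
```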
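A note on where the left-hand labels come from, as a minimal sketch with hypothetical toy counts standing in for the real corpus: `rowSums()` returns a *named* vector, and `data.frame()` reuses those names as row names. Passing `row.names = NULL` explicitly makes it fall back to the default 1, 2, 3 row names instead.

```r
# Hypothetical toy counts standing in for rowSums(m): a named numeric vector
v <- sort(c(day = 1174, get = 1699, just = 1656), decreasing = TRUE)

# Default: data.frame() takes the names of v (via freq = v) as row names
d_default <- data.frame(word = names(v), freq = v)
print(rownames(d_default))  # "get" "just" "day"

# Explicit row.names = NULL: automatic row names 1, 2, 3 instead
d_plain <- data.frame(word = names(v), freq = v, row.names = NULL)
print(rownames(d_plain))    # "1" "2" "3"
print(names(d_plain))       # "word" "freq"
```

Either way the data frame has exactly two columns; only the row names differ.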
The data frame looks like this:
```
> head(d25)
     word freq
get   get 1699
just just 1656
good good 1437
like like 1257
know know 1186
day   day 1174
> names(d25)
[1] "word" "freq"
```
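One way to check what the left-hand labels are, sketched on a hypothetical three-row stand-in for `d25` built from the values printed above: the object has only two columns, and the labels are stored separately as row names.

```r
# Hypothetical stand-in for d25, using the first three rows printed above
d25 <- data.frame(word = c("get", "just", "good"),
                  freq = c(1699, 1656, 1437),
                  row.names = c("get", "just", "good"))

# Only two real columns...
print(ncol(d25))      # 2
print(names(d25))     # "word" "freq"
# ...while the labels on the left are the row names
print(rownames(d25))  # "get" "just" "good"
str(d25)              # str() lists the columns; row names are not among them
```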
What are those words on the far left? I want a data frame with only the `word` and `freq` columns, so how do I get rid of them? I tried `d <- select(d, word, freq)` from dplyr, but the extra column comes back:
```
> head(d25)
     word freq
get   get 1699
just just 1656
good good 1437
like like 1257
know know 1186
day   day 1174
```
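A minimal base-R sketch of removing them, again on a hypothetical three-row stand-in for `d25`: `select()` only chooses columns, and row names are not a column, so they are carried along unchanged. Setting them to `NULL` replaces them with the default 1, 2, 3. (Note also that `select()` above was applied to `d`, while the printout is of `d25`, which was created earlier.)

```r
# Hypothetical stand-in for d25, using the first three rows printed above
d25 <- data.frame(word = c("get", "just", "good"),
                  freq = c(1699, 1656, 1437),
                  row.names = c("get", "just", "good"))

# select() picks columns; row names are not a column, so drop them directly
rownames(d25) <- NULL
print(rownames(d25))  # "1" "2" "3"

# The word and freq columns themselves are unchanged
print(d25$word)       # "get" "just" "good"

# If the labels only bother you when printing, hide them at print time instead
print(d25, row.names = FALSE)
```

On the real data, running `rownames(d) <- NULL` right after building `d` (before taking `d[1:25, ]`) would have the same effect.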