Term document matrix from tm package: selecting columns

Quack · April 8, 2019, 8:15pm

Using the tm package, I built a term-document matrix from a corpus of words. I then used this code to turn it into a data frame:

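```r
library(tm)

# tdocs: the corpus created earlier (not shown)
dtm <- TermDocumentMatrix(tdocs)
m <- as.matrix(dtm)                        # terms in rows, documents in columns
v <- sort(rowSums(m), decreasing = TRUE)   # total frequency of each term
d <- data.frame(word = names(v), freq = v)

d25 <- d[1:25, ]                           # the 25 most frequent terms
```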
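The leftmost labels come from the code above: `rowSums()` returns a vector named by term, `sort()` keeps those names, and `data.frame()` uses the names of the `freq = v` vector as row names. A minimal base-R sketch of that chain, using a small hypothetical matrix `m_toy` in place of the real term-document matrix (tm is not needed to see the effect):

```r
# Hypothetical stand-in for as.matrix(dtm): terms in rows, documents in columns
m_toy <- matrix(c(1, 0, 0,
                  2, 2, 1,
                  0, 3, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("day", "get", "just"),
                                c("doc1", "doc2", "doc3")))

# Row sums are named by term; sort() carries the names along
v_toy <- sort(rowSums(m_toy), decreasing = TRUE)
print(v_toy)

# data.frame() turns the names of v_toy into row names
d_toy <- data.frame(word = names(v_toy), freq = v_toy)
print(d_toy)
print(rownames(d_toy))  # the terms again, taken from names(v_toy)
```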

The data frame looks like this:

```
> head(d25)
     word freq
get   get 1699
just just 1656
good good 1437
like like 1257
know know 1186
day   day 1174
> names(d25)
[1] "word" "freq"
```

What are those words on the leftmost side? I want a data frame with only the word and freq columns. How do I get rid of those extra words? I tried `d <- select(d, word, freq)` from dplyr, but the extra column comes back:

```
> head(d25)
     word freq
get   get 1699
just just 1656
good good 1437
like like 1257
know know 1186
day   day 1174
```
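Row names are an attribute of the data frame rather than a column, so selecting columns leaves them in place. A base-R sketch of the same effect, using `[` instead of `dplyr::select()` and a small hypothetical data frame `d_demo` built the same way as `d` (the frequencies are copied from the output above):

```r
# Hypothetical frequencies standing in for v, taken from the printed output
v_demo <- c(get = 1699, just = 1656, good = 1437)
d_demo <- data.frame(word = names(v_demo), freq = v_demo)

# Select both columns by name: the row names survive the subset
d_sel <- d_demo[, c("word", "freq")]

print(ncol(d_sel))      # 2: there is no third column to drop
print(names(d_sel))     # "word" "freq"
print(rownames(d_sel))  # "get" "just" "good": an attribute, not a column
```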

andresrcs · April 8, 2019, 8:41pm

That is not an extra column; those are just row names. You can get rid of them with:

```r
rownames(d25) <- NULL
```
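A small sketch of that fix, plus one alternative that is an assumption on my part rather than part of the answer: calling `unname()` on the frequency vector before building the data frame, so no row names are created in the first place. The toy vector `v_demo` is hypothetical.

```r
v_demo <- c(get = 1699, just = 1656, good = 1437)

# Fix applied after the fact: reset to the default row names 1, 2, 3, ...
d_fixed <- data.frame(word = names(v_demo), freq = v_demo)
rownames(d_fixed) <- NULL
print(d_fixed)

# Alternative: strip the names from the vector so data.frame() never sees them
d_alt <- data.frame(word = names(v_demo), freq = unname(v_demo))
print(d_alt)
```

Either way the result has only the `word` and `freq` columns, with plain numbered rows.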

Quack · April 9, 2019, 3:24pm

Yes, it worked! Thank you! I didn't know that rows could have names.

andresrcs · April 9, 2019, 3:29pm

If your question's been answered (even by you!), would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems.

system · April 16, 2019, 3:29pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.