Dear Learned Community,
I am very new to R and the tidyverse, so I beg your pardon for what may be a very basic question. I am trying to conduct a tf-idf analysis using tidytext--more specifically, following the book Text Mining with R. But I am running into a problem early on.
Here's the relevant code from Chapter 3 of "Text Mining":
library(dplyr)
library(janeaustenr)
library(tidytext)
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE)
total_words <- book_words %>%
  group_by(book) %>%
  summarize(total = sum(n))
book_words <- left_join(book_words, total_words)
I adapted it as follows, with 'stoppedwords.Baillie' being one of the somewhat cleaned-up corpuses. I removed the code for 'book,' since the Jane Austen library has all of her separate novels and I have no need at this point to split Baillie into separate plays (and I don't believe the corpus is structured with those differences marked).
First step:
From:
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE)
To:
tdfBaillie <- stoppedwords.Baillie %>%
  count(word, sort = TRUE)
This does return a tibble that looks right--words ranked by frequency.
Second step:
From:
total_words <- book_words %>%
  group_by(book) %>%
  summarize(total = sum(n))
To:
total_words <- tdfBaillie %>%
  summarize(total = sum(n))
This also LOOKS like it may be right, returning a one-row tibble with a total of 219508 words. But then I run into trouble.
Third step:
From:
book_words <- left_join(book_words, total_words)
To:
book_words <- left_join(tdfBaillie, total_words)
This returns the following error:
Error: `by` must be supplied when `x` and `y` have no common variables.
i use `by = character()` to perform a cross-join.
I'm not sure what's gone wrong here. Before trying to integrate "by=character()," which I don't know how to do in any case, I need to understand why there seem to be no common variables between x (tdfBaillie) and y (total_words), since the latter is built on the former.
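To make the problem easier to reproduce, here is a minimal sketch that uses a made-up six-word tibble in place of my corpus. The names toy_words, toy_counts, toy_total, shared_columns, and join_result are hypothetical and not part of my real code. It runs the same three steps and lists the column names of each tibble, which, if I understand the error correctly, is what left_join() compares when no 'by' is given:

```r
library(dplyr)

# Hypothetical toy stand-in for stoppedwords.Baillie: one token per row.
toy_words <- tibble(word = c("love", "hate", "love", "fear", "love", "hate"))

# First step: word counts, most frequent first.
toy_counts <- toy_words %>%
  count(word, sort = TRUE)

# Second step: with no group_by(), summarize() returns only the new column.
toy_total <- toy_counts %>%
  summarize(total = sum(n))

# Column names of the two tibbles being joined.
print(names(toy_counts))
print(names(toy_total))

# Columns the join could match on when `by` is not supplied.
shared_columns <- intersect(names(toy_counts), names(toy_total))
print(shared_columns)

# Third step: capture the join error as text instead of stopping the script.
join_result <- tryCatch(
  left_join(toy_counts, toy_total),
  error = function(e) conditionMessage(e)
)
print(join_result)
```

In the book's version, group_by(book) means both tibbles carry a 'book' column, so the join has something to match on. In this toy version, as in mine, the summarized tibble seems to carry only 'total'.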
Grateful for any help!
Sincerely,
Steve Newman