I'm having trouble ordering my bar graph by descending n. The graph should display the most frequent words in a corpus of TXT files. I suspect I may be reading the files in incorrectly, because others have told me that the plotting code itself should work.
# create minimal dataset:
# create two TXT files
# content of first TXT file: aaa bbb ccc
# content of second TXT file: aaa bbb bbb
# save both files to a folder called TXTs in current working directory
# load packages
library("tidyr")
library("dplyr")
library("purrr")
library("readr")
library("tidytext")
library("ggplot2")
# function to read all files from folder into dataframe
read_folder <- function(infolder) {
  tibble(file = dir(infolder, full.names = TRUE)) %>%
    mutate(text = map(file, read_lines)) %>%
    transmute(id = basename(file), text) %>%
    unnest(text)
}
# create corpus from folder with TXT files
raw_text <- read_folder("TXTs")
tidy_text <- raw_text %>%
  group_by(id) %>%
  unnest_tokens(word, text)
# count most frequent words
# and display in descending order
# ATTEMPT #1
tidy_text %>%
  dplyr::count(word, sort = TRUE) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()
# count most frequent words
# and display in descending order
# ATTEMPT #2
tidy_text %>%
  dplyr::count(word, sort = TRUE) %>%
  ggplot(aes(x = reorder(factor(word), n), y = n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()
Neither of these two attempts produces the desired output. The order in the graph should be bbb-aaa-ccc, but it is bbb-ccc-aaa. Thank you!
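One possible cause, offered as a sketch rather than a confirmed diagnosis: if the `group_by(id)` grouping is still in effect after `unnest_tokens()`, then `count(word)` counts per file and per word, so a word can appear in several rows. `reorder()` summarises with `mean` by default, which would give bbb a value of 1.5 and leave aaa and ccc tied at 1. The example below rebuilds the data by hand instead of reading files (the file names `1.txt` and `2.txt` are made up) and compares counting with and without `ungroup()`.

```r
library(dplyr)

# Hypothetical stand-in for tidy_text: one row per token, still grouped by id.
# The file names are assumptions; the words mirror the two example files.
tidy_text <- tibble(
  id   = c("1.txt", "1.txt", "1.txt", "2.txt", "2.txt", "2.txt"),
  word = c("aaa", "bbb", "ccc", "aaa", "bbb", "bbb")
) %>%
  group_by(id)

# With the grouping in place, count() returns one row per id/word pair,
# so aaa and bbb each appear twice.
grouped_counts <- tidy_text %>%
  count(word, sort = TRUE)

# Dropping the grouping first returns one row per word across all files.
word_counts <- tidy_text %>%
  ungroup() %>%
  count(word, sort = TRUE) %>%
  mutate(word = reorder(word, n))

# Factor levels run from smallest to largest n.
levels(word_counts$word)
```

Under this assumption, piping `word_counts` into the same `ggplot()` call as in the attempts above should place bbb first, followed by aaa and ccc, because `coord_flip()` draws the last factor level at the top. Removing `group_by(id)` from the tokenising step altogether would be an alternative if the per-file grouping is not needed later.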