See the FAQ: How to do a minimal reproducible example (reprex)
for beginners. The structure of the PDFDataframe
object is not shown, which makes it hard to help.
If it has one variable for the document id and another for the text, it should work much like this example:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidytext)
library(janeaustenr)
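# n = 3 produces three-word sequences (trigrams), one per row
austen_trigrams <- austen_books() %>%
  unnest_tokens(trigram, text, token = "ngrams", n = 3)
# Keep only the phrase of interest, then count its occurrences per book
austen_trigrams %>% filter(trigram == "ten thousand pounds") %>% count(book)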
#> # A tibble: 3 × 2
#>   book                    n
#>   <fct>               <int>
#> 1 Sense & Sensibility     2
#> 2 Pride & Prejudice       4
#> 3 Persuasion              1
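
Since the real PDFDataframe was not posted, here is a rough sketch of the same approach on a made-up data frame. The object `pdf_df` and its column names `doc_id` and `text` are assumptions for illustration only and would need to be swapped for whatever your object actually contains.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the PDF data frame: one row per document.
# The column names doc_id and text are assumed, not taken from the question.
pdf_df <- tibble(
  doc_id = c("a.pdf", "b.pdf", "c.pdf"),
  text = c(
    "He left her ten thousand pounds. Ten thousand pounds is a fortune.",
    "Nothing relevant is written here.",
    "She has ten thousand pounds a year."
  )
)

# Split each document into trigrams, keep the target phrase, count per document
phrase_counts <- pdf_df %>%
  unnest_tokens(trigram, text, token = "ngrams", n = 3) %>%
  filter(trigram == "ten thousand pounds") %>%
  count(doc_id)

phrase_counts
```

Note that `unnest_tokens()` lowercases the text and drops punctuation by default, so the phrase to match should be written in lowercase. Documents that never contain the phrase do not appear in the result at all.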