Filtering within Bigram results

technocrat · January 2, 2022, 7:13am

See the FAQ: How to do a minimal reproducible example (reprex) for beginners. The structure of the PDFDataframe object isn't shown, which makes it hard to help.

If it has one variable for the document id and another for the text, it works much like this example, which counts how often a three-word phrase occurs in each book:

``` r
library(dplyr)
library(tidytext)
library(janeaustenr)

# n = 3 produces trigrams (three-word sequences), which is what is needed
# to match a three-word phrase such as "ten thousand pounds"
austen_trigrams <- austen_books() %>%
  unnest_tokens(trigram, text, token = "ngrams", n = 3)

# keep only the rows for the target phrase, then count them per book
austen_trigrams %>%
  filter(trigram == "ten thousand pounds") %>%
  count(book)
#> # A tibble: 3 × 2
#>   book                    n
#>   <fct>               <int>
#> 1 Sense & Sensibility     2
#> 2 Pride & Prejudice       4
#> 3 Persuasion              1
```
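For bigrams in a data frame shaped like the one described above, here is a minimal sketch. Since the real structure of PDFDataframe wasn't shown, the object `pdf_df`, its column names `doc_id` and `text`, and the sample sentences are all hypothetical stand-ins. `n = 2` gives bigrams, and `%in%` lets you filter for several bigrams at once:

``` r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for PDFDataframe: one column for the document id,
# one for the text (the real object's structure was not shown)
pdf_df <- tibble(
  doc_id = c("a.pdf", "a.pdf", "b.pdf"),
  text = c("Machine learning is useful.",
           "We use machine learning daily.",
           "Deep learning and machine learning differ.")
)

# n = 2 produces bigrams; the default tokenizer lowercases the text and
# drops punctuation, so "Machine learning" becomes "machine learning"
pdf_bigrams <- pdf_df %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# keep only the target bigrams, then count them per document
bigram_counts <- pdf_bigrams %>%
  filter(bigram %in% c("machine learning", "deep learning")) %>%
  count(doc_id, bigram)

bigram_counts
```

Swap in your own column names and target bigrams. For a single bigram, `filter(bigram == "...")` works the same way as in the Austen example.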