Filtering within Bigram results

Does anyone know how I can get word count results filtered per document? My current code shows me the total number of occurrences of a bigram but for the entire PDF corpus rather than per document.


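library(pdftools)  # pdf_text()
library(tm)        # Corpus(), VectorSource(), tm_map(), stopwords()
library(dplyr)     # %>%, filter(), count()
library(tidyr)     # separate()
library(tidytext)  # unnest_tokens(), stop_words

# Read every PDF in the working directory into a tm corpus
files <- list.files(pattern = "pdf$")
all <- lapply(files, pdf_text)
document <- Corpus(VectorSource(all))

# Basic cleaning
document <- tm_map(document, content_transformer(tolower))
document <- tm_map(document, removeNumbers)
document <- tm_map(document, removeWords, stopwords("english"))
document <- tm_map(document, removePunctuation)

PDFDataframe <- data.frame(text = sapply(document, as.character),
                           stringsAsFactors = FALSE)

# Split the text into bigrams, then into one column per word
New_bigrams <- PDFDataframe %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

bigrams_separated <- New_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

bigrams_filtered <- bigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)

# Counts "information security" across the whole corpus, not per document
bigrams_filtered %>%
  filter(word1 == "information") %>%
  count(word2 == "security")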

See the FAQ: How to do a minimal reproducible example (reprex) for beginners. The structure of the PDFDataframe object is not shown, which makes it hard to provide help.

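If it has one variable for the document ID and another for the text, you can count per document by passing the ID variable to count(), in the same way that book is used in this example: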

library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union
library(janeaustenr)
library(tidytext)

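# n = 3 produces three-word phrases (trigrams), which is what "ten thousand pounds" needs
austen_trigrams <- austen_books() %>%
  unnest_tokens(trigram, text, token = "ngrams", n = 3)

# Keep only the phrase of interest, then count it per book
austen_trigrams %>%
  filter(trigram == "ten thousand pounds") %>%
  count(book)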
#> # A tibble: 3 × 2
#>   book                    n
#>   <fct>               <int>
#> 1 Sense & Sensibility     2
#> 2 Pride & Prejudice       4
#> 3 Persuasion              1
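
Applied to the bigram case in the question, the same idea might look like the sketch below. It is only an illustration under assumptions: the data frame pdf_df, its doc_id column, and the three toy texts are hypothetical stand-ins, since the real structure of PDFDataframe is not shown. The key point is that the document ID has to travel with the text so that count() can group by it.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical stand-in for the PDF text: one row per document.
# With real PDFs, a similar table could be built along the lines of (assumption):
#   text <- vapply(files, function(f) paste(pdftools::pdf_text(f), collapse = " "),
#                  character(1))
#   pdf_df <- tibble(doc_id = files, text = text)
pdf_df <- tibble(
  doc_id = c("a.pdf", "b.pdf", "c.pdf"),
  text = c(
    "Information security matters. Good information security needs training.",
    "This report covers information security policy.",
    "This one is about information retrieval only."
  )
)

per_doc_counts <- pdf_df %>%
  # unnest_tokens() lowercases the text and keeps the doc_id column
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(word1 == "information", word2 == "security") %>%
  # Grouping by doc_id gives one count per document
  count(doc_id)

per_doc_counts
```

Documents that never contain the bigram (c.pdf here) simply do not appear in the result. If zero rows are needed, tidyr::complete() on doc_id with fill = list(n = 0) is one way to add them back.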
