A good way to get help with this sort of thing is to pose an example question as a reprex
(or as close to a reproducible example as possible). See FAQ: Tips for writing R-related questions.
It's not totally clear to me what problem you're having, but there are a number of R packages that may help. These are all aimed at getting your PDF and XML data into something you can work with in R.
pdftools, for extracting text, fonts, attachments and metadata from a PDF file
xml2 is a handy tidyverse package for working with HTML and XML from R
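Since I don't know what your files look like, here is only a rough sketch of a first step with each package. The PDF is a throwaway one generated with base R, and the XML structure (`records`, `record`, `name`, `id`) is made up for illustration, so you would swap in your own file paths and node names.

```r
library(pdftools)
library(xml2)

# PDF side: create a throwaway one-page PDF so the example is self-contained.
# In practice pdf_path would point at your own file.
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)
plot.new()
text(0.5, 0.5, "Hello from a PDF")
invisible(dev.off())

pages <- pdf_text(pdf_path)  # character vector, one element per page
info  <- pdf_info(pdf_path)  # list of metadata, including the page count

# XML side: parse a small, hypothetical document from a string.
# read_xml() also accepts a file path or URL.
xml_string <- "<records>
  <record id='1'><name>Alpha</name></record>
  <record id='2'><name>Beta</name></record>
</records>"
doc   <- read_xml(xml_string)
nodes <- xml_find_all(doc, ".//record")

# Pull one attribute and one child element per node into a data frame.
records <- data.frame(
  id   = xml_attr(nodes, "id"),
  name = xml_text(xml_find_first(nodes, "./name")),
  stringsAsFactors = FALSE
)
```

From there `pages` is plain text you can split or search with the usual string tools, and `records` is an ordinary data frame. If you can post a small sample of your actual PDF text and XML, it would be much easier to suggest something more specific.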