Maintaining tabs when scraping PDF using read_lines()

adambickford · February 24, 2020, 9:26pm

I am scraping a pdf file with the following code:

library(pdftools); library(stringr); library(magrittr)  # pdf_text(), str_trim(), %>%
f.data <- pdf_text(infile) %>% readr::read_lines() %>% str_trim(side = "both")

where "infile" is the path to the PDF file.

There are a lot of lines of text that I have to pass through, but the lines I want are part of a tab-delimited table.
It appears that readr::read_lines converts any tab stops it encounters to spaces.
Is there a way for readr to keep the tab stops in specific lines?
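For what it's worth, as far as I know read_lines() only splits text at line breaks and leaves tab characters alone. pdf_text() usually reproduces the page layout with runs of spaces, so the tabs may never reach read_lines() at all. Below is a minimal sketch, assuming the table's columns are separated by two or more spaces (or a tab) and that a table row can be recognised by its field count. The `page` string is made up and stands in for one element of pdf_text(infile).

```r
# Made-up page text standing in for one element of pdf_text(infile);
# columns are padded with runs of spaces, not tab characters.
page <- paste(
  "Quarterly report",
  "Name        Region     Sales",
  "Alice       North      1200",
  "Bob         South       950",
  "Footer text",
  sep = "\n"
)

# Split the page into trimmed lines (readr::read_lines(page) yields the same lines)
lines <- trimws(strsplit(page, "\n", fixed = TRUE)[[1]])

# Treat a tab OR two-plus spaces as the column separator
fields <- strsplit(lines, "\t| {2,}")

# Assumption: table rows are exactly the lines with 3 fields
is_row <- lengths(fields) == 3
tbl <- do.call(rbind, fields[is_row])

# First matching line is the header; the rest are data rows
df <- data.frame(tbl[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- tbl[1, ]
df$Sales <- as.numeric(df$Sales)
print(df)
```

One caveat with splitting on spaces: a cell that is empty in the PDF collapses into its neighbour's gap, shifting later fields to the left, so it works best on fully populated tables.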

TIA
AB

technocrat · February 24, 2020, 11:40pm

This is probably something pdftools can do more simply.
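One route pdftools might offer (my reading of "more simply", not something stated in the thread) is pdf_data(), which returns one data frame per page with a row per word and its x/y position. The sketch below assumes words on the same table row share a y value and that the left edges of the columns (`col_starts`, a hypothetical value here) were read off the PDF by hand. The `words` data frame is made up to mimic the x, y and text columns of pdf_data() output, and `words_to_table()` is an illustrative helper, not part of pdftools.

```r
# Made-up stand-in for pdftools::pdf_data(infile)[[1]]: one row per word
words <- data.frame(
  x    = c(72, 150, 230, 72, 150, 236, 72, 150, 240),
  y    = c(100, 100, 100, 112, 112, 112, 124, 124, 124),
  text = c("Name", "Region", "Sales", "Alice", "North", "1200",
           "Bob", "South", "950"),
  stringsAsFactors = FALSE
)

# Hypothetical column left edges, measured by hand from the PDF
col_starts <- c(72, 150, 220)

words_to_table <- function(words, col_starts) {
  # Words sharing a baseline (same y) form one table row, ordered top to bottom
  rows <- split(words, words$y)
  cells <- lapply(rows, function(r) {
    # Column index = right-most column start that is <= the word's x
    col <- findInterval(r$x, col_starts)
    # Join the words falling in each column; an empty column gives ""
    vapply(seq_along(col_starts),
           function(j) paste(r$text[col == j], collapse = " "),
           character(1))
  })
  do.call(rbind, cells)
}

tbl <- words_to_table(words, col_starts)
print(tbl)
```

Because cells are assigned by position rather than by counting gaps, empty cells stay empty instead of shifting values sideways. With a real file you would start from `words <- pdf_data(infile)[[1]]` (pdf_data() needs pdftools 2.0 or later) and might need to round y first if words on one row differ slightly.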

system · March 16, 2020, 11:40pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.