Converting PDFs in a folder to txt

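# Use forward slashes: a single backslash inside an R string starts an escape sequence
folder <- file.path("C:/Sergio/PDF-folder")
folder
# Number of files in the folder (named n_files so it does not mask length())
n_files <- length(dir(folder))
n_files
# File names in the folder
dirpdf <- dir(folder)
dirpdf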
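One thing to watch: dir(folder) returns every file in the folder, not only the PDFs, so anything else stored there would also be handed to the converter. A minimal sketch of filtering by extension follows. The helper name list_pdfs and the throwaway demo folder are made up for illustration, they are not part of the original script.

```r
# Hypothetical helper: full paths of the PDF files in a folder.
list_pdfs <- function(folder) {
  list.files(folder, pattern = "\\.pdf$", full.names = TRUE, ignore.case = TRUE)
}

# Demo on a throwaway folder so the sketch runs anywhere.
demo_dir <- file.path(tempdir(), "pdf-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.pdf", "b.PDF", "notes.txt")))

# Only the two PDF files come back; notes.txt is skipped.
basename(list_pdfs(demo_dir))
```

With full.names = TRUE the result already contains complete paths, so there is no need to paste the folder back on inside the loop.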

Downloaded and running

Set the path to pdftotext.exe and convert the PDFs to txt

pdftotext <- "C:/Users/Sergio/xpdf-tools-win-4.01/xpdf-tools-win-4.01/bin64/pdftotext.exe"

for (i in seq_along(dirpdf))
{
  pdf <- file.path(folder, dirpdf[i])
  # Wrap both paths in escaped double quotes so spaces do not split the command
  system(paste("\"", pdftotext, "\" \"", pdf, "\"", sep = ""), wait = TRUE)
}
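The hand-built quoting in that loop is easy to get wrong. A possible alternative, only a sketch, is system2() with shQuote(), which takes the arguments as a vector and quotes each one. The helpers txt_name and convert_pdf are hypothetical names, and writing the output next to the PDF with a .txt extension is a choice made here (pdftotext accepts the output file as an optional second argument).

```r
# Hypothetical helper: output path for a PDF (same name, .txt extension).
txt_name <- function(pdf) {
  paste0(tools::file_path_sans_ext(pdf), ".txt")
}

# Hypothetical helper: convert one PDF and return the exit status (0 = success).
convert_pdf <- function(pdf, pdftotext) {
  # shQuote() quotes each path, so spaces do not split the arguments.
  system2(pdftotext, args = shQuote(c(pdf, txt_name(pdf))))
}

folder    <- "C:/Sergio/PDF-folder"
pdftotext <- "C:/Users/Sergio/xpdf-tools-win-4.01/xpdf-tools-win-4.01/bin64/pdftotext.exe"
pdfs <- list.files(folder, pattern = "\\.pdf$", full.names = TRUE, ignore.case = TRUE)

# Only run when the executable is actually there.
if (file.exists(pdftotext)) {
  for (pdf in pdfs) {
    if (convert_pdf(pdf, pdftotext) != 0) warning("pdftotext failed for ", pdf)
  }
}
```

Checking the exit status makes a failed conversion show up as a warning instead of passing silently.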

You should look into some R package dedicated to PDF extraction.

You can do a lot with this one. It has a pdf_text function.
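For what it's worth, here is a rough sketch of the pure-R route. It assumes the package meant above is pdftools, whose pdf_text() returns one string per page; that is my guess, since the link is missing. The helper pdf_to_txt and its extract argument are hypothetical, added only so the function can be tried without the package installed.

```r
# Hypothetical helper: write the text of one PDF next to it as a .txt file.
# `extract` defaults to pdftools::pdf_text (assumed package), one string per page.
pdf_to_txt <- function(pdf, extract = pdftools::pdf_text) {
  txt <- paste0(tools::file_path_sans_ext(pdf), ".txt")
  writeLines(extract(pdf), txt)
  invisible(txt)
}

folder <- "C:/Sergio/PDF-folder"
pdfs <- list.files(folder, pattern = "\\.pdf$", full.names = TRUE, ignore.case = TRUE)

# Skip quietly when pdftools is not installed.
if (requireNamespace("pdftools", quietly = TRUE)) {
  for (pdf in pdfs) pdf_to_txt(pdf)
}
```

This avoids calling an external executable at all, so there are no paths to quote.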

There is also
