I have a folder, "C:/Users/Documents/files_i_want",
which contains several PDF files (all with different names) that I am trying to import into R.
I tried the following code to import all of the PDF files at once:
library(pdftools)
library(tesseract)
#Get the path of filenames
filenames <- list.files("C:/Users/Documents/files_i_want", full.names = TRUE)
#Read them in a list
list_data <- lapply(filenames, pdftools::pdf_convert)
#Name them as per your choice (df_1, df_2 etc)
names(list_data) <- paste('df', seq_along(filenames), sep = '_')
#Create objects in global environment.
list2env(list_data, .GlobalEnv)
But this produced the following output, including errors:
Converting page 1 to 2_sample_1.png...PDF error: No display font for 'ArialUnicode'
done!
Converting page 2 to 2_sample_2.png... done!
Converting page 1 to sample_1_1.png...PDF error: No display font for 'ArialUnicode'
done!
Converting page 2 to sample_1_2.png... done!
When I try to view the PDF files that were imported, all I get is this:
df_1
[1] "2_sample_1.png" "2_sample_2.png"
Can someone please show me how to fix this?
Thanks
Note: I figured out how to solve this problem by manually importing each file, e.g.
#import and convert 1st file
pngfile_1 <- pdftools::pdf_convert('myfile_1.pdf', dpi = 600)
text_1 <- tesseract::ocr(pngfile_1)
#import and convert 2nd file (note: the files do not have the same naming convention)
pngfile_2 <- pdftools::pdf_convert('second_file.pdf', dpi = 600)
text_2 <- tesseract::ocr(pngfile_2)
etc
But I am trying to find a quicker way to do this.
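Roughly, what I am after is the two manual steps above applied to every PDF in the folder. A minimal sketch of that idea follows. The helper name `ocr_pdf`, the `pattern` filter, and naming the results after the file names are my own assumptions for illustration, and I have not verified this against my actual files:

```r
# Requires the pdftools and tesseract packages to be installed;
# functions are called with their namespace prefix, as in the manual version.

# Hypothetical helper: convert one PDF to PNG pages, then OCR those pages.
ocr_pdf <- function(pdf_path, dpi = 600) {
  # pdf_convert() writes one PNG per page to the working directory
  # and returns the PNG file names
  png_files <- pdftools::pdf_convert(pdf_path, dpi = dpi)
  # ocr() returns one character string per page image
  tesseract::ocr(png_files)
}

# Only pick up PDF files, with full paths so they can be read from any
# working directory
pdf_files <- list.files("C:/Users/Documents/files_i_want",
                        pattern = "\\.pdf$",
                        full.names = TRUE,
                        ignore.case = TRUE)

# One list element per PDF, each holding the OCR text of its pages
text_list <- lapply(pdf_files, ocr_pdf)

# Name each element after its source file (without the .pdf extension)
names(text_list) <- tools::file_path_sans_ext(basename(pdf_files))
```

With this shape, the text of a given file would be available as `text_list[["second_file"]]` rather than as separate `text_1`, `text_2` objects.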
Thanks