No display font for 'ArialUnicode'

swaheera · August 1, 2021, 2:40am

I have a folder called "C:/Users/Documents/files_i_want" which contains several PDF files (all with different names) that I am trying to import into R.

I tried to use the following code to import all the PDF files at once:

library(pdftools) 
library(tesseract)

#Get the path of filenames

filenames <- list.files("C:/Users/Documents/files_i_want", full.names = TRUE)

#Read them in a list

list_data <- lapply(filenames,  pdftools::pdf_convert)

#Name them as per your choice (df_1, df_2 etc)

names(list_data) <- paste('df', seq_along(filenames), sep = '_')

#Create objects in global environment.

list2env(list_data, .GlobalEnv)

But this produced the following errors:

Converting page 1 to 2_sample_1.png...PDF error: No display font for 'ArialUnicode'
 done!
Converting page 2 to 2_sample_2.png... done!
Converting page 1 to sample_1_1.png...PDF error: No display font for 'ArialUnicode'
 done!
Converting page 2 to sample_1_2.png... done!

When I try to view the PDF files that were imported, all I get is this:

df_1
[1] "2_sample_1.png" "2_sample_2.png"
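A likely explanation: `pdftools::pdf_convert()` writes one PNG image per page and returns the paths of those image files, not any text, so each `df_*` element holds only file names. The `No display font for 'ArialUnicode'` lines appear to be poppler warnings about a missing font, and each page still reports `done!`. A minimal sketch of the missing OCR step, assuming the PNG files listed in `list_data` are still on disk; `ocr_converted()` is a hypothetical helper name:

```r
# Hypothetical helper: turn a list of PNG-path vectors (as returned by
# lapply(filenames, pdftools::pdf_convert)) into one text string per PDF.
ocr_converted <- function(png_list) {
  lapply(png_list, function(png_files) {
    # tesseract::ocr() returns one string per page image
    page_text <- tesseract::ocr(png_files)
    # Join the pages of one PDF into a single string
    paste(page_text, collapse = "\n")
  })
}
```

With the objects from the question, `text_list <- ocr_converted(list_data)` would keep the names `df_1`, `df_2` and so on (`lapply()` preserves names), but each element would be text instead of image file names.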

Can someone please show me how to fix this?

Thanks

Note: I figured out how to solve this problem by manually importing each file, e.g.

#import and convert 1st file
pngfile_1 <- pdftools::pdf_convert('myfile_1.pdf', dpi = 600)
text_1 <- tesseract::ocr(pngfile_1)

#import and convert 2nd file (note: the files do not have the same naming convention)
pngfile_2 <- pdftools::pdf_convert('second_file.pdf', dpi = 600)
text_2 <- tesseract::ocr(pngfile_2)

etc.

But I am trying to find a quicker way to do this.
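The manual two-step approach above can be wrapped in a function and applied to every file, so the differing file names stop mattering. This is a sketch that assumes all the PDFs sit directly in one folder. `ocr_one_pdf()` and `ocr_pdf_folder()` are hypothetical names, and deleting the intermediate PNGs is an added choice that is not part of the original code:

```r
# Hypothetical helper: OCR every page of one PDF, mirroring the manual
# pdf_convert() + ocr() steps (dpi = 600) shown above.
ocr_one_pdf <- function(pdf_path, dpi = 600) {
  # Writes one PNG per page to the working directory, returns their paths
  png_files <- pdftools::pdf_convert(pdf_path, dpi = dpi)
  # One string of recognised text per page
  text <- tesseract::ocr(png_files)
  # Added choice: delete the intermediate PNGs once they have been read
  file.remove(png_files)
  text
}

# Hypothetical helper: apply ocr_one_pdf() to every .pdf file in a folder
ocr_pdf_folder <- function(folder, dpi = 600) {
  pdf_files <- list.files(folder, pattern = "\\.pdf$",
                          full.names = TRUE, ignore.case = TRUE)
  texts <- lapply(pdf_files, ocr_one_pdf, dpi = dpi)
  # Name each element after its source file, e.g. "second_file.pdf"
  names(texts) <- basename(pdf_files)
  texts
}
```

Something like `all_text <- ocr_pdf_folder("C:/Users/Documents/files_i_want")` should then return a list named after each PDF, where each element is a character vector with one string per page. Keeping `dpi = 600` matches the manual version. Lower values would run faster but may reduce OCR accuracy.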

Thanks

system · August 22, 2021, 2:41am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
